Chapter 4. Primitive Obsession

The obsession with primitives is a primitive obsession.

Rich Hickey

4.0 Introduction

Many software engineers think that software is about “moving data around”; object-oriented schools and textbooks focus on the data and attributes when teaching about modeling the real world. This was a cultural bias taught in universities during the ’80s and ’90s. Industry trends pushed engineers to create entity-relationship diagrams (ERDs) and reason about the business data instead of focusing on the behavior.

Data is more relevant than ever. Data science is growing and the world revolves around data. You need to create a simulator to manage and protect data and expose behavior while hiding information and accidental representation to avoid coupling. The recipes from this chapter will help you identify small objects and hide accidental representation. You will discover many cohesive small objects and reuse them in many different contexts.

Cohesion

Cohesion is a measure of the degree to which the elements within a single software class or module work together to achieve a single, well-defined purpose. It refers to how closely related the objects are to each other and to the overall goal of the module. You can see high cohesion as a desirable property in software design since the elements within a module are closely related and work together effectively to achieve a specific goal.

4.1 Creating Small Objects

Problem

You have big objects containing only primitive types as fields.

Solution

Find responsibilities for small objects in the MAPPER and reify them.

Discussion

Since the early days of computing, engineers have mapped everything they see to familiar primitive data types such as String, Integer, and Collection. Mapping to those data types sometimes violates the abstraction and fail fast principles. A person’s name behaves differently from a string, yet the following example models it as one:

public class Person {
    private final String name;

    public Person(String name) {
        this.name = name;
    }
}

The concept of names is reified:

public class Name {
    private final String name;

    public Name(String name) {
        this.name = name;
        // Name has its own creation rules, comparison, etc.
        // Might be different than a string
    }
}

public class Person {
    private final Name name;

    public Person(Name name) {
        // Name is created as a valid one,
        // you don't need to add validations here
        this.name = name;
    }
}

Take the five-letter word from the Wordle game as an example. A Wordle word does not have the same responsibilities as a char(5), so a char(5) does not honor the bijection. If you build a Wordle game, the bijection maps the real-world concept to a Wordle word object, not to a String or a char(5). For example, it is not a String’s responsibility to find how many matches it has with the secret Wordle word. And it is not the responsibility of a Wordle word to concatenate.

Wordle

Wordle is a popular online word-guessing game where you have six attempts to guess a five-letter word selected by the game. You make each guess by entering a five-letter word, and the game indicates which letters are correct and in the correct position (marked with a green square) and which letters are correct but in the wrong position (marked with a yellow square).
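
As a minimal sketch of that split in responsibilities, here is a Wordle word written in Python. The class and method names are hypothetical and chosen only for illustration. The object validates itself on creation and knows how it matches a secret word, while it offers no concatenation at all:

```python
from collections import Counter


class WordleWord:
    """A five-letter word that is valid from the moment it is created."""

    LENGTH = 5

    def __init__(self, letters: str) -> None:
        # Fail fast: reject anything that is not exactly five letters
        if len(letters) != self.LENGTH or not letters.isalpha():
            raise ValueError(f"Not a valid Wordle word: {letters!r}")
        self._letters = letters.lower()

    def positions_matching(self, secret: "WordleWord") -> list[int]:
        """Positions where the letter is correct and in place (green)."""
        pairs = zip(self._letters, secret._letters)
        return [index for index, (mine, theirs) in enumerate(pairs) if mine == theirs]

    def positions_misplaced(self, secret: "WordleWord") -> list[int]:
        """Positions where the letter is in the secret but elsewhere (yellow)."""
        greens = set(self.positions_matching(secret))
        # Letters of the secret that no green square has used up yet
        remaining = Counter(
            letter
            for index, letter in enumerate(secret._letters)
            if index not in greens
        )
        misplaced = []
        for index, letter in enumerate(self._letters):
            if index not in greens and remaining[letter] > 0:
                misplaced.append(index)
                remaining[letter] -= 1
        return misplaced


secret = WordleWord("crane")
guess = WordleWord("react")
green = guess.positions_matching(secret)
yellow = guess.positions_misplaced(secret)
```

Positions are zero-based in this sketch. Trying to build `WordleWord("toolong")` raises a `ValueError`, so no invalid word can ever reach the game logic.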

In a very small number of mission-critical systems, there is a trade-off between abstraction and performance. But to avoid premature optimization (see Chapter 16, “Premature Optimization”), you should rely on modern computers and virtual machine optimizations and, as always, you need to stick to evidence in real-world scenarios. Finding small objects is a very hard task, requiring experience to do a good job and avoid overdesign. There’s no silver bullet in choosing how and when to map something.

No Silver Bullet

The “no silver bullet” concept is a phrase coined by computer scientist and software engineering pioneer Fred Brooks in his 1986 essay “No Silver Bullet: Essence and Accidents of Software Engineering”. Brooks argues that there is no single solution or approach that can solve all of the problems or significantly improve the productivity and effectiveness of software development.

4.2 Reifying Primitive Data

Problem

You have objects using too many primitive types.

Solution

Use small objects instead of primitive ones.

Discussion

Suppose you’re building a web server:

  int port = 8080;
  InetSocketAddress in = open("example.org", port);
  String uri = uriFromPort("example.org", port);
  String address = addressFromPort("example.org", port);
  String path = pathFromPort("example.org", port);

This naive example has many problems. It violates the “Tell, don’t ask” principle (see Recipe 3.3, “Removing Setters from Objects”) and the fail fast principle. Moreover, it does not follow the MAPPER design rule and violates the subset principle. The code needed to manipulate these values is duplicated everywhere because the example does not clearly separate the “what” from the “how.”

The industry is very lazy when it comes to creating small objects and also separating the what and the how since it takes some extra effort to discover such abstractions. It’s important to look at the protocol and behavior of small components and forget trying to understand the internals of how things work. A bijection-compliant solution might be:

Port server = Port.parse(this, "www.example.org:8080");
// Port is a small object with responsibilities and protocol

Port in = server.open(this); // returns a port, not a number
URI uri = server.asUri(this); // returns a URI
InetSocketAddress address = server.asInetSocketAddress(); // returns an address
Path path = server.path(this, "/index.html"); // returns a Path
// All of them are validated small bijection objects with very few and
// precise responsibilities
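
The same idea can be sketched in Python. This is an illustration under stated assumptions rather than the API shown above: the `Port` class, its `parse` and `as_uri` methods, and the 1 to 65535 range check are hypothetical choices made for the example.

```python
class Port:
    """A network port on a host: a small object instead of a bare int."""

    LOWEST = 1
    HIGHEST = 65535

    def __init__(self, host: str, number: int) -> None:
        # Fail fast: an empty host or an out-of-range number cannot exist
        if not host:
            raise ValueError("Host must not be empty")
        if not self.LOWEST <= number <= self.HIGHEST:
            raise ValueError(f"Port number out of range: {number}")
        self._host = host
        self._number = number

    @classmethod
    def parse(cls, text: str) -> "Port":
        """Build a Port from text such as 'www.example.org:8080'."""
        host, separator, number = text.rpartition(":")
        if not separator or not number.isdigit():
            raise ValueError(f"Expected 'host:port', got {text!r}")
        return cls(host, int(number))

    def as_uri(self, path: str = "/") -> str:
        # The "how" of assembling a URI lives in exactly one place
        return f"http://{self._host}:{self._number}{path}"


server = Port.parse("www.example.org:8080")
uri = server.as_uri("/index.html")
```

Callers ask the port for what they need and never rebuild the host and number by hand. A fuller version would return small objects, such as a URI object, instead of a plain string.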

4.3 Reifying Associative Arrays

Problem

You have anemic associative (key/value) arrays representing real-world objects.

Solution

Use arrays for rapid prototyping and use objects for serious business.

Discussion

Rapid Prototyping

Rapid prototyping is used in product development to quickly create working prototypes to validate with the end user. This technique allows designers and engineers to test and refine a design before creating consistent, robust, and elegant clean code.

Associative arrays are a handy way to represent anemic objects. If you encounter them in the code, this recipe will help you to reify the concept and replace them. Having rich objects is beneficial to clean code so you can fail fast, maintain integrity, avoid code duplication, and gain cohesion.

Many people suffer from primitive obsession and believe that creating these objects is overdesign. Designing software is about making decisions and comparing trade-offs. The performance argument is rarely valid nowadays since modern virtual machines can efficiently deal with small short-lived objects.

Here is an example of anemic and primitive obsession code:

$coordinate = array('latitude'=>1000, 'longitude'=>2000);
// They are just arrays. A bunch of raw data

This is more accurate according to the bijection concept:

final class GeographicCoordinate {
    function __construct($latitudeInDegrees, $longitudeInDegrees) {
        $this->longitude = $longitudeInDegrees;
        $this->latitude = $latitudeInDegrees;
    }
}

$coordinate = new GeographicCoordinate(1000, 2000);
// Should throw an error since these values don’t exist on Earth

You need to have objects that are valid from inception:

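final class GeographicCoordinate {
    private $latitude;
    private $longitude;

    function __construct($latitudeInDegrees, $longitudeInDegrees) {
        if (!$this->isValidLatitude($latitudeInDegrees)) {
            throw new InvalidLatitudeException($latitudeInDegrees);
        }
        $this->longitude = $longitudeInDegrees;
        $this->latitude = $latitudeInDegrees;
    }

    private function isValidLatitude($degrees) {
        // Latitudes on Earth range from -90 to 90 degrees
        return $degrees >= -90 && $degrees <= 90;
    }
}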

$coordinate = new GeographicCoordinate(1000, 2000);
// throws an error since these values don't exist on Earth

There is an obscure small object (see Recipe 4.1, “Creating Small Objects”) to model the latitude:

final class Latitude {
    function __construct($degrees) {
        if (!$degrees->between(-90, 90)) {
            throw new InvalidLatitudeException($degrees);
        }
    }
}

final class GeographicCoordinate {

    function distanceTo(GeographicCoordinate $coordinate) { }
    function pointInPolygon(Polygon $polygon) { }
}

// Now you are in the geometry world (and not in the world of arrays anymore).
// You can safely do many exciting things.

When creating objects, you must not think of them as data. This is a common misconception. You should stay loyal to the concept of bijection and discover real-world objects.
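
To make the geometry world more tangible, here is a minimal Python sketch of a coordinate built from small objects. It is an illustration, not the recipe’s implementation: the `Longitude` class, the haversine formula, and the mean Earth radius of 6371 km are assumptions added for the example.

```python
import math


class Latitude:
    def __init__(self, degrees: float) -> None:
        if not -90 <= degrees <= 90:
            raise ValueError(f"Invalid latitude: {degrees}")
        self.degrees = degrees


class Longitude:
    def __init__(self, degrees: float) -> None:
        if not -180 <= degrees <= 180:
            raise ValueError(f"Invalid longitude: {degrees}")
        self.degrees = degrees


class GeographicCoordinate:
    EARTH_RADIUS_KM = 6371.0  # mean radius, an approximation

    def __init__(self, latitude: Latitude, longitude: Longitude) -> None:
        # Both parts are already valid, so no checks are repeated here
        self._latitude = latitude
        self._longitude = longitude

    def distance_to(self, other: "GeographicCoordinate") -> float:
        """Great-circle distance in kilometers (haversine formula)."""
        lat1 = math.radians(self._latitude.degrees)
        lat2 = math.radians(other._latitude.degrees)
        delta_lat = lat2 - lat1
        delta_lon = math.radians(
            other._longitude.degrees - self._longitude.degrees
        )
        a = (
            math.sin(delta_lat / 2) ** 2
            + math.cos(lat1) * math.cos(lat2) * math.sin(delta_lon / 2) ** 2
        )
        # min() guards against rounding slightly above 1.0
        return 2 * self.EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


origin = GeographicCoordinate(Latitude(0), Longitude(0))
opposite = GeographicCoordinate(Latitude(0), Longitude(180))
half_way_around = origin.distance_to(opposite)
```

With these objects, `Latitude(1000)` fails immediately with a `ValueError`, and the coordinate offers behavior such as `distance_to` instead of raw array access.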

4.4 Removing String Abuses

Problem

You have too many parsing, exploding, regex, string comparison, substring search, and other string manipulation functions.

Solution

Use real abstractions and real objects instead of accidental string manipulation.

Discussion

Don’t abuse strings. Favor real objects. Find absent protocols to distinguish them from strings. This code does a lot of primitive string manipulations:

$schoolDescription = 'College of Springfield';

preg_match('/[^ ]*$/', $schoolDescription, $results);
$location = $results[0]; // $location = 'Springfield'.

$school = preg_split('/[\s,]+/', $schoolDescription, 3)[0]; //'College'

You can convert the code to a more declarative version:

class School {
    private $name;
    private $location;

    function description() {
        return $this->name . ' of ' . $this->location->name;
    }
}

By finding objects present in the MAPPER, your code is more declarative, more testable, and can evolve and change faster. You can also add constraints to the new abstractions. Using strings to map real objects is a primitive obsession and premature optimization symptom (see Chapter 16, “Premature Optimization”). Sometimes the strings version is a bit more performant. If you need to decide between applying this recipe and making low-level manipulations, always create real usage scenarios and find conclusive and significant improvements.
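
As a rough Python sketch of the declarative version in use, the school can be built from its parts so that nothing ever needs to be parsed back out of a description. The `Location` class and both constructors are assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str

    def __post_init__(self) -> None:
        # A constraint you can add once the abstraction exists
        if not self.name.strip():
            raise ValueError("A location needs a name")


@dataclass(frozen=True)
class School:
    name: str
    location: Location

    def description(self) -> str:
        # The description is derived from the parts, never the other way around
        return f"{self.name} of {self.location.name}"


school = School("College", Location("Springfield"))
location_name = school.location.name
description = school.description()
```

Here the location is available directly from the object, so the regular expressions from the first listing are no longer needed.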

4.5 Reifying Timestamps

Problem

Your code relies on timestamps while you just need sequencing.

Solution

Don’t use timestamps for sequencing. Centralize and lock your time issuer.

Discussion

Managing timestamps across different time zones and with heavy concurrency scenarios is a well-known problem. Sometimes, you might confuse the problem of having sequential and ordered items with the (possible) solution of timestamping them. As always, you need to understand the essential problems to solve before guessing accidental implementations.

A possible solution is to use a centralized authority or some complex decentralized consensus algorithms. This recipe challenges the need for timestamps when you just need an ordered sequence. Timestamps are ubiquitous and very popular in many languages. Use native timestamps only to model actual timestamps that you find in the bijection.

Here are some problems with timestamps:

import time

# ts1 and ts2 store the time in seconds
ts1 = time.time()
ts2 = time.time() # might be the same!!

Here’s a better solution without timestamps since you just need sequencing behavior:

from itertools import count

numbers = count(1)
# create a sequence of numbers and issue them from a single hotspot

def next_number():
    return next(numbers)

sequence = next_number()
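
The solution asks you to centralize and lock the issuer. A minimal Python sketch of such an issuer follows, assuming a single process with several threads. The `Sequencer` class and its method name are hypothetical:

```python
import threading


class Sequencer:
    """A single, locked issuer of strictly increasing sequence numbers."""

    def __init__(self, first: int = 1) -> None:
        self._next = first
        self._lock = threading.Lock()

    def next_number(self) -> int:
        # The lock guarantees that two callers never receive the same number
        with self._lock:
            number = self._next
            self._next += 1
            return number


sequencer = Sequencer()
issued_numbers = []


def take_many(count: int) -> None:
    for _ in range(count):
        issued_numbers.append(sequencer.next_number())


# Four threads compete for numbers from the same issuer
threads = [threading.Thread(target=take_many, args=(1000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Unlike two consecutive timestamps, two numbers issued this way can never be equal. Across several machines you would still need the centralized authority or consensus algorithm mentioned above.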

4.6 Reifying Subsets as Objects

Problem

You model objects in a superset domain and have lots of validation duplication.

Solution

Create small objects and validate a restricted domain.

Discussion

Subsets are a special case of a primitive obsession smell. The subset objects are present on the bijection, therefore you must create them in your simulator. Also, when you try to create an invalid object, it should break immediately, following the fail fast principle (see Chapter 13, “Fail Fast”). Some examples of subset violations include: emails are a subset of strings, valid ages are a subset of real numbers, and ports are a subset of integers. Invisible objects have rules you need to enforce at a single point.

Take this example:

validDestination = "destination@example.com"
invalidDestination = "destination.example.com"
// No error is thrown

Here’s a better domain restriction:

using System;
using System.Text.RegularExpressions;

public class EmailAddress {
    public String emailAddress;

    public EmailAddress(String address) {
        string expressions = @"^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$";
        // Validate the constructor argument before storing it
        if (!Regex.IsMatch(address, expressions)) {
          throw new Exception("Invalid email address");
        }
        this.emailAddress = address;
    }
}

EmailAddress destination = new EmailAddress("destination@example.com");

This solution should not be confused with the anemic Java version. You need to be loyal to the bijection of the real world.

4.7 Reifying String Validations

Problem

You are validating a subset of strings.

Solution

Search for missing domain objects when validating strings and reify them.

Discussion

Serious software has lots of string validations. Often, they are not in the correct places, leading to fragile and corrupt software. The simple solution is to build only real-world and valid abstractions:

// First Example: Address Validation
class Address {
  function __construct(string $emailAddress) {
     // String validation on Address class violates
     // Single Responsibility Principle
     $this->validateEmail($emailAddress);
     // ...
   }

  private function validateEmail(string $emailAddress) {
    $regex = '/[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z]+/';
    // Regex is a sample / It might be wrong
    // Emails and Urls should be first class objects

    if (!preg_match($regex, $emailAddress))
    {
      throw new Exception('Invalid email address ' . $emailAddress);
    }
  }
}

// Second Example: Wordle

class Wordle {
  function validateWord(string $wordleword) {
    // Wordle word should be a real world entity. Not a subset of Strings
  }
}

Here’s a better solution:

// First Example: Address Validation
class Address {
  function __construct(EmailAddress $emailAddress) {
     // Email is always valid / Code is cleaner and not duplicated
     // ...
   }
}

class EmailAddress {
  // You can reuse this object many times avoiding copy-pasting
  private string $address;

  public function __construct(string $emailAddress) {
    $regex = '/[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z]+/';
    // Regex is a sample / It might be wrong
    // Emails and Urls are first class objects

    if (!preg_match($regex, $emailAddress))
    {
      throw new Exception('Invalid email address ' . $emailAddress);
    }
    $this->address = $emailAddress;
  }
}

// Second Example: Wordle

class Wordle {
  function validateWord(WordleWord $wordleword) {
    // Wordle word is a real world entity. Not a subset of string
  }
}

class WordleWord {
  function __construct(string $word) {
    // Avoid building invalid Wordle words
    // For example length != 5
  }
}

Single-Responsibility Principle

The single-responsibility principle states that every module or class in a software system should have responsibility over a single part of the functionality provided by the software and that responsibility should be entirely encapsulated by the class. In other words, a class should have only one reason to change.

Small objects are hard to find, but they follow the fail fast principle when you try to create invalid objects. The new reified object also follows the single-responsibility principle and the don’t repeat yourself principle. Having these abstractions forces you to reimplement specific behavior that is already available in the objects they encapsulate. For example, a WordleWord is not a String, but you might need some of a String’s functions.

Don’t Repeat Yourself Principle

The don’t repeat yourself (DRY) principle states that software systems should avoid redundancy and repetition of code. The goal of the DRY principle is to improve the maintainability, flexibility, and understandability of software by reducing the amount of duplicated knowledge, code, and information.

An efficiency counterargument against these new indirections is a sign of premature optimization unless you have concrete evidence of a substantial penalty in real-use scenarios from your customers. Creating these new small concepts keeps the model loyal to the bijection and ensures your models are always healthy.

SOLID Principles

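SOLID is a mnemonic that stands for five principles of object-oriented programming: single responsibility, open-closed, Liskov substitution, interface segregation, and dependency inversion. They were defined by Robert Martin and are guidelines and heuristics, not rigid rules. Each one is defined in its related chapter.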

4.8 Removing Unnecessary Properties

Problem

You have objects created based on their properties instead of their behavior.

Solution

Remove accidental properties. Add the needed behavior and then add accidental properties to support the defined behavior. 

Discussion

Many programming schools tell you to quickly identify object parts and then build functions around them. Such models are usually coupled and less maintainable than the ones created based on the desired behavior. Following YAGNI’s premise (see Chapter 12, “YAGNI”), you’ll find many times you don’t need these attributes.

Whenever they want to model a person or an employee, junior programmers or students add an attribute id or name without thinking if they are really going to need them. You need to add attributes “on demand” when there’s enough behavioral evidence. Objects are not “data holders.”

This is a classic teaching example:

class PersonInQueue
  attr_accessor :name, :job

  def initialize(name, job)
    @name = name
    @job = job
  end
end

If you start focusing on the behavior, you will be able to build better models:

class PersonInQueue

  def moveForwardOnePosition
    # implement protocol
  end
end

An amazing technique for behavior discovery is test-driven development, where you are forced to start by iterating on the behavior and protocol while deferring accidental implementation as much as you can.

Test-Driven Development

Test-driven development (TDD) is a software development process that relies on the repetition of a very short development cycle: first, the developer writes a failing automated test case that defines a desired improvement or new behavior, then produces minimal production code to pass that test and finally refactors the new code to acceptable standards. One of the main goals of TDD is to make the code easier to maintain by ensuring that it is well-structured and follows good design principles. It also helps to catch defects early in the development process, since each new piece of code is tested as soon as it is written.
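
The cycle can be sketched in Python for the person in the queue. This is a hedged illustration: the method names, the `position` argument, and the rule that nobody moves past the front are assumptions. The tests are written first and name only behavior, and the class then adds just the state needed to pass them:

```python
# Step 1: write the failing tests first. They describe behavior
# before any attribute exists.
def test_moving_forward_gets_closer_to_the_front():
    person = PersonInQueue(position=3)
    person.move_forward_one_position()
    assert person.is_at_position(2)


def test_person_at_the_front_cannot_move_forward():
    person = PersonInQueue(position=1)
    try:
        person.move_forward_one_position()
    except ValueError:
        return
    raise AssertionError("Expected a ValueError at the front of the queue")


# Step 2: add the minimal code, and only the state, that makes the tests pass.
class PersonInQueue:
    def __init__(self, position: int) -> None:
        if position < 1:
            raise ValueError(f"Invalid queue position: {position}")
        self._position = position

    def move_forward_one_position(self) -> None:
        if self._position == 1:
            raise ValueError("Already at the front of the queue")
        self._position -= 1

    def is_at_position(self, position: int) -> bool:
        return self._position == position


# Step 3: run the tests; both now pass.
test_moving_forward_gets_closer_to_the_front()
test_person_at_the_front_cannot_move_forward()
```

Note that neither a name nor a job appears in this version, because no test has demanded them yet.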

4.9 Creating Date Intervals

Problem

You have to model real-world intervals and you have information like “from date” and “to date,” but no invariants like: “from date should be lower than to date.”

Solution

Reify this small object and honor the MAPPER rule.

Discussion

This recipe presents a very common abstraction that you might miss and has the same problems you saw in this chapter’s other recipes: missing abstractions, duplicated code, unenforced invariants (see Recipe 13.2, “Enforcing Preconditions”), primitive obsession, and violation of the fail fast principle. The restriction “from date should be lower than to date” means that the starting date of a certain interval should occur before the ending date of the same interval.

This restriction ensures that the interval being defined makes logical sense and that the dates used to define it are in the correct order. You know the rule but forget to create the Interval object. Would you create a Date as a set of three integer numbers? Certainly not.

Here is an anemic example:

import java.time.LocalDate
import java.time.temporal.ChronoUnit

val from = LocalDate.of(2018, 12, 9)
val to = LocalDate.of(2022, 12, 22)

val elapsed = elapsedDays(from, to)

fun elapsedDays(fromDate: LocalDate, toDate: LocalDate): Long {
    return ChronoUnit.DAYS.between(fromDate, toDate)
}

// You need to apply this short function
// or the inline version many times in your code
// You don't check fromDate to be less than toDate
// You can make accounting numbers with a negative value

After you reify the Interval object:

data class Interval(val fromDate: LocalDate, val toDate: LocalDate) {
    init {
        if (fromDate >= toDate) {
            throw IllegalArgumentException("From date must be before to date")
        }
        // Of course the Interval must be immutable,
        // which is why both properties are declared with 'val'
    }

    fun elapsedDays(): Long {
        return ChronoUnit.DAYS.between(fromDate, toDate)
    }
}

val from = LocalDate.of(2018, 12, 9)
val to = LocalDate.of(2002, 12, 22)

val interval = Interval(from, to) // Invalid

This is a primitive obsession smell and is related to how you model things. If you find software with missing simple validations, it certainly needs some reification.
