Chapter 4. Primitive Obsession
The obsession with primitives is a primitive obsession.
Rich Hickey
4.0 Introduction
Many software engineers think that software is about “moving data around”; object-oriented schools and textbooks focus on the data and attributes when teaching about modeling the real world. This was a cultural bias taught in universities during the ’80s and ’90s. Industry trends pushed engineers to create entity-relationship diagrams (ERDs) and reason about the business data instead of focusing on the behavior.
Data is more relevant than ever. Data science is growing and the world revolves around data. You need to create a simulator to manage and protect data and expose behavior while hiding information and accidental representation to avoid coupling. The recipes from this chapter will help you identify small objects and hide accidental representation. You will discover many cohesive small objects and reuse them in many different contexts.
Cohesion
Cohesion is a measure of the degree to which the elements within a single software class or module work together to achieve a single, well-defined purpose. It refers to how closely related the objects are to each other and to the overall goal of the module. You can see high cohesion as a desirable property in software design since the elements within a module are closely related and work together effectively to achieve a specific goal.
4.1 Creating Small Objects
Solution
Find responsibilities for small objects in the MAPPER and reify them.
Discussion
Since the early days of computing, engineers map all they see to the familiar primitive data types such as String, Integer, and Collection. Mapping to those data types sometimes violates abstraction and fail fast principles. The Person’s name has different behaviors than a string as you can see in the following example:
public class Person {
    private final String name;

    public Person(String name) {
        this.name = name;
    }
}
The concept of names is reified:
public class Name {
    private final String name;

    public Name(String name) {
        this.name = name;
        // Name has its own creation rules, comparison, etc.
        // Might be different than a string
    }
}

public class Person {
    private final Name name;

    public Person(Name name) {
        // Name is created as a valid one,
        // you don't need to add validations here
        this.name = name;
    }
}
Take the five-letter word from the Wordle game as an example. A Wordle word does not have the same responsibilities as a char(5), so it does not map to one on the bijection. If you build a Wordle game, the bijection maps a Wordle word to its own object, distinct from a String or char(5), since they don’t have the same responsibilities. For example, it is not a String’s responsibility to find how many matches it has to the secret Wordle word. And it is not the responsibility of a Wordle word to concatenate.
Wordle
Wordle is a popular online word-guessing game where you have six attempts to guess a five-letter word selected by the game. You make each guess by entering a five-letter word, and the game indicates which letters are correct and in the correct position (marked with a green square) and which letters are correct but in the wrong position (marked with a yellow square).
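The difference in responsibilities can be made concrete with a minimal Python sketch. The class name WordleWord and the method matches_in_position are illustrative assumptions, not part of any real library, and the matching rule covers only the "correct position" case described above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordleWord:
    """A five-letter Wordle word; invalid words cannot be created."""

    letters: str

    def __post_init__(self):
        # Fail fast: reject anything that is not exactly five letters.
        if len(self.letters) != 5 or not self.letters.isalpha():
            raise ValueError(f"Not a valid Wordle word: {self.letters!r}")
        # Normalize case; object.__setattr__ is needed on a frozen dataclass.
        object.__setattr__(self, "letters", self.letters.lower())

    def matches_in_position(self, secret: "WordleWord") -> int:
        """Count letters that are correct and in the correct position."""
        return sum(
            1 for mine, theirs in zip(self.letters, secret.letters) if mine == theirs
        )


guess = WordleWord("crane")
secret = WordleWord("crate")
print(guess.matches_in_position(secret))  # prints 4
```

The word knows how to compare itself with the secret word, but it offers no concatenation, slicing, or any other string protocol.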
In a very small number of mission-critical systems, there is a trade-off between abstraction and performance. But to avoid premature optimization (see Chapter 16, “Premature Optimization”), you should rely on modern computers and virtual machine optimizations and, as always, you need to stick to evidence in real-world scenarios. Finding small objects is a very hard task, requiring experience to do a good job and avoid overdesign. There’s no silver bullet in choosing how and when to map something.
No Silver Bullet
The “no silver bullet” concept is a phrase coined by computer scientist and software engineering pioneer Fred Brooks in his 1986 essay “No Silver Bullet: Essence and Accidents of Software Engineering”. Brooks argues that there is no single solution or approach that can solve all of the problems or significantly improve the productivity and effectiveness of software development.
4.2 Reifying Primitive Data
Solution
Use small objects instead of primitive ones.
Discussion
Suppose you’re building a web server:
int port = 8080;
InetSocketAddress in = open("example.org", port);
String uri = urifromPort("example.org", port);
String address = addressFromPort("example.org", port);
String path = pathFromPort("example.org", port);
This naive example has many problems. It violates the “Tell, don’t ask” principle (see Recipe 3.3, “Removing Setters from Objects”) and the fail fast principle. Moreover, it does not follow the MAPPER design rule and violates the subset principle. The code needed to manipulate these values is duplicated everywhere since it does not clearly separate the “what” from the “how.”
The industry is very lazy when it comes to creating small objects and also separating the what and the how since it takes some extra effort to discover such abstractions. It’s important to look at the protocol and behavior of small components and forget trying to understand the internals of how things work. A bijection-compliant solution might be:
Port server = Port.parse(this, "www.example.org:8080");
// Port is a small object with responsibilities and protocol
Port in = server.open(this);
// returns a port, not a number
URI uri = server.asUri(this);
// returns a URI
InetSocketAddress address = server.asInetSocketAddress();
// returns an Address
Path path = server.path(this, "/index.html");
// returns a Path
// all of them are validated small bijection objects with very few and precise
// responsibilities
4.3 Reifying Associative Arrays
Solution
Use arrays for rapid prototyping and use objects for serious business.
Discussion
Rapid Prototyping
Rapid prototyping is used in product development to quickly create working prototypes to validate with the end user. This technique allows designers and engineers to test and refine a design before creating consistent, robust, and elegant clean code.
Associative arrays are a handy way to represent anemic objects. If you encounter them in the code, this recipe will help you to reify the concept and replace them. Having rich objects is beneficial to clean code so you can fail fast, maintain integrity, avoid code duplication, and gain cohesion.
Many people suffer from primitive obsession and believe this is overdesign. Designing software is about making decisions and comparing trade-offs. The performance argument is invalid nowadays since modern virtual machines can efficiently deal with small short-lived objects.
Here is an example of anemic and primitive obsession code:
$coordinate = array('latitude' => 1000, 'longitude' => 2000);
// They are just arrays. A bunch of raw data
This is more accurate according to the bijection concept:
final class GeographicCoordinate {
    function __construct($latitudeInDegrees, $longitudeInDegrees) {
        $this->longitude = $longitudeInDegrees;
        $this->latitude = $latitudeInDegrees;
    }
}

$coordinate = new GeographicCoordinate(1000, 2000);
// Should throw an error since these values don’t exist on Earth
You need to have objects that are valid from inception:
final class GeographicCoordinate {
    function __construct($latitudeInDegrees, $longitudeInDegrees) {
        if (!$this->isValidLatitude($latitudeInDegrees)) {
            throw new InvalidLatitudeException($latitudeInDegrees);
        }
        $this->longitude = $longitudeInDegrees;
        $this->latitude = $latitudeInDegrees;
    }

    // Latitudes range from -90 to 90 degrees
    private function isValidLatitude($degrees) {
        return $degrees >= -90 && $degrees <= 90;
    }
}

$coordinate = new GeographicCoordinate(1000, 2000);
// throws an error since these values don't exist on Earth
There is an obscure small object (see Recipe 4.1, “Creating Small Objects”) to model the latitude:
final class Latitude {
    function __construct($degrees) {
        // Plain PHP numbers have no methods, so compare directly
        if ($degrees < -90 || $degrees > 90) {
            throw new InvalidLatitudeException($degrees);
        }
    }
}

final class GeographicCoordinate {
    function distanceTo(GeographicCoordinate $coordinate) { }

    function pointInPolygon(Polygon $polygon) { }
}

// Now you are in the geometry world (and not in the world of arrays anymore).
// You can safely do many exciting things.
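As an illustration of what the empty distanceTo method could hold, here is a hedged Python sketch. It assumes the haversine formula and a mean Earth radius of 6,371 km, neither of which comes from the recipe; the Longitude class is likewise an assumed companion to Latitude:

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, an approximation


@dataclass(frozen=True)
class Latitude:
    degrees: float

    def __post_init__(self):
        if not -90 <= self.degrees <= 90:
            raise ValueError(f"Invalid latitude: {self.degrees}")


@dataclass(frozen=True)
class Longitude:
    degrees: float

    def __post_init__(self):
        if not -180 <= self.degrees <= 180:
            raise ValueError(f"Invalid longitude: {self.degrees}")


@dataclass(frozen=True)
class GeographicCoordinate:
    latitude: Latitude
    longitude: Longitude

    def distance_to(self, other: "GeographicCoordinate") -> float:
        """Great-circle distance in kilometers, using the haversine formula."""
        lat1 = math.radians(self.latitude.degrees)
        lat2 = math.radians(other.latitude.degrees)
        delta_lat = lat2 - lat1
        delta_lon = math.radians(other.longitude.degrees - self.longitude.degrees)
        a = (
            math.sin(delta_lat / 2) ** 2
            + math.cos(lat1) * math.cos(lat2) * math.sin(delta_lon / 2) ** 2
        )
        # Clamp to 1.0 to guard against floating-point overshoot.
        return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Because Latitude and Longitude validate themselves, GeographicCoordinate needs no range checks of its own, and a call such as Latitude(1000) fails before a coordinate can even be built.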
When creating objects, you must not think of them as data. This is a common misconception. You should stay loyal to the concept of bijection and discover real-world objects.
4.4 Removing String Abuses
Solution
Use real abstractions and real objects instead of accidental string manipulation.
Discussion
Don’t abuse strings. Favor real objects. Find absent protocols to distinguish them from strings. This code does a lot of primitive string manipulations:
$schoolDescription = 'College of Springfield';

preg_match('/[^ ]*$/', $schoolDescription, $results);
$location = $results[0]; // $location = 'Springfield'.
$school = preg_split('/[\s,]+/', $schoolDescription, 3)[0]; //'College'
You can convert the code to a more declarative version:
class School {
    private $name;
    private $location;

    function description() {
        return $this->name . ' of ' . $this->location->name;
    }
}
By finding objects present in the MAPPER, your code is more declarative, more testable, and can evolve and change faster. You can also add constraints to the new abstractions. Using strings to map real objects is a primitive obsession and premature optimization symptom (see Chapter 16, “Premature Optimization”). Sometimes the strings version is a bit more performant. If you need to decide between applying this recipe and making low-level manipulations, always create real usage scenarios and find conclusive and significant improvements.
4.5 Reifying Timestamps
Discussion
Managing timestamps across different time zones and with heavy concurrency scenarios is a well-known problem. Sometimes, you might confuse the problem of having sequential and ordered items with the (possible) solution of timestamping them. As always, you need to understand the essential problems to solve before guessing accidental implementations.
A possible solution is to use a centralized authority or some complex decentralized consensus algorithms. This recipe challenges the need for timestamps when you just need an ordered sequence. Timestamps are ubiquitous and very popular in many languages. Use native timestamps only to model actual timestamps, when you find them in the bijection.
Here are some problems with timestamps:
import time

# ts1 and ts2 store the time in seconds
ts1 = time.time()
ts2 = time.time()  # might be the same!!
Here’s a better solution without timestamps since you just need sequencing behavior:
numbers = range(1, 100000)
# create a sequence of numbers and use them with a hotspot

# or
sequence = nextNumber()
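One way the sequencing behavior might look is sketched below. The Sequence class and its next_number method are hypothetical names chosen for illustration, and the sketch only covers a single process; ordering across several machines would still need the central authority or consensus mentioned earlier:

```python
import itertools
import threading


class Sequence:
    """Hands out strictly increasing numbers, one per call."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_number(self) -> int:
        # The lock keeps each number unique when several threads ask at once.
        with self._lock:
            return next(self._counter)


sequence = Sequence()
first = sequence.next_number()
second = sequence.next_number()
# Unlike two timestamps, two consecutive numbers can never be equal.
```

The caller gets ordering without depending on clock resolution or time zones.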
4.6 Reifying Subsets as Objects
Solution
Create small objects and validate a restricted domain.
Discussion
Subsets are a special case of a primitive obsession smell. The subset objects are present on the bijection, therefore you must create them in your simulator. Also, when you try to create an invalid object, it should break immediately, following the fail fast principle (see Chapter 13, “Fail Fast”). Some examples of subset violations include: emails are a subset of strings, valid ages are a subset of real numbers, and ports are a subset of integers. Invisible objects have rules you need to enforce at a single point.
Take this example:
validDestination = "destination@example.com"
invalidDestination = "destination.example.com"
// No error is thrown
Here’s a better domain restriction:
using System;
using System.Text.RegularExpressions;

public class EmailAddress {
    public string emailAddress;

    public EmailAddress(string address) {
        string expressions = @"^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$";
        if (!Regex.IsMatch(address, expressions)) {
            throw new Exception("Invalid address");
        }
        this.emailAddress = address;
    }
}

var destination = new EmailAddress("destination@example.com");
This solution should not be confused with the anemic Java version. You need to be loyal to the bijection of the real world.
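The same restriction applies to the other subsets listed above. As a rough sketch, assuming the usual 16-bit range of 0 to 65535 for network ports, a Port object in Python might look like this; the class and the is_well_known method are illustrative names, not an existing API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A network port: an integer restricted to the range 0..65535."""

    number: int

    def __post_init__(self):
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(self.number, bool) or not isinstance(self.number, int):
            raise TypeError(f"A port must be an integer: {self.number!r}")
        if not 0 <= self.number <= 65535:
            raise ValueError(f"Port out of range: {self.number}")

    def is_well_known(self) -> bool:
        """Ports 0..1023 are conventionally reserved for system services."""
        return self.number <= 1023


web = Port(8080)    # valid
# Port(70000)       # would raise ValueError at creation time
```

The rule lives in a single place, so no caller has to remember to check the range.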
4.7 Reifying String Validations
Solution
Search for missing domain objects when validating strings and reify them.
Discussion
Serious software has lots of string validations. Often, they are not in the correct places, leading to fragile and corrupt software. The simple solution is to build only real-world and valid abstractions:
// First Example: Address Validation
class Address {
    function __construct(string $emailAddress) {
        // String validation on Address class violates
        // Single Responsibility Principle
        $this->validateEmail($emailAddress);
        // ...
    }

    private function validateEmail(string $emailAddress) {
        $regex = '/[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z]+/';
        // Regex is a sample / It might be wrong
        // Emails and Urls should be first class objects
        if (!preg_match($regex, $emailAddress)) {
            throw new Exception('Invalid email address ' . $emailAddress);
        }
    }
}

// Second Example: Wordle
class Wordle {
    function validateWord(string $wordleword) {
        // Wordle word should be a real world entity. Not a subset of Strings
    }
}
Here’s a better solution:
// First Example: Address Validation
class Address {
    function __construct(EmailAddress $emailAddress) {
        // Email is always valid / Code is cleaner and not duplicated
        // ...
    }
}

class EmailAddress {
    // You can reuse this object many times avoiding copy-pasting
    private string $address;

    // The constructor is public so other objects can create email addresses
    function __construct(string $emailAddress) {
        $regex = '/[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z]+/';
        // Regex is a sample / It might be wrong
        // Emails and Urls are first class objects
        if (!preg_match($regex, $emailAddress)) {
            throw new Exception('Invalid email address ' . $emailAddress);
        }
        $this->address = $emailAddress;
    }
}

// Second Example: Wordle
class Wordle {
    function validateWord(WordleWord $wordleword) {
        // Wordle word is a real world entity. Not a subset of string
    }
}

class WordleWord {
    function __construct(string $word) {
        // Avoid building invalid Wordle words
        // For example length != 5
    }
}
Single-Responsibility Principle
The single-responsibility principle states that every module or class in a software system should have responsibility over a single part of the functionality provided by the software and that responsibility should be entirely encapsulated by the class. In other words, a class should have only one reason to change.
The small objects are hard to find. But they follow the fail fast principle when you try to create invalid objects. The new reified object also follows the single-responsibility principle and the don’t repeat yourself principle. Having these abstractions sometimes forces you to reimplement behavior that is already available in the objects they encapsulate. For example, a WordleWord is not a String, but you might need some of a String’s functions.
Don’t Repeat Yourself Principle
The don’t repeat yourself (DRY) principle states that software systems should avoid redundancy and repetition of code. The goal of the DRY principle is to improve the maintainability, flexibility, and understandability of software by reducing the amount of duplicated knowledge, code, and information.
Arguing against these new indirections on efficiency grounds is a sign of premature optimization unless you have concrete evidence of a substantial penalty in real-use scenarios from your customers. Creating these new small concepts keeps the model loyal to the bijection and ensures your models are always healthy.
SOLID Principles
SOLID is a mnemonic that stands for five principles of object-oriented programming. They were defined by Robert Martin and are guidelines and heuristics, not rigid rules. They are defined in the related chapters:
- Single-responsibility principle (see Recipe 4.7, “Reifying String Validations”)
- Open-closed principle (see Recipe 14.3, “Reifying Boolean Variables”)
- Liskov substitution principle (see Recipe 19.1, “Breaking Deep Inheritance”)
- Interface segregation principle (see Recipe 11.9, “Breaking Fat Interfaces”)
- Dependency inversion principle (see Recipe 12.4, “Removing One-Use Interfaces”)
4.8 Removing Unnecessary Properties
Solution
Remove accidental properties. Add the needed behavior and then add accidental properties to support the defined behavior.
Discussion
Many programming schools tell you to quickly identify object parts and then build functions around them. Such models are usually coupled and less maintainable than the ones created based on the desired behavior. Following YAGNI’s premise (see Chapter 12, “YAGNI”), you’ll find many times you don’t need these attributes.
Whenever they want to model a person or an employee, junior programmers or students add an attribute id or name without thinking if they are really going to need them. You need to add attributes “on demand” when there’s enough behavioral evidence. Objects are not “data holders.”
This is a classic teaching example:
class PersonInQueue
  attr_accessor :name, :job

  def initialize(name, job)
    @name = name
    @job = job
  end
end
If you start focusing on the behavior, you will be able to build better models:
class PersonInQueue
  def moveForwardOnePosition
    # implement protocol
  end
end
An amazing technique for behavior discovery is test-driven development, where you are forced to start iterating the behavior and protocol and deferring accidental implementation as much as you can.
Test-Driven Development
Test-driven development (TDD) is a software development process that relies on the repetition of a very short development cycle: first, the developer writes a failing automated test case that defines a desired improvement or new behavior, then produces minimal production code to pass that test and finally refactors the new code to acceptable standards. One of the main goals of TDD is to make the code easier to maintain by ensuring that it is well-structured and follows good design principles. It also helps to catch defects early in the development process, since each new piece of code is tested as soon as it is written.
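A test-first take on the queue example might look like the following Python sketch. In TDD the two tests would be written before the class; every name here (PersonInQueue, move_forward_one_position, is_at_front) is a hypothetical choice, and the position attribute exists only because the tested behavior demands it:

```python
import unittest


class PersonInQueue:
    """Knows only what the tests below demand: a position that can advance."""

    def __init__(self, position: int):
        if position < 1:
            raise ValueError("Positions start at 1")
        self._position = position

    def move_forward_one_position(self) -> None:
        if self._position == 1:
            raise ValueError("Already at the front of the queue")
        self._position -= 1

    def is_at_front(self) -> bool:
        return self._position == 1


class PersonInQueueTest(unittest.TestCase):
    def test_second_person_reaches_front_after_moving_forward(self):
        person = PersonInQueue(position=2)
        person.move_forward_one_position()
        self.assertTrue(person.is_at_front())

    def test_person_at_front_cannot_move_forward(self):
        person = PersonInQueue(position=1)
        with self.assertRaises(ValueError):
            person.move_forward_one_position()
```

Saved as a module, the tests run with python -m unittest. Notice that neither name nor job appears, because no test has asked for them yet.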
4.9 Creating Date Intervals
Solution
Reify this small object and honor the MAPPER rule.
Discussion
This recipe presents a very common abstraction that you might miss and has the same problems you saw in this chapter’s other recipes: missing abstractions, duplicated code, unenforced invariant (see Recipe 13.2, “Enforcing Preconditions”), primitive obsession, and violation of the fail fast principle. The restriction “from date should be lower than to date” means that the starting date of a certain interval should occur before the ending date of the same interval.
The “from date” should be a date that comes earlier in time than the “to date.” This restriction is in place to ensure that the interval being defined makes logical sense and that the dates used to define it are in the correct order. You know it but forget to create the Interval object. Would you create a Date as a set of three integer numbers? Certainly not.
Here is an anemic example:
import java.time.LocalDate
import java.time.temporal.ChronoUnit

val from = LocalDate.of(2018, 12, 9)
val to = LocalDate.of(2022, 12, 22)

val elapsed = elapsedDays(from, to)

fun elapsedDays(fromDate: LocalDate, toDate: LocalDate): Long {
    return ChronoUnit.DAYS.between(fromDate, toDate)
}

// You need to apply this short function
// or the inline version many times in your code
// You don't check fromDate to be less than toDate
// You can make accounting numbers with a negative value
After you reify the Interval object:
import java.time.LocalDate
import java.time.temporal.ChronoUnit

data class Interval(val fromDate: LocalDate, val toDate: LocalDate) {
    init {
        if (fromDate >= toDate) {
            throw IllegalArgumentException("From date must be before to date")
        }
        // Of course the Interval must be immutable:
        // its 'val' properties cannot be reassigned
    }

    fun elapsedDays(): Long {
        return ChronoUnit.DAYS.between(fromDate, toDate)
    }
}

val from = LocalDate.of(2018, 12, 9)
val to = LocalDate.of(2002, 12, 22)
val interval = Interval(from, to) // Invalid
This is a primitive obsession smell and is related to how you model things. If you find software with missing simple validations, it certainly needs some reification.
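Once the interval exists as an object, related behavior has an obvious home. The Python sketch below mirrors the Kotlin class and adds one assumed method, includes, purely as an illustration of where such behavior could go; it is not part of the original recipe:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Interval:
    from_date: date
    to_date: date

    def __post_init__(self):
        # Fail fast: an interval whose dates are out of order cannot exist.
        if self.from_date >= self.to_date:
            raise ValueError("From date must be before to date")

    def elapsed_days(self) -> int:
        return (self.to_date - self.from_date).days

    def includes(self, day: date) -> bool:
        """True when the given day falls within the interval, ends included."""
        return self.from_date <= day <= self.to_date


december = Interval(date(2022, 12, 9), date(2022, 12, 22))
print(december.elapsed_days())  # prints 13
```

Callers no longer repeat the ordering check or the day arithmetic, and a negative elapsed value cannot occur.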