Rule 1. As Simple as Possible, but No Simpler

Programming is hard.

I’m guessing you’ve already figured this out. Anyone who picks up and reads a book titled The Rules of Programming is likely to both:

  • Be able to program, at least a little

  • Be frustrated that it’s not easier than it is

There are lots of reasons why programming is hard, and lots of strategies to try to make it easier. This book looks at a carefully selected subset of common ways to screw things up and Rules to avoid those mistakes, all drawn from my many years of making mistakes of my own and coping with the mistakes of others.

There’s an overall pattern to the Rules, a common theme that most of them share. It’s best summarized with a quote from Albert Einstein describing the goals of a physical theorist: “As simple as possible, but no simpler.”1 By that, Einstein meant that the best physical theory was the simplest one that completely described all observable phenomena.

Recasting that idea to programming, the best way to implement a solution to any problem is the simplest one that meets all the requirements of that problem. The best code is the simplest code.

Imagine that you’re writing code to count the number of bits set in an integer. There are lots of ways to do this. You might use bit trickery2 to zero out a bit at a time, counting how many bits get zeroed out:

int countSetBits(int value)
{
    int count = 0;

    while (value)
    {
        ++count;
        value = value & (value - 1);
    }

    return count;
}

Or you might opt for a loop-free implementation, with bit shifting and masking to count the bits in parallel:

int countSetBits(int value)
{
    value = ((value & 0xaaaaaaaa) >> 1) + (value & 0x55555555);
    value = ((value & 0xcccccccc) >> 2) + (value & 0x33333333);
    value = ((value & 0xf0f0f0f0) >> 4) + (value & 0x0f0f0f0f);
    value = ((value & 0xff00ff00) >> 8) + (value & 0x00ff00ff);
    value = ((value & 0xffff0000) >> 16) + (value & 0x0000ffff);

    return value;
}

Or you might just write the most obvious code possible:

int countSetBits(int value)
{
    int count = 0;

    for (int bit = 0; bit < 32; ++bit)
    {
        if (value & (1 << bit))
            ++count;
    }

    return count;
}

The first two answers are clever…and I don’t mean that as a compliment.3 A quick glance isn’t enough to figure out how either example actually works—they each have a little morsel of “Wait…what?” code tucked inside the loop. With a little bit of thought you can figure out what’s going on, and seeing the trick is kind of fun. But untangling things takes some effort.

And that’s with a head start! I told you what the functions did before showing the code, and the function names hammer their purpose home. If you hadn’t known that the code counted set bits, untangling either of the first two examples would have been even more work.

That’s not the case for the final answer. It’s obvious that it’s counting the bits set. It’s as simple as possible, but no simpler, and that makes it better than the first two answers.4

Measuring Simplicity

There are many ways to think about what makes code simple.

You might decide to measure simplicity based on how easy code is for someone else on your team to understand. If a randomly chosen colleague can read through a bit of code and understand it with no effort, then the code is appropriately simple.

Or you might decide to measure simplicity by how easy it is to create the code—not just the time to type it, but the time it takes to get the code fully functional and bug-free as well.5 Complicated code takes a while to get right; simple code is easier to get across the finish line.

These two measures have a lot of overlap, of course. Code that’s easy to write tends to be easy to read, too. And there are other valid measures of complexity you might use:

How much code is written

Simpler code tends to be shorter, though it’s possible to jam a lot of complexity into a single line of code.

How many ideas are introduced

Simple code tends to build on the concepts that everyone on your team knows; it doesn’t introduce new ways of thinking about problems or new terminology.

How much time it takes to explain

Simple code is easy to explain—in a code review, it’s obvious enough that the reviewer zooms right through. Complicated code takes explanation.

Code that seems simple against one measure will seem simple against the other measures as well. You just need to choose which of the measures provides the clearest focus for your work—but I recommend starting with ease of creation and ease of comprehension. If you focus on getting easy-to-read code working quickly, you’re creating simple code.

…But No Simpler

It’s better for code to be simpler, but it still needs to solve the problem it intends to solve.

Imagine that you’re trying to count how many ways there are to climb a ladder with some number of rungs, given the stipulation that you gain one, two, or three rungs with each step. If the ladder has two rungs, there are two ways to climb it—either you step on the first rung or not. Similarly, there are four ways to climb a three-rung ladder—step on the first rung, step on the second rung, step on the first and second rungs, or step directly to the top rung. A four-rung ladder can be climbed in seven ways, a five-rung ladder in thirteen ways, and so on.

You can write simple code to calculate this recursively:

int countStepPatterns(int stepCount)
{
    if (stepCount < 0)
        return 0;
    
    if (stepCount == 0)
        return 1;

    return countStepPatterns(stepCount - 3) + 
           countStepPatterns(stepCount - 2) + 
           countStepPatterns(stepCount - 1);
}

The basic idea is that any journey up the ladder has to step to the top rung from one of the three rungs below it. Adding the number of ways to climb to each of those rungs gives the number of ways to climb to the top rung. After that, it’s just a matter of figuring out the base cases. The preceding code allows negative step counts as a base case to make the recursion simpler.

Unfortunately, this solution doesn’t work. Well, it does work, at least for small stepCount values, but countStepPatterns(20) takes about twice as long to complete as countStepPatterns(19). Computers are really fast, but exponential growth like this will catch up to that speed. In my test, the example code started getting pretty slow once stepCount got into the twenties.
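
If you want to watch that slowdown for yourself, it only takes a few lines to time the recursive version. Here’s a rough sketch, assuming the recursive countStepPatterns above is in scope and that std::chrono is precise enough for coarse measurements:

#include <chrono>
#include <cstdio>

void timeStepPatterns(int maxRungCount)
{
    using namespace std::chrono;

    for (int rungCount = 20; rungCount <= maxRungCount; ++rungCount)
    {
        // time a single call to the recursive countStepPatterns
        auto startTime = steady_clock::now();
        int patternCount = countStepPatterns(rungCount);
        auto elapsed = duration_cast<milliseconds>(
                           steady_clock::now() - startTime);

        printf(
            "%2d rungs: %d patterns, %lld ms\n",
            rungCount,
            patternCount,
            (long long) elapsed.count());
    }
}

Each added rung roughly doubles the elapsed time printed on the previous line; the exponential growth is hard to miss.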

If you’re expected to count the number of ways up longer ladders, then this code is too simple. The core issue is that all of the intermediate results of countStepPatterns are recalculated over and over, and this leads to exponential run times. A standard answer to this is memoization—hanging onto the calculated intermediate values and reusing them, as in this example:

int countStepPatterns(unordered_map<int, int> * memo, int rungCount)
{
    if (rungCount < 0)
        return 0;
    
    if (rungCount == 0)
        return 1;

    auto iter = memo->find(rungCount);
    if (iter != memo->end())
        return iter->second;

    int stepPatternCount = countStepPatterns(memo, rungCount - 3) + 
                           countStepPatterns(memo, rungCount - 2) + 
                           countStepPatterns(memo, rungCount - 1);

    memo->insert({ rungCount, stepPatternCount });
    return stepPatternCount;
}

int countStepPatterns(int rungCount)
{
    unordered_map<int, int> memo;
    return countStepPatterns(&memo, rungCount);
}

With memoization in place, each value is calculated once and inserted in the hash map. Subsequent calls find the calculated value in the hash map in roughly constant time, and the exponential growth goes away. The memoized code is a smidgen more complicated, but it doesn’t hit a performance wall.

You might also decide to use dynamic programming, trading off a bit of conceptual complexity for better code simplicity:

int countStepPatterns(int rungCount)
{
    vector<int> stepPatternCounts = { 0, 0, 1 };

    for (int rungIndex = 0; rungIndex < rungCount; ++rungIndex)
    {
        stepPatternCounts.push_back(
            stepPatternCounts[rungIndex + 0] +
            stepPatternCounts[rungIndex + 1] +
            stepPatternCounts[rungIndex + 2]);
    }

    return stepPatternCounts.back();
}

This approach runs quickly enough, too, and it’s even simpler than the memoized recursive version.

Sometimes It’s Better to Simplify the Problem Rather than the Solution

The problems in the original, recursive version of countStepPatterns appeared for longer ladders. The simplest code worked perfectly well for small numbers of rungs, but hit an exponential performance wall for large numbers of rungs. Later versions avoided the exponential wall at the cost of slightly more complexity…but they soon run into a different problem.

If I run the previous code to calculate countStepPatterns(36), I get the right answer, 2,082,876,103. Calling countStepPatterns(37), though, returns –463,960,867. That’s clearly not right!

That’s because the version of C++ I’m using stores integers as signed 32-bit values, and calculating countStepPatterns(37) overflowed the available bits. There are 3,831,006,429 ways to climb a 37-rung ladder, and that number is too big to fit in a signed 32-bit integer. Wrap that count around the 32-bit range (subtract 2³², or 4,294,967,296) and you get exactly the –463,960,867 that came back.

So maybe the code is still too simple. It seems reasonable to expect countStepPatterns to work for all ladder lengths, right? C++ doesn’t have a standard solution for really big integers, but there are (many) open source libraries that implement various flavors of arbitrary-precision integers. Or given a few hundred lines of code, you could implement your own:

struct Ordinal
{
public:

    Ordinal() :
        m_words()
        { ; }
    Ordinal(unsigned int value) :
        m_words({ value })
        { ; }

    typedef unsigned int Word;

    Ordinal operator + (const Ordinal & value) const
    {
        int wordCount = max(m_words.size(), value.m_words.size());

        Ordinal result;
        long long carry = 0;

        for (int wordIndex = 0; wordIndex < wordCount; ++wordIndex)
        {
            long long sum = carry + 
                            getWord(wordIndex) + 
                            value.getWord(wordIndex);

            result.m_words.push_back(Word(sum));
            carry = sum >> 32;
        }

        if (carry > 0)
            result.m_words.push_back(Word(carry));

        return result;
    }

protected:

    Word getWord(int wordIndex) const
    {
        return (wordIndex < m_words.size()) ? m_words[wordIndex] : 0;
    }

    vector<Word> m_words;
};

Dropping Ordinal into the last example in place of int produces exact answers for longer ladders:

Ordinal countStepPatterns(int rungCount)
{
    vector<Ordinal> stepPatternCounts = { 0, 0, 1 };

    for (int rungIndex = 0; rungIndex < rungCount; ++rungIndex)
    {
        stepPatternCounts.push_back(
            stepPatternCounts[rungIndex + 0] +
            stepPatternCounts[rungIndex + 1] +
            stepPatternCounts[rungIndex + 2]);
    }

    return stepPatternCounts.back();
}

So…problem solved? With the introduction of Ordinal, an exact answer can be calculated for much longer ladders. Sure, adding a few hundred lines of code to implement Ordinal isn’t great, especially given that the actual countStepPatterns function is only 14 lines long, but isn’t that the price that must be paid to correctly solve the problem?

Probably not. If there isn’t a simple solution to a problem, interrogate the problem before you accept a complicated solution. Is the problem you’re trying to solve actually the problem that needs solving? Or are you making unnecessary assumptions about the problem that are complicating your solution?

In this case, if you’re actually counting step patterns for real ladders, you can probably assume a maximum ladder length. If the maximum ladder length is, say, 15 rungs, then any of the solutions in this section are perfectly adequate, even the naive recursive example presented first. Add an assert to one of them noting the built-in limits of the function and declare victory:

int countStepPatterns(int rungCount)
{
    // NOTE (chris) can't represent the pattern count in an int
    // once we get past 36 rungs...

    assert(rungCount <= 36);

    vector<int> stepPatternCounts = { 0, 0, 1 };

    for (int rungIndex = 0; rungIndex < rungCount; ++rungIndex)
    {
        stepPatternCounts.push_back(
            stepPatternCounts[rungIndex + 0] +
            stepPatternCounts[rungIndex + 1] +
            stepPatternCounts[rungIndex + 2]);
    }

    return stepPatternCounts.back();
}

Or if supporting really long ladders is required—handling the inspection ladder for a wind turbine, say—then would an approximate count of steps be enough? Probably, and if so it’s easy to replace integers with floating-point values. So easy that I’m not even going to show the code.

Look, everything overflows eventually. Solving the extreme boundary cases for a problem will always lead to an overly complicated solution. Don’t get trapped into solving the strictest definition of a problem. It’s much better to have a simple solution for the part of the problem that actually needs to be solved instead of a complicated solution to a broader definition of the problem. If you can’t simplify the solution, try to simplify the problem.

Simple Algorithms

Sometimes it’s a poor choice of algorithm that adds complexity to your code. There are lots of ways to solve any particular problem, after all, some more complicated than others. Simple algorithms lead to simple code. The problem is that the simple algorithm isn’t always obvious!

Say you’re writing code to shuffle a deck of cards. An obvious approach is to simulate the riffle shuffle you likely learned as a kid—split the deck into two piles, then fan them into each other, giving the card on each side a roughly equal chance of ending up next in the recombined deck. Repeat until the deck is shuffled.6

That might look like this:

vector<Card> shuffleOnce(const vector<Card> & cards)
{
    vector<Card> shuffledCards;

    int splitIndex = cards.size() / 2;
    int leftIndex = 0;
    int rightIndex = splitIndex;

    while (true)
    {
        if (leftIndex >= splitIndex)
        {
            for (; rightIndex < cards.size(); ++rightIndex)
                shuffledCards.push_back(cards[rightIndex]);

            break;
        }
        else if (rightIndex >= cards.size())
        {
            for (; leftIndex < splitIndex; ++leftIndex)
                shuffledCards.push_back(cards[leftIndex]);

            break;
        }
        else if (rand() & 1)
        {
            shuffledCards.push_back(cards[rightIndex]);
            ++rightIndex;
        }
        else
        {
            shuffledCards.push_back(cards[leftIndex]);
            ++leftIndex;
        }
    }

    return shuffledCards;
}

vector<Card> shuffle(const vector<Card> & cards)
{
    vector<Card> shuffledCards = cards;

    for (int i = 0; i < 7; ++i)
    {
        shuffledCards = shuffleOnce(shuffledCards);
    }

    return shuffledCards;
}

This simulated-riffle-shuffle algorithm works, and the code I’ve written here is a fairly simple implementation of that algorithm. You’ll have to expend a little energy making sure that all of the index checks are correct, but it’s not too bad.

But there are simpler algorithms to shuffle a deck of cards. For instance, you could build a shuffled deck one card at a time. On each iteration, take a new card and swap it with a random card in that iteration’s deck. You can do this in place, actually:

vector<Card> shuffle(const vector<Card> & cards)
{
    vector<Card> shuffledCards = cards;

    for (int cardIndex = shuffledCards.size(); --cardIndex >= 0; )
    {
        int swapIndex = rand() % (cardIndex + 1);
        swap(shuffledCards[swapIndex], shuffledCards[cardIndex]);
    }

    return shuffledCards;
}

By the simplicity measures introduced earlier, this version is superior. It took less time to write.7 It’s easier to read. It’s less code. It’s easier to explain. It’s simpler and better—not because of the code, but because of the better choice of algorithm.

Don’t Lose the Plot

Simple code is easy to read—and the simplest code can be read straight through, top to bottom, just like reading a book. Programs aren’t books, though. It’s easy to end up with code that’s hard to follow if the flow through the code isn’t simple. When code is convoluted, when it makes you jump from place to place to follow the flow of execution, it’s much harder to read.

Convoluted code can result from trying too hard to express each idea in exactly one place. Take the riffle-shuffle code from earlier. The bits of code that deal with the right and left piles of cards look pretty similar to each other. The logic to move one card or a series of cards to the shuffled pile could be split into separate functions, then called from shuffleOnce:

void copyCard(
    vector<Card> * destinationCards, 
    const vector<Card> & sourceCards,
    int * sourceIndex)
{
    destinationCards->push_back(sourceCards[*sourceIndex]);
    ++(*sourceIndex);
}

void copyCards(
    vector<Card> * destinationCards, 
    const vector<Card> & sourceCards,
    int * sourceIndex,
    int endIndex)
{
    while (*sourceIndex < endIndex)
    {
        copyCard(destinationCards, sourceCards, sourceIndex);
    }
}

vector<Card> shuffleOnce(const vector<Card> & cards)
{
    vector<Card> shuffledCards;

    int splitIndex = cards.size() / 2;
    int leftIndex = 0;
    int rightIndex = splitIndex;

    while (true)
    {
        if (leftIndex >= splitIndex)
        {
            copyCards(&shuffledCards, cards, &rightIndex, cards.size());
            break;
        }
        else if (rightIndex >= cards.size())
        {
            copyCards(&shuffledCards, cards, &leftIndex, splitIndex);
            break;
        }
        else if (rand() & 1)
        {
            copyCard(&shuffledCards, cards, &rightIndex);
        }
        else
        {
            copyCard(&shuffledCards, cards, &leftIndex);
        }
    }

    return shuffledCards;
}

The previous version of shuffleOnce was readable top-to-bottom; this one isn’t. That makes it harder to read. While reading through the shuffleOnce code you run into the copyCard or copyCards function. Then you have to track down those functions, figure out what they do, pop back to the original function, then match the arguments passed from shuffleOnce to your new understanding of copyCard or copyCards. That’s a lot harder than just reading the loops in the original shuffleOnce.

So, the don’t-repeat-yourself version of the function took more time to write8 and is harder to read. It’s more code, too! The attempt to remove duplication made the code more complicated, not simpler.

Obviously, there’s something to be said for reducing the amount of duplication in your code! But it’s important to recognize that there’s a cost to removing the duplication—and for small amounts of code and simple ideas, it’s better to just leave duplicate copies. The code will be easier to write and easier to read.

One Rule to Rule Them All

Many of the remaining Rules in this book will return to this theme of simplicity, of keeping code as simple as possible but no simpler.

At its heart, programming is a struggle with complexity. Adding new functionality often means making the code more complicated—and as code gets more complicated, it gets harder and harder to work with, and progress gets slower and slower. Eventually, you can reach an event horizon, where any attempt to move forward—to fix a bug or add a feature—causes as many problems as it solves. Further progress is effectively impossible.

In the end, it will be complexity that kills your project.

That means effective programming is about delaying the inevitable. Add as little complexity as possible as features are added and bugs fixed. Look for opportunities to remove complexity, or architect things so that new features don’t add much to the overall complexity of the system. Create as much simplicity as possible in how your team works together.

If you’re diligent, you can delay the inevitable indefinitely. I wrote the first lines of Sucker Punch code 25 years ago, and the codebase has continuously evolved since then. There’s no end in sight—our code is wildly more complicated than it was 25 years ago, but we’ve been able to stay in control of that complexity, and we’re still able to make effective progress.

We’ve been able to manage complexity, and so can you. Stay sharp, remember that complexity is the ultimate enemy, and you’ll do well.

1 He almost certainly didn’t use those exact words—posterity has done Einstein a favor by sharpening up his aphorisms. The closest match in the written record is “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.” So, pretty much the same thing, just not as snappy. Also the actual Einstein quote is a little bit long for a Rule title.

2 Apologies to all non-C++ programmers for the bit twiddling in the next three examples. I promise the rest of the book is light on bitwise operations.

3 In a plausible alternate universe, this Rule is named “Cleverness Is Not a Virtue.”

4 Modern processors have a dedicated instruction to count the number of bits set in a value—popcnt on x86 processors, for instance, which executes in a single cycle. You can also get carried away with SIMD instructions to count lots of bits even faster than popcnt. But all of these approaches are hard to understand, and which instructions are supported depends on exactly which processor you have. I’d rather see the simplest countSetBits, unless there was a really, really good reason to use something more complicated.
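
For what it’s worth, if that really good reason ever shows up, C++20 makes the dedicated instruction a one-liner. A rough sketch, assuming a compiler that ships the <bit> header:

#include <bit>

int countSetBits(int value)
{
    // std::popcount requires an unsigned type; most compilers lower this
    // to a single popcnt (or equivalent) instruction. Before C++20, GCC
    // and Clang offer __builtin_popcount instead.
    return std::popcount(static_cast<unsigned int>(value));
}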

5 Bug-free within experimental error, of course. There are always bugs you haven’t found yet.

6 A deck of cards is pretty well randomized after seven riffle shuffles. After four or five riffle shuffles the deck isn’t randomized at all. And yes, my family gets annoyed with how many times I shuffle a deck of cards before dealing the next hand. “We’re here to play cards, Chris, not watch you shuffle.” A little knowledge is a dangerous thing.

7 As measured experimentally; your mileage may vary. I got a little cute with the indexes and conditions when writing the riffle-shuffle code, and it took a few tries to get working. The random selection code worked the first time.

8 Again, experimentally determined. Took a few tries to get it to compile, actually, as I wavered between using pointers and references.
