Richard Moulds on harnessing entropy for a more secure world
The O’Reilly Security Podcast: Randomness, our dependence on entropy for security and privacy, and rating entropy sources for more effective encryption.
In this episode, I talk with Richard Moulds, vice president of strategy and business development at Whitewood Encryption. We discuss whether random number generation is as random as some might think and the implications that has for securing systems with encryption, how to harness entropy for better randomness, and emerging standards for evaluating and certifying the quality of entropy sources.
Here are some highlights:
Randomness: The linchpin of encryption
When people think about cryptography, which is a broad subject, they tend to think about encryption. They think about the algorithms we use to encrypt our data, the keys we use, and how to keep those keys secret. A key is just a random number. Generally speaking, crypto applications get these random numbers from the operating system; there are standard calls you can make as a software developer to get a random number. We're focused on researching how good operating systems actually are at generating random numbers. If random numbers stop being truly random, then keys become predictable and the value proposition of encryption, and of cryptography in general, fades away. It becomes an even more pressing issue as attackers get stronger computers, notably quantum computers, which would make it devastatingly easy to break the encryption.
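As a concrete illustration of those "standard calls," here is a minimal sketch in Python (the functions shown are just the Python standard library's interface to the operating system's random source, not anything specific to Whitewood):

```python
import os
import secrets

# Ask the operating system's random source for 32 bytes (a 256-bit key).
# Both calls ultimately draw on the OS entropy pool discussed above.
key_from_os = os.urandom(32)
key_from_secrets = secrets.token_bytes(32)

print(key_from_os.hex())
print(key_from_secrets.hex())
```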
Behind those calls sit pseudorandom number generators. These algorithms have been defined and certified for years, but they depend on the availability of entropy. A few hundred bits of perfect randomness can be used by a pseudorandom number generator to produce hundreds of megabits or gigabits of random numbers for applications to consume. Provided they can find entropy somewhere in the real world, the algorithms can use it to seed their internal state. Think of it like a pack of playing cards: there is the process of dealing the cards, and there is the process of shuffling them. The pseudorandom number generator in an operating system like Linux is the dealing; entropy is the shuffling, the thing that randomizes the generator's state.
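The card analogy maps neatly onto a seeded generator. The toy sketch below uses Python's non-cryptographic `random.Random` purely to illustrate the point under that assumption: the seed (the shuffle) fixes every card that will ever be dealt, and the same seed reproduces the same deal.

```python
import random

SEED = 0xC0FFEE  # the "shuffle": a small amount of entropy seeding the generator

deck = list(range(52))
rng = random.Random(SEED)      # NOT cryptographically secure; illustration only
rng.shuffle(deck)              # the seed fixes the order of the whole deck

# "Dealing": any amount of further output is fully determined by that one seed.
dealt = [rng.getrandbits(32) for _ in range(5)]

# Re-seeding with the same value reproduces the identical shuffle and deal.
rng2 = random.Random(SEED)
deck2 = list(range(52))
rng2.shuffle(deck2)
assert deck2 == deck
```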
Programming entropy into secure systems: A necessary oxymoron
Entropy is fundamentally a physical property, a measure of randomness reflected in the physical world. Hardware developers and system architects have been trying to find ways of scavenging entropy from existing sources, because entropy doesn't naturally exist in the digital world, where everything is programmed. Potential sources are everywhere, though quality varies. For example, you could calculate the entropy in the text of Romeo and Juliet and find that it has some entropy, although not very much, because it uses English words with a certain grammatical structure and a plot that makes some sense. Entropy in the Chinese language is three times as high as entropy in English simply because of the way the characters are constructed and strung together to form sentences. Everything has entropy; the question is how you find a source of randomness that is unpredictable and can be kept secret from attackers. In the systems we have today, we derive entropy from things like the timing between keystrokes, mouse movements, the arrival of packets on a network, or the timing jitter of processes running on the CPU. None of these are perfectly random, but they're not periodic or completely predictable either, so they carry some entropy.
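To make the scavenging idea concrete, here is a toy Python sketch that folds timing jitter into a hash-based pool. It is illustrative only: real systems credit far less than one bit of entropy per sample and condition the raw measurements much more carefully.

```python
import hashlib
import time

def scavenge_timing_jitter(samples: int = 2048) -> bytes:
    """Toy illustration: fold timing jitter into a hash-based pool.

    Only the low-order bits of each interval carry any unpredictability,
    so this is a sketch of the idea, not a production entropy source.
    """
    pool = hashlib.sha256()
    prev = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        delta = now - prev            # inter-event timing, jittery at nanosecond scale
        prev = now
        pool.update(delta.to_bytes(8, "little", signed=True))
    return pool.digest()              # 256 bits of "distilled" output

seed_material = scavenge_timing_jitter()
print(seed_material.hex())
```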
That calls into question several things: How good are these sources? How good is the distillation process? What happens if the sources dry up? Your phone has a radio antenna, a keyboard, a gyroscope, location sensors, cameras, and microphones: all manner of potential entropy sources. The issue is that when you start running crypto applications not on that phone but in a data center, particularly a virtualized data center, there's not much going on: no users, no noise, not even a hard drive. Worse still, if you snapshot a virtual machine and make a hundred copies to scale out a web server, each virtual machine has the same randomness that existed in that one particular instance. You end up replicating whatever minimal level of entropy existed across all of those virtual machines, thereby undermining the randomness and, thus, the encryption.
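The snapshot problem is easy to demonstrate with a toy model. The sketch below again uses Python's non-cryptographic `random.Random` as a stand-in for a guest's generator: once its state is cloned, every "VM" produces identical key material until it is re-seeded.

```python
import random

# A deliberately non-cryptographic stand-in for a guest's seeded generator.
template_vm_rng = random.Random(42)
template_vm_rng.random()                 # the template VM has been running for a while

# "Snapshot" the VM: its generator state is copied byte-for-byte into every clone.
snapshot_state = template_vm_rng.getstate()

clones = []
for _ in range(3):
    clone_rng = random.Random()
    clone_rng.setstate(snapshot_state)   # every clone wakes up with identical state
    clones.append(clone_rng)

# Each clone now "generates" the same key material unless it is re-seeded.
keys = [clone.getrandbits(128) for clone in clones]
assert len(set(keys)) == 1
print(hex(keys[0]))
```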
When random isn’t as random as we think
Developers have generally assumed that the operating system can scavenge enough entropy to do its job when making random numbers. Very few people worry about this issue. If you think about an IT stack, you’ve got a developer who writes applications, you’ve got an operating system, you’ve got the physical hardware, and you’ve got the physical environment. It’s fine if somebody owns that whole stack and can point cameras at lava lamps or put up microphones to capture background noise and can build a system that has sufficient entropy. It sounds funny, but people have tried all manner of crazy things to scavenge randomness. That’s okay as long as you’re in control of that whole system, because when you’re not, eventually your keys can start to become predictable.
The issue is that, unfortunately, you can’t measure the quality of your random numbers. If you could measure the output of /dev/urandom and say, “Okay, this number’s good—you should use it. This number’s not so good—you shouldn’t use it,” that would be fine, but it’s not possible to do that. All a developer can do (when they don’t control the full stack) is just consume what’s being delivered. There are horror stories of various studies that have tested for the reuse and duplication of keys on the internet. Even though in principle, with the length of keys that we use in modern systems, the chances should be infinitesimal that you’d ever see the same key twice, these studies have found millions of keys that are the same.
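The studies in question fingerprint keys gathered from internet-wide scans and look for collisions. Below is a minimal, hypothetical sketch of that kind of check (the function name and toy data are invented for illustration; the real studies also look for shared RSA prime factors, which this does not attempt):

```python
from collections import Counter
from hashlib import sha256

def find_duplicate_keys(encoded_keys):
    """Count identical public keys in a scan, by fingerprint.

    `encoded_keys` is assumed to be an iterable of raw key bytes
    gathered from a scan.
    """
    fingerprints = Counter(sha256(k).hexdigest() for k in encoded_keys)
    return {fp: n for fp, n in fingerprints.items() if n > 1}

# Toy data: three "hosts", two of which generated the same key.
scan = [b"key-material-A", b"key-material-B", b"key-material-A"]
print(find_duplicate_keys(scan))   # one fingerprint seen twice
```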
Improving the status quo: NIST standards for entropy sources
Years ago, the National Institute of Standards and Technology (NIST) standardized pseudorandom number generators, because they're just algorithms: if you know what number goes in on the input side, you can predict what number comes out on the output side. Those generators have been standardized and certified for years. But the difficulty of measuring the quality of entropy sources has made entropy a tough thing to standardize. NIST is now on the second draft of a standard that defines how to measure the entropy in random number generators and, ultimately, how to certify entropy sources as they apply to seeding the pseudorandom number generators in operating systems for crypto applications. I think this will be a big step forward. There have been plenty of suggestions that back doors have already been built into systems to weaken random number generators; the random number generator turns out to be one of the perfect places to put a back door, because it's essentially undetectable.
This NIST draft standard is called SP 800-90B. There's actually a suite of these -90 standards: -90A covers pseudorandom number generators and is already finished; -90B covers entropy sources; and -90C covers the architectures for stringing entropy sources and PRNGs together. Once complete, the trio will create a mechanism for certifying products and assigning them an entropy score, rather like a car's average miles per gallon. That score will give security architects and system administrators a way to measure the quality of their cryptosystems and to attest that the encryption and key generation happening in their environment is up to par.
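SP 800-90B measures entropy sources in terms of min-entropy, which is driven by the most likely output of the source. The toy Python estimate below illustrates the quantity being scored; the standard's actual estimators are far more conservative and elaborate.

```python
import math
from collections import Counter

def min_entropy_per_sample(samples) -> float:
    """Toy min-entropy estimate: H_min = -log2(p_max).

    This just illustrates the quantity an "entropy score" measures;
    SP 800-90B prescribes much more careful estimation procedures.
    """
    counts = Counter(samples)
    p_max = max(counts.values()) / len(samples)
    return -math.log2(p_max)

# A heavily biased byte source scores far below its 8-bits-per-byte ideal.
biased = [0] * 900 + list(range(1, 101))          # 1,000 samples, 90% are zero
print(round(min_entropy_per_sample(biased), 3))   # ~0.152 bits per sample
```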