Chapter 1. Cloud Security, the Collaborative Game

Alone we can do so little; together we can do so much.

Helen Keller

A call from MI51 was never a good thing. Monday morning was already a bleak mix of the usual English summer weather, caught between two minds as to whether overcast or rain was the order of the day. We were happily ensconced in our beige office, sipping canteen coffee to stay awake; the call did the job that the coffee was sorely failing to do.

The point being made was crystal clear and cut through the Monday malaise like a cold shower:

“You have a breach. We are aware of it. Don’t fix it!”

We nodded along until the word “don’t,” when our surprise could probably have been felt all the way to Aberdeen.

“Don’t fix it?”

“That’s right. We are using the breach to track, trace, and gather intel on a much larger concern. Hold fire for now.”

“Ok, yes sir. We won’t fix it.”

That was new. Normally, you might simply rush to fix, or even find, a breach, but we were being ordered to not be that effective. That was fairly easy, as it turned out, and the officer from MI5 needn’t have worried. Fix it? We didn’t even know about it! This breach, noticed by the powers-that-be in Thames House,2 wasn’t just news to our developers; it was news to the people running our systems and, yes, even to us, the security engineers. And it was our job to know about these things. We were the point people, the ones who had the cross-situation visibility and control—we were the brains of the security operation. But it wasn’t just that we weren’t in control—we were blind.

Naturally, our next step was to gather our resources in an orderly manner and invoke our collaborative, all-hands-on-deck protocol. Like a well-oiled machine, everyone from development and operations, and our good selves, would trip the right operational lever and we’d all descend into a war room that would have enough context, intelligence, and direct action to make Houston’s Mission Control Center jealous.

Except that simply wasn’t the case. Like many of our friends in financial services security teams across the globe, our first step, other than panic, was to send out some emails, and then everyone headed off to look at what they thought was important. The development teams started rooting through their codebases, operations and support personnel started poring over their logs, and we, the responsible adults, headed off to the coffee machine to discuss what policies we’d created that might have been ignored.3

Dodging the blame—that was the first instinct. Make sure it wasn’t us. Of course, if you’d asked us, we’d have claimed this was some sort of divide-and-conquer strategy. Each to their own area of expertise; can’t expect everyone to know everything. Except it sadly wasn’t really that. Less divide and conquer, more isolate and ruminate.

Worse, no one could find what was wrong. The code looked fine enough to the developers, the operations and support folks couldn’t see any distressing patterns in the logs or network traffic dashboards, and we security engineers felt happy that our security policies, while largely ignored, were utterly perfect as far as we could see. In any other circumstance, we’d turn off the alert as a false positive and go back to our regular business. But this alert had been from MI5, and they hadn’t offered us a handy “Close Incident, Nothing to See Here” button. They’d be back to give us the green light to fix things, and we’d better have an idea what was broken, at least before things got really embarrassing.

The Cloud Native Security Game

Security is a game of two teams. On one side, you have the attackers. On the other, you have you, your cloud security engineers, your developers, and your support and operations teams. You’d think that might come with a numerical advantage, but no, it likely does not, and, worse, with greater numbers comes greater opportunity for something to slip through the gaps. As Fred Brooks and Melvin Conway have been talking about for decades, more people can mean more problems.4 A small team of well-trained and focussed individuals can outplay an army of disparate, uncoordinated troops—just ask any average special forces unit.

The attackers are active participants, players with malicious intent. That’s what differentiates security from other concerns, such as reliability. When it comes to reliability, you’re working against the elements, the weather, sometimes yourselves. In security, there’s nothing quite as eternal or passive as Murphy’s law;5 there is always another player who is actively seeking out your security weaknesses and trying to exploit what you have in some way or another. It’s not just the universe working against you; there’s a person or, more likely these days, a group of people trying to out-think and out-play you.

How a Play Is Made: The Anatomy of an Attack

The rules of the game are simple. The attacking team will look for your weaknesses, and they will then look to follow a time-honored dance to see how far they can push a weakness in order to turn your misstep into something valuable for them, and potentially highly embarrassing for you.

If the malicious actors want a way in, well, in a cloud native environment, there’s a lot of scope to accidentally leave a door ajar. Fortunately, their modus operandi6 follows a common set of patterns. Let’s get to know who those actors are in a little more detail, and then we can explore their preferred paths into your systems and distill that into a common cloud native security attack “play.” Let’s get to know our enemy.

Meet the Attackers: Actors and Vectors

If you were to start with specific people and motivations for attacking cloud native systems, your list would be as long as it would be unhelpful. From international espionage, through mining crypto, to someone just trying to steal phone numbers to sell, the types of people are as varied as the reasons they want access to your systems. The sheer variety can be a bit overwhelming to consider. It is still useful to consider individual cases at times, for example, to explore how your systems might be resilient to an attack on a security game day.7 Fortunately, though, through that lens there is a much smaller set of types of actor to be aware of:8

External attackers

People trying to get in to access your resources, i.e., data, processing, networks

Internal attackers

Someone who has successfully made the jump from being outside, as an external attacker, to being inside your system

Malicious internal attackers

Someone inside your system who has some level of privilege, perhaps legitimately, as part of their job, but is misusing that privilege

Inadvertent internal attackers

Someone who accidentally causes problems from inside the system, perhaps with legitimate privileges

Your own applications and infrastructure

The foothold and stepping stones attackers can leverage to get to their final destination

Each actor presents some level of authentication, proving (or pretending to prove) who they are with credentials, and is granted authorization: privileges that translate into permissions to do what that actor is supposed to be allowed to do. For example, external attackers are likely to start their journey by trying to establish credentials, that is, finding a way to become an internal attacker, a process called gaining initial access,9 while your applications and internal actors will authenticate themselves to establish legitimate credentials and privileges, which could then be misused, hopefully only inadvertently.

Note

Since we use them every day when we log into our devices, we don’t always find it necessary to really think about what authentication, authorization, and credentials, with their permissions and privileges, actually mean. Familiarity breeds complacency, but these concepts are critical when understanding how bad actors try to penetrate, operate in, and exploit our systems.

Simply put, and ignoring the nuances of the various technologies that can be involved, authentication is the process of establishing that an actor is who they say they are, based on some exchange of details or credentials. Authorization is the process of attaching the correct permissions to those credentials, giving the actor the correct privileges to perform actions in the system.
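To make that distinction concrete, here is a minimal Python sketch of the two steps. It is purely illustrative: the credential and permission stores, and names such as authenticate and authorize, are our own stand-ins rather than any particular identity provider’s API, and real systems never keep plaintext passwords like this.

    # A minimal, illustrative sketch of authentication versus authorization.
    # Real systems delegate both to identity providers and policy engines;
    # the plaintext credential store below exists only to keep the example small.

    from typing import Optional

    CREDENTIALS = {"alice": "correct-horse-battery-staple"}   # who can prove who they are
    PERMISSIONS = {"alice": {"read:orders"}}                  # what each identity may do

    def authenticate(username: str, password: str) -> Optional[str]:
        """Authentication: establish WHO the actor is from the credentials presented."""
        if CREDENTIALS.get(username) == password:
            return username           # the established identity, or "principal"
        return None                   # no identity established

    def authorize(principal: str, permission: str) -> bool:
        """Authorization: decide WHAT the established identity is allowed to do."""
        return permission in PERMISSIONS.get(principal, set())

    principal = authenticate("alice", "correct-horse-battery-staple")
    if principal and authorize(principal, "read:orders"):
        print(f"{principal} may read orders")

Seen through the actors above, an external attacker is trying to get past authenticate without legitimate credentials, while a malicious internal attacker already holds a principal and is probing authorize for more than they were ever granted.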

The Attacker’s Moves

A malicious external or internal actor is someone who has a method to get access to your systems and then do what they want. In simple terms, they look to perform four actions:

  1. Gain initial access

  2. Establish a foothold

  3. Escalate privilege

  4. Do what they want…

To do these things, an attacker has five things to manipulate, called the five Cs here:10

  1. Cloud

  2. Clusters

  3. Containers

  4. Code

  5. Continuous integration and delivery (CI/CD) pipelines

From the very moment code is written, there is an opportunity for a malicious internal or external actor to begin their journey. This means security has to be involved across the whole software development lifecycle as well as being deep in the technical stack.

Let’s look at each of the attacker’s moves in more detail.

Gaining initial access

The first step for a malicious actor is to get some sort of access to your system. Like a free climber searching for a hold on a sheer rock face, this is the hunt for a chink in your system’s armor through which more progress can be made.

There are four really common ways that a malicious actor can gain initial access to your cloud native system:

Misconfiguration

The most common way that an actor gains initial access is through misconfiguration of the security of your infrastructure and networks. The speed of change has led to some great technologies, such as infrastructure-as-code (IaC) languages and tools, that make it easier and faster to create and evolve your infrastructure—network, servers, even containers—in the cloud. However, they often optimize for a great user experience over being secure by default. This is where IaC scanning tools can help: IaC code can be scanned as soon as it is written to seek out any accidental inclusion of insecure-by-default settings (a simplified sketch of this kind of check follows this list). With these scanning tools, you can make your code secure by default instead, helping you avoid leaving doors open for external and internal actors to get their fingertips inside the system from the word go.

Insecure workloads

Beyond misconfiguration, you may be inadvertently running insecure workloads that expose multiple security vulnerabilities that can enable others to gain access to the system. Containers, in particular, may contain operating systems and associated tools that go well beyond what they need to do their job. Containers are often over-provisioned and over-privileged,11 and their base images often contain operating system utilities and capabilities that can benefit an attacker should they gain shell access.

Manipulating the supply chain

Your cloud native system’s supply chain—all the packages and artifacts needed to build your application and system, as well as your CI/CD pipeline itself—is fast becoming the most popular attack vector. An attacker can manipulate the supply chain to gain access, perhaps spoofing a third-party system to find a loophole, creating their first toehold into your world. See Chapter 5: Securing Your Supply Chain for more on the different types of attacks and how you can secure this popular attack surface.

Exposed and stolen credentials

If you’ve never accidentally checked in a password along with a piece of code into a private repository, we suspect you’re in the minority. You’re probably also familiar with the pain of having to walk back and cleanse your version control system’s history of all remnants of that commit. The reason you do that work is that your secrets, typically credentials, are supposed to be… secret, and source code is a terrible place to keep them because it’s easy for them to be accidentally exposed to the world.12 But source code repositories are not the only places secrets end up where they shouldn’t, and not the only way secrets make themselves public knowledge. It’s no good having great walls and securely locked doors in your system if everyone has a key to your locks.
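Here, as promised under “Misconfiguration,” is a deliberately simplified Python sketch of the essence of IaC scanning: compare the resources you have declared against policy and report anything insecure by default. The resource shape and the rules are hypothetical stand-ins; real tools such as Checkov or tfsec ship hundreds of curated policies and understand actual Terraform and CloudFormation syntax.

    # A deliberately simplified sketch of what an IaC scanner does:
    # parse declared resources, compare them against policy, report findings.
    # The resource format and rules below are illustrative, not any real tool's schema.

    from typing import Any, Dict, List

    def scan_resources(resources: List[Dict[str, Any]]) -> List[str]:
        findings = []
        for resource in resources:
            name = resource.get("name", "<unnamed>")
            rtype = resource.get("type")
            props = resource.get("properties", {})
            if rtype == "object_storage":
                if props.get("acl") == "public-read":
                    findings.append(f"{name}: bucket is publicly readable")
                if not props.get("encryption_at_rest", False):
                    findings.append(f"{name}: encryption at rest is not enabled")
            if rtype == "firewall_rule" and "0.0.0.0/0" in props.get("ingress_cidrs", []):
                findings.append(f"{name}: ingress is open to the whole internet")
        return findings

    declared = [
        {"name": "customer-data-bucket", "type": "object_storage",
         "properties": {"acl": "public-read"}},
        {"name": "api-ingress", "type": "firewall_rule",
         "properties": {"ingress_cidrs": ["0.0.0.0/0"]}},
    ]
    for finding in scan_resources(declared):
        print("FINDING:", finding)

The point is less the rules themselves than where they run: the sooner this kind of check happens after the IaC code is written, the cheaper it is to fix what it finds.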

Establishing the foothold

Once the actor has a toehold, they are hungry for more. That initial access is not enough to accomplish their goal; it is only a stepping stone to the next level. Their next step in establishing their position may be downloading or triggering an already-present exploit, manipulating the runtime environment, or infiltrating insecure workloads. The actor will then look for a way to execute commands that move them on to the next stage: getting the privileges they need.

Escalating privilege

The last stepping stone is for the actor to assume the privileges they need to do whatever it is they really want with your systems. Through account hijacking and user account compromise, or even the creation of new user accounts, the actor looks to put in place the final piece of the puzzle before doing the work they set out to do.

Executing the attack

With the right privileges, the door is open for the actor to get whatever it is they came for. Whether that be to add new backdoors, take control of your servers, copy data and steal information, or even repurpose your processing to mine cryptocurrency, once they have the power, they set out to use it, and the actor’s modus operandi is complete.

Broad, Deep, and Complex: The Cloud Native Security Game Board

Now that you’ve met the attackers, let’s look at the game board itself: your applications and everything they need to do their work. Securing your cloud native applications is a challenge because when you’re cloud native, things aren’t just complicated, they’re complex.15 Worse, that complexity comes in the form of three dimensions: your stack, the lifecycle of your application, and the practices and tooling that make the speed, convenience, and scale of cloud native application development possible.

First, a Pinch of Structure: The Cloud Native Stack

Oh, for the days when you had an application, an operating system, maybe three tiers, and a small set of network connections. Coupled with a yearly release cycle, you knew where you were with those systems: mainly living a life that was slow, frustrating, and not necessarily any more secure. But at least the attack surface seemed smaller, and the speed of change certainly was.

With cloud native applications, the number of moving parts has exploded. You have important architectural styles, such as microservices, that encourage the creation of many independently evolving and autonomous components. Each of those parts requires underlying hosts, with containers becoming the de rigueur deployment packaging choice and even more fine-grained functions becoming common, too.

To sensibly manage all those components at runtime without losing your sanity, you have a plethora of tools at your fingertips. To start with, you have your continuous integration and deployment pipeline and the supply chain of third-party artifacts that it depends on to build and ship your system. How will your application reach its production destination, and how will it do it securely? How can you use the pipeline itself to secure your application as it is built, and how do you know the pipeline itself is secure?

To help you manage all your runtime cloud resources, you’ll employ IaC and configuration-as-code (CaC) languages and tools. These tools can provide scalable and repeatable provisioning of cloud resources, but do you know exactly what they are doing? They can be powerful and convenient, but what opinions do they apply when you don’t specify a detail, and are those defaults secure?

Finally, to run all of your containers, you can employ platforms such as Kubernetes with service meshes to surface and manage the interactions between services. You will likely then employ an API gateway or two to manage the ingress and egress from your systems. You might also choose to deploy to multiple locations composed of private, public, and hybrid clouds. Each may have its own vendors and commercial arrangements, its own service levels, and its own vulnerabilities.

All of these cloud native tools and techniques give you more scale, flexibility, and convenience than ever before, and that complexity is usually worth the price. The downside is that it all comes at a cost: with more flexibility and convenience comes more vulnerability.

Second, a Smattering of Speed: Lifecycles

Cloud native software development lifecycles enable enormous speed of change, which means that cloud native applications rarely sit still for a minute. Continuous delivery encourages you not to take even that minute for granted.

Change is continuous and the norm in cloud native application development. Your speed to production, the speed at which you can get your changes into the hands of your users reliably, securely, and at scale, is the ROI of cloud native. This means that everything is automated, everything is code, and any friction on your ability to ship now should be called into question.

For these reasons, cloud native technologies emphasize convenience over security. The defaults on your automation and assets are set at a level that makes the tooling feel as powerful and fast as possible, not as secure as possible. This is a strategy that works astoundingly well for adoption—as it delivers a great developer experience—but at the cost of numerous insecure defaults that are easy to ignore, unless you’re looking to exploit them.

Note

When Kubernetes and Docker were first launched, their default configurations were set to open, not secure. The security stance was purposefully open to encourage the best possible experience with, and adoption of, the tools, with more secure, but potentially more frustrating, defaults gradually introduced over time.

This convenience-over-security pattern is very common in new cloud native technologies hoping to make an impact in the market by attracting adoption. This is something to be conscious of when adopting any new technology. Security isn’t an afterthought, but it will be something that likely needs to be enabled after initial adoption forays are complete.

Standard protocols of multiple gates, manual sign-offs, extended testing periods, once-a-year security testing, and audits just don’t fit in a cloud native world. They oppose the very point of going cloud native, i.e., speed of change. The value they add, though, and the security they attempt to bring, are as crucial as ever. You want the speed and you want the reliability and security. You want to have your cake and get to eat it, too.

To Season, Add Some Open Source

Open source is critical in cloud native application development. A huge amount of the world’s systems now run on open source software, especially those in the cloud. This includes everything from the firmware up. This is arguably a great thing, but when looked at through a security lens, there is much to make you nervous.

Do you know who contributes to an essential library or framework? Do you know all the dependencies that are brought into your application transitively through those libraries and frameworks? The terms library and framework are largely synonymous with dependency. Do you have a full, deep Software Bill of Materials (SBOM) that you can work with from a security perspective?16

At the end of the day, it’s a very difficult job to have deep insight into all of these variables. This means the security of your dependencies is on you. Whether you’re using a simple library, or embracing a whole framework for your cloud native applications, your code has these dependencies at runtime, and they make up part of your security responsibility.
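To give a taste of what answering those questions involves, the following sketch, using only the Python standard library, walks the declared requirements of a single installed distribution. It approximates one narrow slice of an SBOM (installed Python packages only); the name requests at the bottom is just an example of a package that may be installed, and real SBOM generators also cover operating system packages, container layers, and much more.

    # A rough, standard-library-only approximation of one slice of an SBOM:
    # walk the declared requirements of an installed Python distribution.
    # Real SBOM generators cover OS packages, container layers, and more.

    import re
    from importlib import metadata
    from typing import Optional, Set

    def walk_dependencies(dist_name: str, seen: Optional[Set[str]] = None, depth: int = 0) -> None:
        seen = set() if seen is None else seen
        if dist_name.lower() in seen:
            return                                   # already reported on another branch
        seen.add(dist_name.lower())
        try:
            version = metadata.version(dist_name)
            requires = metadata.requires(dist_name) or []
        except metadata.PackageNotFoundError:
            return                                   # declared but not installed here
        print("  " * depth + f"{dist_name}=={version}")
        for requirement in requires:
            if "extra ==" in requirement:            # skip optional extras in this sketch
                continue
            child = re.split(r"[\s;<>=!~\[(]", requirement, maxsplit=1)[0]
            walk_dependencies(child, seen, depth + 1)

    walk_dependencies("requests")                    # any installed distribution name will do

Point it at one of your own application’s direct dependencies and the depth of the transitive tree quickly becomes apparent.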

This is especially the case when it comes to containers. Do you know what is running inside your container when you depend on a base image? Do you know what its default configuration is, or if it has recently changed? DevOps, along with containerization, has increased the breadth of your responsibility, from including just your code and libraries, through to the very foundations of the operating system that are packaged in your container images. You have more power and choice than ever before, but can you be secure with it and retain the speed of development and delivery you need?
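In the same spirit, here is a tiny, illustrative sketch (no substitute for a real image or Dockerfile scanner) that reads a Dockerfile and flags two of the concerns above: a base image that is not pinned to a version or digest, so its contents can silently change underneath you, and the absence of a non-root USER, which, as footnote 11 warns, means the container will run as root. The path Dockerfile in the current directory is an assumption for the usage line.

    # A tiny, illustrative Dockerfile check for two common container risks:
    # an unpinned (mutable) base image and no non-root USER instruction.
    # Real Dockerfile and image scanners perform far deeper analysis.

    from typing import List

    def check_dockerfile(path: str) -> List[str]:
        findings = []
        base_image = None
        runs_as_non_root = False
        with open(path) as dockerfile:
            for line in dockerfile:
                parts = line.strip().split()
                if not parts:
                    continue
                if parts[0].upper() == "FROM" and len(parts) > 1:
                    base_image = parts[1]              # last FROM wins in multi-stage builds
                elif parts[0].upper() == "USER" and len(parts) > 1:
                    runs_as_non_root = parts[1] not in ("root", "0")
        if base_image and "@sha256:" not in base_image and \
                (":" not in base_image or base_image.endswith(":latest")):
            findings.append(f"base image '{base_image}' is not pinned to a version or digest")
        if not runs_as_non_root:
            findings.append("no non-root USER set; the container will run as root by default")
        return findings

    for finding in check_dockerfile("Dockerfile"):     # assumes a Dockerfile in the current directory
        print("FINDING:", finding)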

Open Source: Easy Button for Growth, but at What Risk?

What’s the challenge with open source? Two things: popularity, and gaining popularity through insecurity-by-default.

There can be a lot of reasons why an open source technology may be insecure. For a start, many eyeballs does not a secure tech guarantee. Just because tens, hundreds, or even hundreds of thousands of people have access to the code, and maybe have even read it, does not mean that it is secure.

That leads to the first problem with open source: it’s often hugely popular, and so it’s used everywhere. A security vulnerability in one of your organization’s in-house libraries means you might have some vulnerabilities to fix on a couple of systems, but a vulnerability in an open source library can mean every Java application on the internet might be insecure.18 The sheer proliferation of a popular open source component, applied in countless different configurations, means attackers have an unrivaled opportunity to exploit a known vulnerability, while equally countless organizations are left scrambling to find, patch, or replace the problem package, at the mercy of the original open source project developer community’s priorities.19

Also, in order for an open source project to become popular, the original developers often opt for convenience of adoption over security. This means that newer open source projects, especially, can be insecure by default: the technology is insecure in its basic form, the form you are most likely to encounter first. After your first, convenient (but insecure) baby steps with the new open source technology, you will then, ideally, look to make it secure, which can require anything from making some small config changes right through to applying an advanced degree in cybersecurity and reading thousands of pages of online docs for the magic incantation that secures the castle.

There’s an interesting conflict between making a technology easy to adopt and making it secure. On the one hand, you want your technology, whether it be an open source library, a new way of packaging and running applications, or an entire platform of tools, to be as easy as possible to adopt. That usually means grabbing and using it has to be as simple an experience as possible. This is especially true for open source projects, which may live or die depending on how easily they can be picked up and used in five minutes or less.

On the flip side is the desire for your technology to be secure, but that often comes at the price of it being harder to grab and use quickly. A more secure library, service, or platform is going to require that more hoops are jumped through in order to get things up and running. You can’t just clone and play; you have to clone, tweak, find the right combinations to secure things, and then you might be able to play sometime this week. Not the winning adoption experience.

As a result, many open source technologies are insecure by default. Insecure means fewer hoops to jump through, and more chance of quickly providing the heady mix of dopamine that can turn your little open source project into a juggernaut used by the world. Then, when that rush is a nagging memory, you can gradually let the user discover the bad news: their great new discovery is insecure, and to overcome that, they have more work to do.

The good news is that the dopamine might actually be enough for the adopter of the tech to do the work and get things into secure shape. But not always. More often than not, insecure factory defaults find their merry way to the lands of production, much to the enjoyment and celebration of malicious actors everywhere.

Open source is often insecure by default while it is seeking a path from obscurity to stardom. Sometimes, as in the case of Docker, the technology matures over time and becomes so universally useful that it can ask more of the people adopting it. Secure by default can start to emerge. But not always; not even often.

This is why you need scanning tools that can look for those tenacious and persistent defaults, and remind you early to change things up for something more secure. You can’t rely on the defaults, and often the most worrying thing can be when something “just works.”

Your (Insecure) Dish Is Ready: From Shallow to Defense in Depth

The menagerie of options in technology and architecture alone renders cloud native application development complicated. Combine that with the dominance of open source (where many of the developers of your system are no longer your developers), the need for speed, and the love of convenience in the software delivery lifecycle, and you have a recipe for insecurity par excellence.

Addressing that starts with adjusting where you think security begins and ends. The complications and complexities of cloud native application development mean that the boundaries are more blurred than ever, and achieving defense in depth means you cannot simply rely on one fortified wall.20

Note

If you have one wall, then you have a defense. If you have multiple walls, each requiring authentication in order to gain access to secure resources, then you have defense in depth. One compromise is not all it takes.

The Attack Surface Is Broad

The scope for an actor to gain access, establish a foothold, and then obtain privileges is broadened by the sheer complexity necessary to provide the ROI of being cloud native. Figure 1-3 shows the breadth and depth of the cloud native landscape that is open to being compromised.

Figure 1-3. Opportunities for security compromise across the entire cloud native software development lifecycle

At the point of coding, which now includes coding the infrastructure through IaC tools, vulnerabilities can be introduced through proprietary code and packages, as well as open source tools. Misconfigurations can occur, and malware can even be included.

Moving across the lifecycle through continuous integration, vulnerable or malicious images can be constructed during the build of your container images. Then, these images are promoted to production through continuous deployment and instantiated by a cluster orchestrator into the layers of applications, clusters, servers, containers, and serverless functions, all running on hosted virtual machines and the underlying cloud infrastructure services, such as compute, storage, network, and identity and access management. This rapid flow of change can then result in further, live production vulnerabilities through misconfigurations and running malware.

Your Team: Cloud Security, Operations Security, and Development Security

The attacks are multifarious, and the game board is complex, deep, and broad, and it needs to evolve quickly. That’s what you’re up against; that’s what you’ve got to work with. Now who is on your side? Who is on your team?

On your side are the people mentioned right back in this chapter’s intro:21 your developers, cloud security engineers, and security operations personnel.

From Code to Cloud: Cloud Security Engineers + Security-Aware Developers + Security Operations

Your team begins with the people most responsible and accountable for security, your cloud security engineers. These are the people who define the rules, capture the policies, and promote the governance.

These are the people with their fingers on the pulse of what is needed to develop, build, and operate secure applications in the cloud, but they are not the builders themselves—there are just not that many of these specialists. They can define what “secure” should mean and how it must be dealt with, but they need to work with the rest of your team to make things happen. Specifically, they need to collaborate closely with your security-aware developers and operations people to turn those policies and governance into good ideas, good advice, and real action.

Your Team, Siloed

The cloud native security game requires your team to work closely together, but that’s easier said than done. Your cloud security engineers might be defining all the right policies, your developers thinking they are doing all the right things, and your security operations doing everything they can, but if they’re not collaborating, if they’re not able to communicate with each other, you lose.

Working in silos

The first challenge is that your players don’t speak the same language. The language of security policies and escalation rules is completely different from the language of developers’ code.

When translating, communicating, and connecting those same policies and rules to runtime security, you’ll face the same problem. You could have great players, but if they can’t speak to one another, that can result in silos where everything sounds OK, but nothing is joined up.

Tooling gaps

As your people work in their silos, they look to optimize for their own needs. Your developers are aware they need to build secure code, so they reach for a collection of possible tools that help with that. The same is true of your runtime security operators. In the middle, your security engineers work hard in the space between them to capture their policies and procedures. In a vacuum. Alone. Leaving gaps.

This gap in collaboration often leads to each team picking their own solutions to their own problems, otherwise known as point solutions. Each tooling solution solves a specific problem at one point on the broad attack space of your cloud native security game board. But the people often aren’t working closely together, so the tools are blissfully unaware of each other. Because of the communication and collaboration gaps, there is no perceived need for the tooling to bridge those gaps. Every one of your players is an island, and they’ll work hard to create the best island they can, not realizing that, from an attacker’s perspective, it is the channels and discrepancies between the islands that leave the security doors open.

DevSecOps: Whoever Collaborates Best and Learns Fastest, Wins

  • “Fancy a coffee?” asked Bob from Operations.
  • “Sure,” said Susie from Development.22
  • “Can I come?” said Seetha from Security Operations.
  • “Sure!” said Bob and Susie in unison.
  • Fin
  • (Working title: “DevOps Episode 2: DevSecOps, an Origin Story”)23

Going back in time a bit, the operability of systems used to be, from the developer’s perspective and as respectfully as possible, “someone else’s problem.” The developer’s job was to align the structure with architectural principles and guardrails, design the specific solution, write the code—to code, that is—and build some deliverables. How those deliverables got to production, how they were operated, and the pains therein were for someone else—operations—to worry about, and so those concerns were often a bit of an afterthought, as shown in Figure 1-4.

Figure 1-4. Handoff between development and operations and the lack of a learning and improvement feedback loop between code/build and deploy/run

Some smart folks then realized that if the development teams and the operations teams worked more closely together, understood each other’s challenges, and empathized with their difficulties, then perhaps higher-quality, more reliable systems could result. If you could break down those silos and bridge those communication gaps, maybe even the speed of confident delivery could increase. It did,24 and the DevOps movement was born.

By breaking down silos so that all the concerns of development and operations could be addressed together, coupled with focussing on delivering value at any moment in time,25 a wellspring of new practices, tools, and technologies was brought to bear. As shown in Figure 1-5, teams became responsible not just for coding and building their changes; they also became party to, and even collectively responsible for, how those systems are deployed and run.

Figure 1-5. DevOps broke down the barriers, enabling the possibility of continuous, confident delivery, and opened up multiple clear improvement feedback loops

But now someone was missing; an important perspective did not have a seat at the table: security. All these collaboration advantages were great, but shouldn’t have been happening at the cost of systems being insecure. An additional handoff/silo was in the making, as shown in Figure 1-6.

Figure 1-6. In a continuously delivered world, when the stream of change and feedback is fast, where does security fit in?

The answer was to change the relationship of security to the entire software development lifecycle (SDLC), so it could be ready for the promise and challenge of the cloud, i.e., its speed of change and increased scale. With this new approach, security becomes a core part of every activity in software development and operations.

This is not security last (DevOpsSec), and not even security before anything (SecDevOps). This approach is DevSecOps: security concerns interleaved into every activity a developer needs to do in order to write, build, and deploy their application and infrastructure code securely. DevSecOps is a culture, a set of practices, and a mindset that takes your cloud security engineers’ expert rules, governance, and good advice and turns them into actionable insights into how the code should be developed, built, and run.

With DevSecOps, your team works to collaborate closely to apply security to the whole cloud native application attack surface. You can dominate the game board, from the first line of code to the moment that code is retired from production, having thankfully completed its secure tour of duty as part of your cloud native applications and infrastructure. It’s this collaboration that helps your people surface security problems early in the actual code and detect and respond to security problems at runtime. It’s this collaboration, and the learning and improvements that can result, that help you win the cloud native security game.

Collaboration and Emergence

When you enable and shore up your people to be able to collaborate better together, there’s more to gain than just filling security gaps. The fascinating scientific phenomenon of emergence comes into play as well.

The analogy to playing a game continues here when you consider your favorite successful sports teams. When you watch or play with a team that is collaborating well together, it feels like you have more players on the pitch than you’re legally supposed to deploy. The things you can do together go so far beyond the level at which you are confident you can play individually that it’s tempting to say that you’re all greater than the sum of your parts.

But you’re not—you’re exactly as good as you all can be together. You are the sum of your parts; it’s just that when the parts are collaborating well, your collective game is so much stronger than it could ever be with individuals each focussing on only their area of the field.

The quality of emergence, or W = Σp, where the whole (W) is the sum of the parts (p), makes it feel like you have additional players on the pitch because of the connections between you all. The parts include the connections. So it’s not just how good your players are; it’s how good they are at working together.

The higher the quality of those relationships between your security players, the better the overall behavior of the group. You see things quicker, you rally your resources quicker, you decide and act quicker. It is emergence that is the real payback from focussing on how your security group can improve their collaborative connections. In essence, you can play the game better because you can observe, orient, decide, and act (OODA) faster and more proactively than your attackers.

Who OODAs Best, Wins

Thanks to a United States Air Force colonel26 in an airborne dogfight, we know that whoever has the highest-quality and fastest OODA loop wins.27 Figure 1-7 shows an example of this.

Breaking the process down from a cloud native security perspective, you want to be able to perform all four OODA components (a conceptual sketch of the loop follows Figure 1-7):

Observe

Observe your security across the complex breadth and depth of your systems and processes. This means gathering as much knowledge as possible, from as many places and people as you can, across your software lifecycle to quickly describe the current situation. Then, build a picture of what is going on, who is doing what, and what compound effects might be at play.

Orient

Glue everyone together so that they can understand their role in the security situation. Bridge everyone’s unique perspectives and understanding of your systems so they know what they need in order to do what comes next.

Decide

Figure out what needs to be done, everywhere from the lines of code being written to the runtime systems that may be under attack.28

Act

Take the right action in every area, from a joined-up perspective, to proactively and reactively protect your systems from threats or vulnerabilities.

Figure 1-7. The OODA loop
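And here is the conceptual sketch promised above: nothing more than the shape of the loop in Python. Every function is a hypothetical stand-in for the gathering, shared sense-making, deciding, and acting described in the list, and the three fixed iterations exist only so the example terminates.

    # A purely conceptual outline of a security OODA loop.
    # Each function is a hypothetical stand-in for real telemetry, triage,
    # decision-making, and response work done by people and tools together.

    def observe() -> dict:
        """Gather signals: code and IaC scan findings, runtime alerts, logs, human reports."""
        return {"alerts": ["unexpected egress from the payments namespace"], "scan_findings": []}

    def orient(observations: dict) -> dict:
        """Build one shared picture so development, security engineering, and operations see the same thing."""
        return {"observations": observations, "affected_teams": ["dev", "sec-eng", "sec-ops"]}

    def decide(situation: dict) -> list:
        """Agree on the next actions, anywhere from a code fix to runtime containment."""
        if situation["observations"]["alerts"]:
            return ["tighten egress network policy", "rotate exposed credentials"]
        return []

    def act(actions: list) -> None:
        """Carry out the actions; their outcome feeds the next round of observation."""
        for action in actions:
            print("acting on:", action)

    for _ in range(3):   # in reality the loop never ends; three turns keep the sketch finite
        act(decide(orient(observe())))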

What works for Top Gun also works for you and your security personnel. The better your collaboration, the more positive behaviors can emerge, the better your OODA loops can be, and, finally, the faster you and your team can learn and adapt to new security challenges as they arise.

Your CNAPP Enables Your Cloud Native Security OODA Loop

Establishing, supporting, and facilitating your whole-team security OODA loops, in the face of cloud native complexity, scale, and speed of change, is the core of what you should expect your CNAPP to do. Features from different vendors will vary, but what you need from your CNAPP will not: enable your development, security engineering, and security operations teams to work effectively together, so you can establish fast security OODA loops across the whole length and breadth of your software development and delivery lifecycle, as shown in Figure 1-8.

Figure 1-8. Your CNAPP connects all your teams to the entire cloud native game board

Throughout the various contexts and scenarios in the rest of this book, you’ll see how your collaboration, supported by your CNAPP, can speed up your own security OODA loops.

A CNAPP ties your cloud native security room together. From securing your development work in Chapter 4, securing your supply chain and build and deployment pipelines in Chapters 5 and 6, collaborating across all teams to coordinate runtime security in Chapter 7, to closing out with how you all, collectively, can improve your game in Chapter 8, in the coming chapters you’ll see how a CNAPP supports the necessary collective collaboration and emergence that helps your teams win.

Well, how your teams maybe can win, but first, back to our story from the beginning of this chapter, where we were, sadly, not.

Note

Why a Cloud Native Application Protection “Platform”? A platform’s job is subtly different from a specific tool for a specific job. Whereas a tool extends the capabilities of a person in a narrow, focussed way, a platform extends the capabilities of groups of people, lifting their abilities up to a new level. A tool supports an individual; a platform lifts up a community.

A CNAPP is a platform because its remit is not to just provide one small function to help one or a few people do their job, but to enable collaboration on security across all the teams involved in your cloud native application development ecosystem.

Losing Our Cloud Native Security Game

Our OODA game was extremely poor. We had great tools, great technologies, and great people, but we’d been embarrassed. The question was, how?

An answer was in how we worked together, in that we didn’t. As a cloud native security team, our policies and governance were captured in a mixture of places, largely ignored by the multiple stakeholders within our organization. The developers were doing everything they thought they needed to do, but without much awareness of our insights, and the operational security people were even less aware. We knew of each other, even liked one another, but we didn’t talk each other’s language. We didn’t understand the challenges of developing secure code or log and alert scouring, and they didn’t understand our high-level guidance and dictates.

We’d not played together, so we’d lost together. We were a collection of star players with no shared strategy, no shared game plan, no shared tactics, not even a shared game board… We had no shared context; in our silos, we were perfect, but under the harsh glow of a real team at MI5, we’d been shown to be the misfits we were.

It was time to turn that around, to build that shared context, to speak each other’s languages, to combine our toolsets to plug the gaps and show where things had slipped through the holes.

It was time to collaborate to win.

1 The UK’s Security Service

2 MI5 Headquarters

3 Or, worse, what security policies we might be missing. That was much worse; that would be our fault!

4 We’re paraphrasing slightly. Fred Brooks, in The Mythical Man-Month: Essays on Software Engineering (Addison-Wesley), emphasizes the point that more people won’t speed up the delivery of a big IT project. They’ll more likely slow it down as the lines of communication and collaboration become a nest of difficulty. Melvin Conway’s law also points out the way a solution will mirror the organization of people to your advantage or, at scale, often to your pain and angst.

5 Murphy’s law states that “what can go wrong, will go wrong.” When you’re fighting for reliability, it helps to know that systems will, and do, fail. And the larger-scale and more complicated they are, the more likely it is that Murphy’s law will rear its head. Techniques such as chaos and resilience engineering exist to help you be better prepared for, and learn faster from, when, not if, your systems encounter this law. See Learning Chaos Engineering: Discovering and Overcoming System Weaknesses through Experimentation by Russ Miles (O’Reilly, 2019).

6 Someone’s “habits of working”

7 See Security Chaos Engineering: Sustaining Resilience in Software and Systems by Kelly Shortridge with Aaron Rinehart (O’Reilly, 2023) for more on proactive security experimentation on your own terms, on your own systems.

8 See Container Security: Fundamental Technology Concepts that Protect Containerized Applications by Liz Rice (O’Reilly, 2020).

9 This is further discussed in “The Attacker’s Moves”. To whet your appetite even further, and broaden your attack-resistance techniques and tactics, it’s worth checking out the MITRE ATT&CK knowledge base.

10 This is often referred to as the 4 Cs, ignoring the fifth angle of attack on the continuous integration and delivery (CI/CD) pipelines themselves.

11 To give you some real nightmares, unless you specifically change the default user when you’re building your container image, the default user in an image will be root when it becomes a running container. How does that sound for handing over the keys to the car?

12 One way secrets, even those in private repositories, can leak out is through the pipeline of scripts that build your systems; see Carly Page’s TechCrunch article, “CircleCI Warns Customers to Rotate ‘Any and All Secrets’ after Hack”.

13 Remember in the preface where we brought up what this book does not contain? Don’t worry, most folks don’t read the preface. The main point we made there is that this book keeps the focus on what a CNAPP adds to your security capability, with more depth being signposted where we can.

14 “A zero-day (also known as a 0-day) is a vulnerability or security hole in a computer system unknown to its owners, developers or anyone capable of mitigating it.”—Wikipedia

15 When considering a system or some other context, it is helpful to understand the nature of that context. Whether the context is clear, complicated, or complex will drive how it can be made sense of and worked with. Wikipedia has an entry on the Cynefin framework, which discusses these distinctions.

16 The importance of an SBOM that is as deep as possible is explored in Chapter 5, “Securing Your Supply Chain”.

17 You’ll be diving into exploring dependency trees for vulnerabilities in Chapter 5: Securing Your Supply Chain.

18 Log4j, anyone? Log4j has achieved a level of notoriety as being one of the more prominent sources of a critical vulnerability in recent times.

19 This, of course, also harks back to the age-old problem of projects and companies using open source without contributing back features or fixes, and so not having any say in priorities for their critical application dependencies.

20 Or network DMZ.

21 You know, the bit where we were surprised by MI5 getting in touch…

22 The names have been changed to protect the wonderful.

23 Episode 1 is featured in Digitalization of Financial Services in the Age of Cloud by Jamil Mina et al. (O’Reilly, 2023).

24 See Accelerate: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, Jez Humble, and Gene Kim (IT Revolution Press, 2018).

25 See Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation by Dave Farley and Jez Humble (Addison-Wesley Professional, 2010).

26 John R. Boyd’s OODA loop has been applied to everything from incident response to business and technology strategy.

27 Shamelessly making reference to the motto of the British Army’s Special Air Service, the UK Special Forces, and many other special forces groups around the world: “Who Dares Wins”.

28 Or ripe for the attacking…
