Chapter 9. Scaling

In Kubernetes, scaling can mean different things to different users. We distinguish between two cases:

Cluster scaling

Sometimes called cluster elasticity, this refers to the (automated) process of adding or removing worker nodes based on cluster utilization.

Application-level scaling

Sometimes called pod scaling, this refers to the (automated) process of manipulating pod characteristics based on a variety of metrics, ranging from low-level signals such as CPU utilization to higher-level ones such as HTTP requests served per second for a given pod.

Two kinds of pod-level scalers exist:

Horizontal pod autoscalers (HPAs)

HPAs automatically increase or decrease the number of pod replicas depending on certain metrics.

Vertical pod autoscalers (VPAs)

VPAs automatically increase or decrease the resource requirements of containers running in a pod.
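To make the HPA case concrete, the following is a minimal, illustrative manifest (using the `autoscaling/v2` API) that keeps average CPU utilization around 50% by scaling a deployment between one and five replicas. The deployment name `fancyapp` is borrowed from the recipe below; adjust it to match your own workload.

```yaml
# Illustrative HPA sketch; assumes a deployment named "fancyapp"
# exists in the current namespace and has CPU requests set
# (the HPA needs requests to compute utilization).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fancyapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fancyapp
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

The same effect can be achieved imperatively with `kubectl autoscale deployment fancyapp --min=1 --max=5 --cpu-percent=50`; the declarative manifest is easier to version-control.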

In this chapter, we first examine cluster elasticity for GKE, AKS, and EKS and then discuss pod scaling with HPAs.

9.1 Scaling a Deployment

Problem

You have a deployment and want to scale it horizontally.

Solution

Use the kubectl scale command to change the number of replicas of a deployment.

Let’s reuse the fancyapp deployment from Recipe 4.5, with five replicas. If it’s not running yet, create it with kubectl apply -f fancyapp.yaml.
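If you don't have the manifest from Recipe 4.5 at hand, a minimal stand-in with the same shape looks like this (illustrative only; the actual fancyapp manifest, including its image, is defined in Recipe 4.5):

```yaml
# Minimal stand-in for fancyapp.yaml: a deployment with five replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fancyapp
spec:
  replicas: 5
  selector:
    matchLabels:
      app: fancyapp
  template:
    metadata:
      labels:
        app: fancyapp
    spec:
      containers:
      - name: fancyapp
        image: nginx:1.25   # placeholder image for illustration
        ports:
        - containerPort: 80
```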

Now suppose that the load has decreased and you don’t need five replicas anymore; three is enough. To scale the deployment down to three replicas, do this:

$ kubectl scale deployment fancyapp --replicas=3
deployment.apps/fancyapp scaled

You can verify the new replica count:

$ kubectl get deploy fancyapp
NAME       READY   UP-TO-DATE   ...