Chapter 4. Load Balancing for Auto Scaling

The purpose of auto scaling is to increase or decrease the number of virtual machines or containers as dictated by a scaling policy. A scaling policy can be triggered by many events, such as CloudWatch metric alarms, a schedule, or anything that can make an API call. This enables your application to scale its capacity based on real-time demand or planned usage. As capacity is increased or reduced, however, the nodes being added or removed must be registered or deregistered with a load balancing solution. For auto scaling to function properly, every aspect of this process must be automated. In this chapter you will learn what to consider when load balancing over auto scaling applications, and what approaches you can take to address those considerations.

Load Balancing Considerations for Auto Scaling Groups

As you’ve already learned, auto scaling implies that nodes are automatically created or removed. This action may be the result of utilization metrics or schedules. Machines or containers being added will do nothing to serve more load unless the entity feeding them load is notified in some way. When machines or containers are removed by an auto scaling event and those nodes are not deregistered, the load balancer will continue to try to direct traffic to them. When adding and removing machines from a load balancer, it’s also important to consider how load is being distributed and whether session persistence is in use.

Adding Nodes

When a node is added to the application’s capacity, it needs to register with the load balancing solution or no traffic will be pushed to the new capacity. It may seem that adding capacity and not using it is harmless, but it can have many adverse effects. Cost is the first: don’t pay for things you’re not using. The second has to do with your metrics. It is common to use a statistical average of CPU utilization to determine whether capacity needs to be added or removed. An average assumes that the load is balanced evenly among the machines; when it is not, that assumption causes issues. The statistic may bounce from a high average (adding capacity) to a low average (removing capacity). This is known as rubber banding: the scaling policy takes action to serve demand but does not actually provide the intended effect.

Deregistering Nodes

When auto scaling, you must also consider how nodes deregister from the load balancer. You need nodes to deregister cleanly from your load balancer whether or not active health checking is in place. If a node suddenly becomes unavailable, your clients will experience timeouts, or worse, session loss if your application depends on session persistence. To cleanly remove a node from an application pool, it must first drain connections and persistent sessions, then deregister.

Algorithms

The load balancing algorithm used by your solution should also be considered. It’s important to understand how adding or removing a node from the pool will redistribute load. Algorithms such as round robin aim to distribute load evenly based on a given metric; round robin balances on the number of requests. Adding and removing nodes with an algorithm like this has little impact on the distribution. With algorithms that distribute load by using a hash table, it’s possible that similar requests will be directed to the same server. In a static environment this type of algorithm is sometimes used to provide session persistence; however, it cannot be relied on in an auto scaling environment. When nodes are added to or removed from a load balancer that uses a hashing algorithm, the load may redistribute, and requests will be directed to different servers than before the capacity change.
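To make that redistribution concrete, here is a small, self-contained sketch (node names and key counts are illustrative) showing how naive modulo hashing remaps most keys when a single node leaves the pool:

```python
import hashlib

def assign(key: str, nodes: list[str]) -> str:
    """Pick a node for a key with simple modulo hashing."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

keys = [f"session-{i}" for i in range(1000)]
# Distribution before and after one node is removed from the pool
before = {k: assign(k, ["node-a", "node-b", "node-c", "node-d"]) for k in keys}
after = {k: assign(k, ["node-a", "node-b", "node-c"]) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys now map to a different node")
```

With plain modulo hashing, roughly three-quarters of the keys move when the pool shrinks from four nodes to three. A consistent hashing scheme would limit the movement to approximately the keys owned by the departing node, which is why some load balancers offer it as an option.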

When load balancing over Auto Scaling Groups you have a number of things to consider. The most important of these is how machines are registered and deregistered with the load balancing solution. A close second is the impact on session persistence and how your load distribution will change.

Approaches to Load Balancing Auto Scaling Groups

The considerations outlined in the previous section can be approached with a bit of automation and foresight. Most of the considerations that are specific to the auto scaling pattern have to do with automatic registration and deregistration with the load balancer.

Proactive Approach

The best way to approach the session persistence issue is to move your session state out of the application tier. To load balance appropriately, a client should be able to hit any node in your application tier without issue. Store session state in an in-memory database, such as Redis or Memcached. By giving all application servers access to centralized, shared memory, you no longer need session persistence. If moving the session is not possible, a good software or hosted load balancer will allow you to handle sessions properly.
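As a minimal sketch of externalized session state, the class below stands in for a shared store such as Redis; the `SessionStore` name and the dict-backed storage are illustrative, and in production the same interface would wrap Redis or Memcached commands with a TTL:

```python
import json
import uuid

class SessionStore:
    """Stand-in for a shared store such as Redis. Every app server
    holding a reference to the same store can serve any client."""

    def __init__(self):
        self._data = {}  # in Redis this would be SET/GET with a TTL

    def create(self, payload: dict) -> str:
        sid = uuid.uuid4().hex
        self._data[sid] = json.dumps(payload)
        return sid

    def load(self, sid: str) -> dict:
        return json.loads(self._data[sid])

store = SessionStore()  # shared by every app server
sid = store.create({"user": "alice", "cart": ["sku-123"]})
# Any node can now handle the next request for this session ID;
# the load balancer no longer needs session persistence.
print(store.load(sid)["user"])
```

Because the session lives outside the node, nodes can be added or terminated by auto scaling without clients losing their sessions.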

An automatic registration and deregistration process is the best approach for load balancing over auto scaling tiers. Auto Scaling Groups and Elastic Container Service (ECS) Services have an attribute that takes a list of ELBs or, for ALBs and NLBs, Target Groups. When you provide an ELB or Target Group, the Auto Scaling Group or ECS Service will automatically register and deregister its nodes with the load balancer. Note that AWS native load balancers can drain connections but do not drain sessions.
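As a sketch, attaching a Target Group to an Auto Scaling Group in CloudFormation might look like the following; all resource names (AppTargetGroup, AppLaunchTemplate, the subnets) are illustrative:

```yaml
AppAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: "2"
    MaxSize: "10"
    TargetGroupARNs:          # instances register/deregister automatically
      - !Ref AppTargetGroup   # ALB/NLB target group
    LaunchTemplate:
      LaunchTemplateId: !Ref AppLaunchTemplate
      Version: !GetAtt AppLaunchTemplate.LatestVersionNumber
    VPCZoneIdentifier:
      - !Ref PrivateSubnetA
      - !Ref PrivateSubnetB
```

With the `TargetGroupARNs` property in place, every instance the group launches is registered with the target group, and every instance it terminates is deregistered, with no extra automation on your part.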

When using something other than a cloud-provided load balancer, you will need to create some sort of notification hook to update the load balancer. In AWS there are three ways to do this: the node makes a call to the load balancer as it transitions states; the load balancer queries the AWS API regularly; or a third-party integration is involved. The load balancer being used must be able to register and deregister nodes through automation. Many load balancing solutions offer an API of some sort to enable this approach. If an API is not available, a seamless reload and templated configuration will work as well.

The best way for an instance in an Auto Scaling Group to register or deregister with a load balancer as it comes up or prepares to go down is through Lifecycle Hooks, a feature of AWS Auto Scaling Groups. Lifecycle Hooks bridge the Auto Scaling Group’s processes and the OS layer of the application server by allowing arbitrary code to run on the instance at different transition states. On launch, the Auto Scaling Group can signal a script to run that calls the load balancer to register the instance. Before the Auto Scaling Group terminates the instance, the lifecycle hook should run a script that instructs the load balancer to stop passing the instance new traffic, and optionally waits for connections and sessions to drain before the instance is terminated. This proactive approach ensures all of your client connections and sessions are drained before the node is terminated.
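A termination lifecycle hook can be declared alongside the group. This CloudFormation fragment is a sketch; the resource names and timeout value are illustrative:

```yaml
DrainBeforeTerminateHook:
  Type: AWS::AutoScaling::LifecycleHook
  Properties:
    AutoScalingGroupName: !Ref AppAutoScalingGroup
    LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
    HeartbeatTimeout: 300     # seconds allowed for draining
    DefaultResult: CONTINUE   # proceed if no completion call arrives
```

While the hook holds the instance in the Terminating:Wait state, your draining script (on the instance or in an external process) deregisters the node from the load balancer, waits for connections and sessions to drain, and then calls the complete-lifecycle-action API so termination can proceed.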

Reactive Approaches

You can also use a reactive approach by having your load balancer query the AWS API and update itself as nodes come online or are removed. This approach is reactive because the load balancer is updated asynchronously, after the node is alive or already gone. A reactive approach falls short because the load balancer will inevitably experience timeouts before health checks deem the missing node unhealthy, and it does not take session persistence into account. There are load balancing features that will gracefully handle upstream timeouts and proxy the request to the next available upstream node; this is necessary to prevent your clients from seeing HTTP errors. Reactive approaches can also be implemented via Simple Notification Service (SNS) notifications triggered by the Auto Scaling Group or the ECS Service. These notifications can trigger arbitrary code run by AWS Lambda functions or an API that controls registration and deregistration.
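As a sketch of the SNS-driven pattern, the handler below parses an Auto Scaling notification and updates a stand-in registry; a real Lambda function would call your load balancer’s registration API instead of mutating a local set, and the instance ID shown is a made-up example:

```python
import json

# Hypothetical in-process stand-in for the load balancer's node list;
# a real handler would call the LB's registration API here instead.
registered_nodes = set()

def handler(event, context=None):
    """Sketch of a Lambda handler for Auto Scaling SNS notifications."""
    for record in event["Records"]:
        msg = json.loads(record["Sns"]["Message"])
        instance = msg.get("EC2InstanceId")
        if msg.get("Event") == "autoscaling:EC2_INSTANCE_LAUNCH":
            registered_nodes.add(instance)       # register with the LB
        elif msg.get("Event") == "autoscaling:EC2_INSTANCE_TERMINATE":
            registered_nodes.discard(instance)   # deregister from the LB

# Example event shaped like an SNS-delivered launch notification
sample = {"Records": [{"Sns": {"Message": json.dumps({
    "Event": "autoscaling:EC2_INSTANCE_LAUNCH",
    "EC2InstanceId": "i-0123456789abcdef0"})}}]}
handler(sample)
print(registered_nodes)
```

Note the asynchronous gap this illustrates: the instance is already launched or terminated by the time the notification fires, which is why a reactive approach cannot by itself guarantee drained connections.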

At this time, AWS native load balancers support only round-robin load balancing algorithms. If you’re using a load balancer that supports other algorithms based on the pool, such as a hashing algorithm, you should consider whether rebalancing is a concern. If redistributing the hash and rebalancing will cause problems for your application, your load balancer will need a feature that intelligently minimizes the redistribution. If such features are not available, you will need to evaluate whether session persistence and connection draining better fit your needs.

Now that you understand the considerations of auto scaling and have some insight into how to approach them, you can build your own auto scaling tier in AWS.
