Chapter 5: Elastic Load Balancing, Amazon CloudWatch, and Auto Scaling

Photo: Elastic Man: Man Sets Stretchiest Skin Record. (I just hope it goes back to normal afterwards.)

Elastic: capable of returning to its original length, shape, etc., after being stretched, deformed, compressed, or expanded.

Chapter 5 introduces some of the more commonly known AWS services for use with IaaS/EC2: Elastic Load Balancing (ELB) and Auto Scaling (AS).

ELB is a managed load balancing service that distributes traffic across EC2 instances in one or more Availability Zones. It's called "Elastic" because the load balancer itself scales up and down as the load rises and falls. It also conducts health checks on your EC2 instances and only routes traffic to the healthy ones (OK, that's not really Elastic, I guess, more available). The book doesn't mention this, but there are now two types, Classic and Application Load Balancer (which does content-based routing): https://aws.amazon.com/elasticloadbalancing/

There's some detail to learn (see the docs).
I wondered how hard it would be to set up load balancing across regions. It turns out AWS has already thought of this: you can use Route 53 DNS failover with two or more ELBs in different regions. Route 53 also supports latency-based routing.
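
As a sketch of how that might look via the API (this isn't from the book; the hosted zone ID, ELB DNS names, and ELB zone IDs below are all invented placeholders), something like this boto3 snippet creates one latency-based alias record per region, with target health evaluation turned on so DNS failover kicks in if a region's ELB goes unhealthy:

    import boto3

    r53 = boto3.client('route53')

    def add_latency_record(region, elb_dns, elb_zone_id):
        # One latency record per region: Route 53 answers queries with the
        # region that has the lowest latency to the client, and with
        # EvaluateTargetHealth=True it fails over if that ELB is unhealthy.
        r53.change_resource_record_sets(
            HostedZoneId='Z_MY_ZONE',  # hypothetical hosted zone ID
            ChangeBatch={'Changes': [{
                'Action': 'UPSERT',
                'ResourceRecordSet': {
                    'Name': 'www.example.com',
                    'Type': 'A',
                    'SetIdentifier': region,  # must be unique per record
                    'Region': region,
                    'AliasTarget': {
                        'HostedZoneId': elb_zone_id,  # the ELB's own zone ID
                        'DNSName': elb_dns,
                        'EvaluateTargetHealth': True,
                    },
                },
            }]},
        )

    add_latency_record('us-east-1', 'elb-1.us-east-1.elb.amazonaws.com', 'Z_ELB_USE1')
    add_latency_record('ap-southeast-2', 'elb-2.ap-southeast-2.elb.amazonaws.com', 'Z_ELB_APSE2')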

Why would you do this? To optimise for customer location, but also to cope with failures in a single AWS region. Can that happen? Maybe. In theory only a single Availability Zone should fail at once, but there can be domino effects on other AZs in the same region; here's an article about the Sydney failure in 2016: https://www.itnews.com.au/news/aws-sydney-outage-prompts-architecture-rethink-420506

As usual, I wonder how best to architect for cost and limitations.
What are the ELB limitations? One reported limit is a default maximum throughput of 20 KB/s (although I'm not sure where this is documented). Some people report that if you are expecting a higher peak load you should pre-warm the ELB. How do you do this? With a blow torch? Evidently you have to raise a support request (how far in advance?) and supply details including:

ELB Name
Start date for elevated traffic patterns
End date for elevated traffic patterns
Traffic delta OR request rate expected at surge (in Requests Per Second)
Average amount of data passing through the ELB per request/response pair (in Bytes)
Rate of traffic increase
Are keep-alives used on the back end?
Percent of traffic using SSL termination on the ELB
Number of AZs that will be used for this event/load balancer
Is the back end scaled to event/spike levels? If not, how many and what type of instances, and when will they be scaled?
Use-case description
Traffic pattern description

Other issues (some with workarounds) include imperfect load balancing when lots of traffic comes from a single IP address (e.g. load testing?), and the default 60-second idle connection timeout.
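For the timeout at least there's an API knob. A minimal boto3 sketch of raising the idle timeout on a Classic Load Balancer (the load balancer name 'my-elb' is made up):

    import boto3

    elb = boto3.client('elb', region_name='ap-southeast-2')

    # Raise the idle connection timeout from the default 60s to 120s.
    elb.modify_load_balancer_attributes(
        LoadBalancerName='my-elb',  # hypothetical name
        LoadBalancerAttributes={'ConnectionSettings': {'IdleTimeout': 120}},
    )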


This blog explains why Loggly chose Amazon Route 53 over ELB.

Amazon has a helpful knowledge-center article on troubleshooting ELB capacity issues: https://aws.amazon.com/premiumsupport/knowledge-center/elb-capacity-troubleshooting/

It also makes the point that limits on other resources are likely to bite before the ELB itself does. E.g.:

You can safely assume that ELB's capacity is unlimited. It's likely that your budget for bandwidth and instances to handle the load will be exhausted before you hit any ELB limits.

Cross-zone load balancing

Another trick appears to be to use cross-zone load balancing. By default, each load balancer node spreads load across the instances in its own Availability Zone (which assumes there are equal numbers of EC2 instances in each zone; if there aren't, this can cause problems). Enabling cross-zone load balancing results in load being distributed across all the EC2 instances equally, irrespective of zone; a sketch of enabling it via the API follows.
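Here's a minimal boto3 sketch of turning it on for a Classic Load Balancer (again, 'my-elb' is a made-up name):

    import boto3

    elb = boto3.client('elb', region_name='ap-southeast-2')

    # Distribute requests across all registered instances,
    # not just those in the receiving node's own AZ.
    elb.modify_load_balancer_attributes(
        LoadBalancerName='my-elb',  # hypothetical
        LoadBalancerAttributes={'CrossZoneLoadBalancing': {'Enabled': True}},
    )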

How does the routing really work? According to the documents: 

With a Classic Load Balancer, the load balancer node that receives the request selects a registered instance using the round robin routing algorithm for TCP listeners and the least outstanding requests routing algorithm for HTTP and HTTPS listeners.

With an Application Load Balancer, the load balancer node that receives the request evaluates the listener rules in priority order to determine which rule to apply, and then selects a target from the target group for the rule action using the round robin routing algorithm. Routing is performed independently for each target group, even when a target is registered with multiple target groups.
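
So content-based routing on an ALB comes down to listener rules. A hedged boto3 sketch of one (the listener and target group ARNs are truncated placeholders, and the path pattern is invented):

    import boto3

    elbv2 = boto3.client('elbv2', region_name='ap-southeast-2')

    # Requests whose path matches /images/* are forwarded to a dedicated
    # target group; lower Priority values are evaluated first.
    elbv2.create_rule(
        ListenerArn='arn:aws:elasticloadbalancing:...',  # placeholder ARN
        Priority=10,
        Conditions=[{'Field': 'path-pattern', 'Values': ['/images/*']}],
        Actions=[{'Type': 'forward',
                  'TargetGroupArn': 'arn:aws:elasticloadbalancing:...'}],  # placeholder ARN
    )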


What's the "least outstanding requests" algorithm? I'm guessing the ELB keeps track of how many requests are in flight on each instance and picks the one with the smallest number next. Someone explained it as:

The Load balancer node sends the request to healthy instances within the same Availability Zone using the leastconns routing algorithm. The leastconns routing algorithm favors back-end instances with the fewest connections or outstanding requests.


Before a client sends a request to your load balancer, it first resolves the load balancer's domain name with the Domain Name System (DNS) servers. The DNS server uses DNS round robin to determine which load balancer node in a specific Availability Zone will receive the request.
The selected load balancer node then sends the request to healthy instances within the same Availability Zone. To determine the healthy instances, the load balancer node uses either the round robin (for TCP connections) or the least outstanding request (for HTTP/HTTPS connections) routing algorithm. The least outstanding request routing algorithm favors back-end instances with the fewest connections or outstanding requests.
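
To make that concrete, here's a toy model (pure guesswork on my part, not AWS code) of how a node might pick a target under least outstanding requests:

    # Toy model: the node tracks in-flight requests per healthy instance
    # and dispatches each new request to the instance with the fewest.
    outstanding = {'i-aaaa': 3, 'i-bbbb': 1, 'i-cccc': 2}  # made-up instance IDs

    def pick_target(counts):
        # Instance with the fewest outstanding requests wins.
        return min(counts, key=counts.get)

    target = pick_target(outstanding)
    outstanding[target] += 1   # request dispatched to i-bbbb
    print(target, outstanding)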

How well does this type of load balancing work? By Little's Law, the number of concurrent requests on an instance is throughput times response time (N = X × R). If all of that time is CPU time on the instance directly connected to (none of it spent in calls to other dependent subsystems), then the connection count is an accurate proxy for CPU load on the instance, since utilisation is throughput times service time (U = X × S). However, if either assumption isn't correct then imbalance is likely. In practice 100% CPU is unlikely: network, I/O, wait, synchronisation, and suspension times may dominate the transaction time, and time may be spent in calls to other systems (e.g. databases, other AWS services, etc.).
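Here's a back-of-the-envelope example of where it goes wrong (all numbers invented): two instances can show the same connection count while one is far busier on CPU.

    # Little's Law:      N = X * R  (in-flight requests = throughput * response time)
    # Utilization Law:   U = X * S  (CPU utilisation = throughput * CPU service time)

    X = 40.0   # requests/second to each instance (invented)

    # Instance A: response time is almost all CPU.
    # Instance B: same response time, but mostly waiting on a database.
    instances = [('A', 0.020, 0.005),   # (name, CPU time S, wait time W) in seconds
                 ('B', 0.005, 0.020)]

    for name, S, W in instances:
        R = S + W   # response time: 0.025 s for both
        N = X * R   # in-flight requests: 1.0 for both (what leastconns sees)
        U = X * S   # CPU utilisation: 0.8 vs 0.2 (what actually differs)
        print(name, N, U)

Both instances look identical to a least-outstanding-requests router, yet instance A is running at 80% CPU while B idles at 20%.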


Finally, other things to think about include connection draining (sounds mucky; more below), proxy protocols, sticky sessions (sounds messy, and calls from the same client all go to the same instance, which may skew load balancing?), and health checks.


Draining (a canal not a swamp)
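
Connection draining lets in-flight requests complete before an instance is taken out of service. A hedged boto3 sketch of enabling it and configuring a health check ('my-elb' and the /health path are invented):

    import boto3

    elb = boto3.client('elb', region_name='ap-southeast-2')

    # Give in-flight requests up to 300s to finish when an instance is
    # deregistered or fails health checks, instead of cutting them off.
    elb.modify_load_balancer_attributes(
        LoadBalancerName='my-elb',  # hypothetical
        LoadBalancerAttributes={
            'ConnectionDraining': {'Enabled': True, 'Timeout': 300},
        },
    )

    # Health check: probe HTTP:80/health every 30s; two consecutive
    # failures mark an instance unhealthy, two successes bring it back.
    elb.configure_health_check(
        LoadBalancerName='my-elb',
        HealthCheck={
            'Target': 'HTTP:80/health',  # hypothetical path
            'Interval': 30,
            'Timeout': 5,
            'UnhealthyThreshold': 2,
            'HealthyThreshold': 2,
        },
    )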





PS: The AWS Elastic Load Balancer (Classic) API has 28 operations and 25+ data types.

