SA-301f · Module 2
Load Balancing Strategies
3 min read
Load balancing distributes requests across service instances to prevent any single instance from becoming a bottleneck. The balancing strategy determines how effectively the load is distributed, how quickly failed instances are removed from rotation, and how the system responds to varying request weights. The strategy matters more than the tool — a poorly configured load balancer makes a healthy system appear unhealthy.
- Round Robin: Requests are distributed sequentially across instances. Simple, predictable, and effective when all requests have similar processing costs. Fails when requests vary widely in cost — a round-robin that alternates a 100ms request and a 10-second request between instances creates an unbalanced load despite an even request distribution.
- Least Connections: Requests are routed to the instance with the fewest active connections. This naturally balances load when request processing times vary — the instance that finishes faster receives the next request sooner. More effective than round robin for APIs with variable response times.
- Weighted and Adaptive: Instances are weighted by capacity — a large instance receives proportionally more traffic than a small one. Adaptive balancing adjusts weights based on real-time metrics: response latency, error rate, and CPU utilization. The load balancer shifts traffic away from degraded instances before they fail. Adaptive balancing is the most sophisticated strategy and the most operationally demanding to tune.
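The contrast between the first two strategies can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular load balancer; the class and method names (`RoundRobin`, `LeastConnections`, `pick`, `release`) are invented for this example.

```python
import itertools

class RoundRobin:
    """Cycle through instances in a fixed order, ignoring load."""
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Route to the instance with the fewest in-flight requests."""
    def __init__(self, instances):
        # Track active connection counts per instance.
        self._active = {name: 0 for name in instances}

    def pick(self):
        # Choose the instance with the fewest active connections
        # (ties broken by insertion order).
        name = min(self._active, key=self._active.get)
        self._active[name] += 1
        return name

    def release(self, name):
        # Call when a request completes, so slow requests keep
        # weighing on the instance that is still processing them.
        self._active[name] -= 1
```

Note the key difference: `RoundRobin.pick` needs no feedback from the instances, while `LeastConnections` only works because `release` reports when each request finishes — a long-running request keeps its instance's count high, steering new traffic elsewhere.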
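Weighted selection, and deriving weights from a live metric, can be sketched as follows. This is an illustrative sketch under simplifying assumptions — real adaptive balancers blend latency, error rate, and CPU with smoothing, whereas this example uses latency alone; the function names and the `base_weight` parameter are hypothetical.

```python
import random

def weighted_pick(weights, rng=random.random):
    """Pick an instance with probability proportional to its weight."""
    total = sum(weights.values())
    r = rng() * total
    for name, w in weights.items():
        r -= w
        if r <= 0:
            return name
    return name  # floating-point edge case: fall back to the last instance

def adaptive_weights(latency_ms, base_weight=100.0):
    """Derive weights from observed latency: slower instances get
    proportionally less traffic. (Illustrative only — a single metric,
    no smoothing, no error-rate or CPU input.)"""
    return {name: base_weight / ms for name, ms in latency_ms.items()}
```

For example, an instance answering in 50ms ends up with four times the weight of one answering in 200ms, so the balancer shifts traffic toward it before the slow instance degrades further.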