A Load Balancer is a critical infrastructure component that distributes incoming network traffic across multiple servers to ensure no single server becomes overwhelmed.
The core idea is straightforward: instead of routing all traffic to one server (which would eventually crash under heavy load), a load balancer acts as a traffic cop, intelligently spreading requests across a pool of healthy servers. This improves application availability, responsiveness, and overall system reliability.
Popular Examples: AWS Elastic Load Balancer (ELB), NGINX, HAProxy
In this chapter, we will explore the high-level design of a Load Balancer.
Load balancers appear in almost every distributed system architecture. Understanding how to design one from scratch demonstrates deep knowledge of networking, high availability, and system scalability.
Let's start by clarifying the requirements.
Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.
Here is an example of how a discussion between the candidate and the interviewer might unfold:
Candidate: "What type of traffic should the load balancer handle? Are we focusing on HTTP/HTTPS traffic, or should it support any TCP/UDP traffic?"
Interviewer: "Let's design a general-purpose load balancer that supports both Layer 4 (TCP/UDP) and Layer 7 (HTTP/HTTPS) load balancing."
Candidate: "What is the expected scale? How many requests per second should the system handle?"
Interviewer: "The load balancer should handle up to 1 million requests per second at peak traffic."
Candidate: "How should we handle server failures? Should the load balancer automatically detect and route around unhealthy servers?"
Interviewer: "Yes, health checking is critical. The system should detect failures within seconds and stop routing traffic to unhealthy servers."
Candidate: "Do we need to support session persistence, where requests from the same client go to the same backend server?"
Interviewer: "Yes, sticky sessions should be supported for stateful applications, but it should be configurable."
Candidate: "What about SSL/TLS termination? Should the load balancer handle encryption?"
Interviewer: "Yes, SSL termination at the load balancer is required to offload encryption work from backend servers."
Candidate: "What availability target should we aim for?"
Interviewer: "The load balancer itself must be highly available with 99.99% uptime, since it's on the critical path for all traffic."
After gathering the details, we can summarize the key requirements: a general-purpose Layer 4/Layer 7 load balancer that handles 1 million requests per second at peak, detects backend failures within seconds, supports configurable sticky sessions and SSL termination, and is itself highly available at 99.99% uptime.
To understand the scale of our system, let's make some reasonable assumptions.
Assuming an average request size of 2 KB and an average response size of 10 KB:

- Inbound bandwidth: 1M RPS × 2 KB = 2 GB/s
- Outbound bandwidth: 1M RPS × 10 KB = 10 GB/s

With 1,000 backend servers each checked every 5 seconds, the health checker issues 1,000 × (1/5) = 200 health checks/second.

Each connection requires state tracking. At 500,000 concurrent connections with roughly 500 bytes of state each, that is 500,000 × 500 bytes = 250 MB. These numbers indicate we need a system capable of handling massive throughput with minimal memory overhead per connection.
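As a quick sanity check, here is the same arithmetic in a few lines of Python (the request/response sizes and connection counts are the assumptions stated above, using decimal units):

```python
# Back-of-envelope check of the capacity estimates above.
RPS = 1_000_000

inbound = RPS * 2_000        # 2 KB per request   -> 2.0 GB/s
outbound = RPS * 10_000      # 10 KB per response -> 10.0 GB/s
checks = 1_000 / 5           # 1,000 backends, 5 s interval -> 200 checks/s
state = 500_000 * 500        # 500 bytes per connection -> 250 MB

print(f"in={inbound/1e9} GB/s out={outbound/1e9} GB/s "
      f"checks={checks}/s state={state/1e6} MB")
```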
A load balancer exposes both a data plane (handling actual traffic) and a control plane (configuration and management). Below are the core APIs.
POST /api/v1/backends

Adds a new backend server to the load balancer pool. The request body specifies the backend's address, port, weight, and an optional health check path (e.g., /health).

- 400 Bad Request: Invalid address or port.
- 409 Conflict: Backend already registered.

DELETE /api/v1/backends/{backend_id}

Removes a backend server from the pool. Existing connections are gracefully drained.

- 200 OK: Backend removed successfully.
- 404 Not Found: Backend ID does not exist.

GET /api/v1/backends/{backend_id}/health

Returns the current health status and metrics for a specific backend.

PUT /api/v1/config/algorithm

Sets the load balancing algorithm for traffic distribution. Supported values: round_robin, weighted_round_robin, least_connections, ip_hash, random.

- 400 Bad Request: Invalid algorithm name.
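As an illustration, registering a backend through this API might look like the following sketch (the field names, base URL, and use of the third-party requests library are assumptions, not a fixed contract):

```python
import requests  # third-party HTTP client, shown for illustration

resp = requests.post(
    "http://lb-admin.internal/api/v1/backends",
    json={
        "pool_id": "web-pool",        # pool to join
        "address": "10.0.1.15",       # backend IP or hostname
        "port": 8080,
        "weight": 3,                  # for weighted algorithms
        "health_check_path": "/health",
    },
)
resp.raise_for_status()  # raises on 400 (bad input) or 409 (duplicate)
print(resp.json())       # e.g. the assigned backend_id
```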
At a high level, our load balancer must satisfy these core requirements:

1. Traffic routing: accept client connections and forward them to backends.
2. Health checking: detect failed backends within seconds and route around them.
3. High availability: the load balancer itself must not be a single point of failure.
4. Session persistence: optionally pin a client to a specific backend.

The architecture can be broken down into a data plane (handles actual traffic) and a control plane (manages configuration and health).
Instead of presenting the full architecture at once, we'll build it incrementally by addressing one requirement at a time.
The primary function is accepting client connections and forwarding them to backend servers.
Listener: the entry point for all client traffic. It accepts incoming connections on configured ports (e.g., 80 for HTTP, 443 for HTTPS).

Responsibilities:

- Accept incoming connections on the configured ports
- Terminate SSL/TLS when configured to do so
- Hand accepted connections to the routing engine
Routing engine: the brain of the load balancer. It decides which backend server should handle each request.

Responsibilities:

- Apply the configured load balancing algorithm to pick a backend
- Consult health status so traffic goes only to healthy servers
- Honor sticky-session mappings when persistence is enabled
Backend pool: a logical group of backend servers that can handle the same type of requests.

Responsibilities:

- Track pool membership as backends are added and removed
- Hold per-backend configuration such as weight and connection limits
Without health checks, the load balancer would continue sending traffic to crashed or overloaded servers, causing user-facing errors.
Health checker: a background service that continuously monitors the health of all backend servers.

Responsibilities:

- Probe every backend on a configurable interval
- Apply healthy/unhealthy thresholds before changing a backend's state
- Publish status changes to the routing engine
| Type | How It Works | Use Case |
|---|---|---|
| TCP Check | Attempts TCP connection | Basic connectivity |
| HTTP Check | Sends HTTP request, expects 2xx | Web applications |
| Custom Script | Runs user-defined check | Complex health logic |
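A minimal sketch of the first two probe types from the table, using only the Python standard library:

```python
import socket
import urllib.request

def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Healthy if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_check(url: str, timeout: float = 2.0) -> bool:
    """Healthy if the health endpoint returns a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError/HTTPError and socket errors
        return False
```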
A single load balancer is a single point of failure. If it crashes, all traffic stops.
Deploy multiple load balancer instances that can handle traffic independently.
Virtual IP (VIP): a floating IP address that can be moved between LB nodes. Clients connect to the VIP, not to individual LB IPs.
Failover coordinator: decides which LB node is active and handles failover when the primary fails.
Active-Passive: one node serves all traffic while a standby waits to take over. Pros: Simple, no state synchronization needed. Cons: Standby resources are wasted during normal operation.

Active-Active: all nodes serve traffic simultaneously. Pros: Better resource utilization, higher throughput. Cons: Requires state synchronization for sticky sessions.
Here is the complete architecture combining all requirements:
| Component | Purpose |
|---|---|
| Virtual IP / DNS | Single entry point for clients |
| LB Nodes | Accept and route traffic to backends |
| Session Store | Shared state for sticky sessions (Redis) |
| Health Checker | Monitor backend health |
| Config Manager | Manage LB configuration |
| Backend Pool | Group of application servers |
A load balancer is primarily an in-memory, real-time system. It does not typically use a traditional database for the data plane. However, the control plane needs persistent storage for configuration.
| Data Type | Storage | Reason |
|---|---|---|
| Active connections | In-memory (LB node) | Ultra-low latency required |
| Backend server list | In-memory + Config store | Fast lookups, persistent config |
| Health status | In-memory | Changes frequently, needs sub-second access |
| Session mappings | Redis/Memcached | Shared across LB nodes |
| Configuration | etcd/Consul/PostgreSQL | Persistent, versioned |
| Metrics/Logs | Time-series DB (InfluxDB, Prometheus) | Historical analysis |
| Field | Type | Description |
|---|---|---|
| backend_id | String (PK) | Unique identifier |
| pool_id | String (FK) | Backend pool this server belongs to |
| address | String | IP address or hostname |
| port | Integer | Port number |
| weight | Integer | Weight for weighted algorithms |
| max_connections | Integer | Connection limit |
| enabled | Boolean | Whether backend is enabled |
| created_at | Timestamp | Creation time |
| Field | Type | Description |
|---|---|---|
| pool_id | String (PK) | Unique identifier |
| name | String | Human-readable name |
| algorithm | Enum | Load balancing algorithm |
| health_check_path | String | HTTP path for health checks |
| health_check_interval | Integer | Seconds between checks |
| sticky_sessions | Boolean | Enable session persistence |
| sticky_ttl | Integer | Session cookie TTL (seconds) |
Now that we have the high-level architecture in place, let's dive deeper into some critical design choices.
The choice of load balancing algorithm significantly impacts traffic distribution, backend utilization, and overall system performance.
A good algorithm should:

- Distribute load evenly across available backends
- Adapt to differences in server capacity and current load
- Add minimal per-request overhead
Let's explore the primary approaches.
Round Robin: the simplest algorithm. Requests are distributed sequentially across all available backends.
Maintain a counter that increments with each request. Select the backend at index counter % number_of_backends.
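A minimal sketch of this counter-based selection (thread safety and empty-pool handling are omitted for brevity):

```python
import itertools

class RoundRobin:
    def __init__(self, backends):
        self.backends = list(backends)
        self.counter = itertools.count()  # increments on every request

    def select(self):
        return self.backends[next(self.counter) % len(self.backends)]

lb = RoundRobin(["s1", "s2", "s3"])
print([lb.select() for _ in range(6)])  # ['s1', 's2', 's3', 's1', 's2', 's3']
```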
Best For: Homogeneous backends with similar capacity and stateless requests.
Weighted Round Robin: an extension of round robin that accounts for different server capacities.
Each backend is assigned a weight proportional to its capacity. Servers with higher weights receive more requests.
Maintain a weighted list or use algorithms like Smooth Weighted Round Robin to avoid bursts to high-weight servers.
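A sketch of the smooth variant: on each pick, every backend's current score grows by its weight, the highest score wins, and the winner is pushed back down by the total weight, which interleaves picks instead of bursting to the heaviest server:

```python
class SmoothWRR:
    def __init__(self, weights):              # e.g. {"a": 5, "b": 1, "c": 1}
        self.weights = dict(weights)
        self.current = {b: 0 for b in weights}

    def select(self):
        total = sum(self.weights.values())
        for backend, weight in self.weights.items():
            self.current[backend] += weight   # every backend gains its weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= total           # winner pays back the total
        return best

lb = SmoothWRR({"a": 5, "b": 1, "c": 1})
print([lb.select() for _ in range(7)])  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
```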
Best For: Heterogeneous server pools with known capacity differences.
Least Connections: routes each new request to the backend with the fewest active connections.
Track the number of active connections per backend. When a new request arrives, select the backend with the minimum count.
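A sketch of the bookkeeping; the caller must pair every acquire with a release when the request completes:

```python
class LeastConnections:
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # in-flight request counts

    def acquire(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1  # call when the request finishes
```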
Best For: Workloads with varying request processing times.
Weighted Least Connections: combines least connections with server weights.
Select the backend with the lowest ratio of active_connections / weight.
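Relative to the least-connections sketch above, only the selection rule changes:

```python
def select_weighted(active: dict, weight: dict) -> str:
    # lowest connections-per-unit-of-capacity wins
    return min(active, key=lambda b: active[b] / weight[b])
```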
Best For: Production environments with heterogeneous servers.
IP Hash: routes requests based on a hash of the client's IP address.
The same client IP always routes to the same backend (assuming the backend pool doesn't change).
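A sketch using a stable digest (Python's built-in hash() is randomized per process, so a digest-based hash keeps routing consistent across LB restarts and nodes):

```python
import hashlib

def select(client_ip: str, backends: list) -> str:
    digest = hashlib.md5(client_ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]

print(select("203.0.113.7", ["s1", "s2", "s3"]))  # same IP -> same backend
```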
Best For: Simple session persistence without application-level changes.
Consistent Hashing: an advanced form of hashing that minimizes disruption when backends are added or removed.
Best For: Systems where backends frequently scale up/down.
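A minimal hash-ring sketch: each backend is placed on the ring at many virtual-node positions, and a key routes to the first position at or after its own hash, so removing one backend only remaps that backend's share of keys:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, backends, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{b}#{i}"), b)
            for b in backends
            for i in range(vnodes)
        )
        self.positions = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def select(self, key: str) -> str:
        idx = bisect.bisect(self.positions, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["s1", "s2", "s3"])
print(ring.select("client-42"))  # stable even as other backends come and go
```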
| Algorithm | Pros | Cons | Best For |
|---|---|---|---|
| Round Robin | Simple, stateless | Ignores capacity | Homogeneous backends |
| Weighted Round Robin | Respects capacity | Static weights | Known capacity differences |
| Least Connections | Adapts to load | Requires state | Variable request times |
| Weighted Least Conn | Optimal utilization | Complex | Production environments |
| IP Hash | Simple stickiness | Uneven distribution | Basic session persistence |
| Consistent Hash | Minimal disruption | Complex setup | Dynamic scaling |
Recommendation: For most production systems, Weighted Least Connections provides the best balance of adaptability and efficiency. Use Consistent Hashing when backends scale frequently or for cache-aware routing.
Health checks are the foundation of reliable load balancing. Without proper health monitoring, the load balancer would continue routing traffic to failed servers.
| Parameter | Description | Typical Value |
|---|---|---|
| Interval | Time between checks | 5-10 seconds |
| Timeout | Max wait for response | 2-3 seconds |
| Healthy Threshold | Consecutive passes to mark healthy | 2-3 |
| Unhealthy Threshold | Consecutive failures to mark unhealthy | 2-3 |
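A sketch of how these thresholds translate into a state machine: a backend flips state only after the required number of consecutive opposite results, which filters out one-off network blips:

```python
class HealthState:
    def __init__(self, healthy_threshold=3, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self.streak = 0  # consecutive results disagreeing with current state

    def record(self, passed: bool) -> bool:
        if passed == self.healthy:
            self.streak = 0          # result agrees with current state
        else:
            self.streak += 1
            needed = (self.healthy_threshold if not self.healthy
                      else self.unhealthy_threshold)
            if self.streak >= needed:
                self.healthy = passed  # enough consecutive results: flip
                self.streak = 0
        return self.healthy
```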
TCP check: simply attempts to establish a TCP connection.
Pros: Simple, low overhead, works for any TCP service. Cons: Doesn't verify application is actually working.
HTTP check: sends an HTTP request and validates the response.
Pros: Verifies application is responding correctly. Cons: Higher overhead, requires health endpoint.
Custom script check: runs a user-defined script or command.
Pros: Maximum flexibility for complex checks. Cons: Higher complexity and security considerations.
When a backend fails health checks:

1. It is marked unhealthy once the unhealthy threshold of consecutive failures is reached.
2. The routing engine stops sending it new traffic.
3. Health checks continue in the background.
4. After enough consecutive passes (the healthy threshold), the backend rejoins the pool.
Some applications require that all requests from a client go to the same backend server, typically when session state is stored locally on the server.
Without stickiness, a user might log in on server A, have their next request routed to server B, and appear logged out because server B has no copy of the session.
Cookie-based stickiness: the load balancer injects a cookie containing the backend identifier.
Pros:

- Precise control over which backend serves each client
- Survives client IP changes (mobile networks, proxies)
- No application changes needed when the load balancer injects the cookie

Cons:

- Requires Layer 7 (HTTP) processing
- Breaks if the client rejects or strips cookies
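A sketch of the routing logic, where the cookie name (lb_backend) and TTL are illustrative assumptions:

```python
def route(cookies: dict, backends: list, pick_backend):
    backend = cookies.get("lb_backend")
    if backend in backends:             # honor a valid existing cookie
        return backend, None
    backend = pick_backend()            # otherwise run the normal algorithm
    cookie = f"lb_backend={backend}; Max-Age=3600; HttpOnly"
    return backend, cookie              # caller emits it as a Set-Cookie header
```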
IP-based stickiness: route based on the client IP address (similar to IP Hash).
Pros:

- Works at Layer 4; no cookies or HTTP parsing required
- Stateless: the mapping is recomputed from the IP on every request

Cons:

- Clients behind the same NAT or proxy all map to one backend, skewing load
- A client whose IP changes loses its session
External session store: move session state out of the backend entirely.
Pros:

- Any backend can serve any request; no stickiness needed
- Backends can scale or fail without losing sessions

Cons:

- Adds a network hop and a dependency on the session store
- The store itself must be made highly available
Best Practice: Design applications to be stateless and store session data in a shared store (Redis, Memcached). This eliminates the need for sticky sessions and improves scalability.
Use cookie-based stickiness only for legacy applications that cannot be modified.
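A sketch of externalized sessions using Redis via the redis-py client (the hostname and key layout are illustrative assumptions):

```python
import redis  # third-party redis-py client

r = redis.Redis(host="session-store.internal", port=6379,
                decode_responses=True)

def save_session(session_id: str, data: dict, ttl_seconds: int = 3600):
    key = f"session:{session_id}"
    r.hset(key, mapping=data)      # any backend can write the session
    r.expire(key, ttl_seconds)     # expire idle sessions automatically

def load_session(session_id: str) -> dict:
    return r.hgetall(f"session:{session_id}")  # any backend can read it
```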
Load balancers operate at different layers of the OSI model, each with distinct capabilities and trade-offs.
Layer 4 load balancing routes based on IP address and TCP/UDP port only. It does not inspect packet contents.
| Aspect | Layer 4 |
|---|---|
| Speed | Very fast (hardware-accelerated possible) |
| Intelligence | Limited (no content awareness) |
| Use Cases | Any TCP/UDP traffic |
| SSL Handling | Pass-through only |
| Sticky Sessions | IP-based only |
Layer 7 load balancing inspects HTTP headers, URLs, cookies, and content to make routing decisions.
| Aspect | Layer 7 |
|---|---|
| Speed | Slower (content parsing) |
| Intelligence | High (content-based routing) |
| Use Cases | HTTP/HTTPS traffic |
| SSL Handling | Termination + inspection |
| Sticky Sessions | Cookie, header, or URL-based |
| Scenario | Recommended |
|---|---|
| Raw TCP traffic (databases, custom protocols) | Layer 4 |
| Maximum performance, simple routing | Layer 4 |
| Content-based routing | Layer 7 |
| SSL termination | Layer 7 |
| Cookie-based sticky sessions | Layer 7 |
| HTTP header inspection/modification | Layer 7 |
SSL termination means the load balancer handles encryption/decryption, so backend servers communicate in plain HTTP.
Traffic between the LB and backends is unencrypted. Options:

- Keep LB-to-backend traffic inside a trusted private network (e.g., a VPC)
- Re-encrypt from the LB to the backends (SSL bridging), trading CPU for security
- Skip termination and pass TLS through to the backends, giving up Layer 7 features
Since the load balancer sits on the critical path, its failure means complete service outage. Designing for high availability is essential.
Load balancer nodes monitor each other using:

- Heartbeats exchanged over a dedicated link
- VRRP advertisements from the active node
- An external coordination service (e.g., etcd or Consul)
VRRP (Virtual Router Redundancy Protocol): the industry standard for IP failover. The active node owns the VIP; if its advertisements stop, a backup claims the VIP.
Failover time: 1-3 seconds
DNS failover: multiple LB IPs registered in DNS with health checks.
Failover time: DNS TTL (can be slow)
Anycast: multiple LB nodes share the same IP address. BGP routing directs traffic to the nearest healthy node.
Failover time: Seconds (BGP convergence)
| Approach | Connection Handling | Complexity |
|---|---|---|
| Stateless | Active connections dropped on failover | Simple |
| Stateful | Connections migrated to backup | Complex (requires state sync) |
Recommendation: Design for stateless failover. Modern applications handle connection drops gracefully with retries.