Design Load Balancer

Ashish Pratap Singh

In this chapter, we will explore the high-level design of a Load Balancer.

Load balancers appear in almost every distributed system architecture. Understanding how to design one from scratch demonstrates deep knowledge of networking, high availability, and system scalability.

Let's start by clarifying the requirements.

1. Clarifying Requirements

Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.

After discussing these questions with the interviewer and gathering the details, we can summarize the key system requirements.

1.1 Functional Requirements

  • Traffic Distribution: Distribute incoming requests across multiple backend servers using configurable algorithms.
  • Health Checking: Continuously monitor backend servers and automatically remove unhealthy ones from the pool.
  • Session Persistence: Support sticky sessions to route requests from the same client to the same server.
  • SSL Termination: Handle SSL/TLS encryption and decryption to offload work from backend servers.
  • Layer 4 and Layer 7 Support: Support both transport-level (TCP/UDP) and application-level (HTTP/HTTPS) load balancing.

1.2 Non-Functional Requirements

  • High Availability: The load balancer must be highly available (99.99% uptime) with no single point of failure.
  • Low Latency: Should add minimal latency to requests (< 1ms overhead).
  • High Throughput: Handle up to 1 million requests per second at peak.
  • Scalability: Should scale horizontally to handle increasing traffic.
  • Fault Tolerance: Continue operating even when individual components fail.

2. Back-of-the-Envelope Estimation

To understand the scale of our system, let's make some reasonable assumptions.

Traffic

  • Peak requests: 1 million RPS (requests per second)
  • Average requests: ~300,000 RPS (assuming peak traffic is roughly 3x the average)
  • Concurrent connections: ~500,000 (assuming average connection duration of 500ms)

Bandwidth

  • Average request size: ~2 KB (headers + small payload)
  • Average response size: ~10 KB
  • Ingress bandwidth: 1M RPS × 2 KB = 2 GB/s
  • Egress bandwidth: 1M RPS × 10 KB = 10 GB/s

Health Checks

  • Backend servers: 1,000 servers across multiple data centers
  • Health check interval: 5 seconds
  • Health check traffic: 1,000 servers × (1 check / 5 seconds) = 200 health checks/second

Connection Table

Each connection requires state tracking:

  • Per-connection memory: ~500 bytes (source IP, port, destination, timestamps, etc.)
  • Memory for connections: 500,000 × 500 bytes = 250 MB
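
As a quick sanity check, the arithmetic behind these estimates can be reproduced in a few lines of Python:

```python
peak_rps = 1_000_000

# Bandwidth: request/response sizes in KB, results in GB/s (1 GB = 10^6 KB)
ingress_gbps = peak_rps * 2 / 1_000_000     # 2 KB requests   -> 2.0 GB/s
egress_gbps = peak_rps * 10 / 1_000_000     # 10 KB responses -> 10.0 GB/s

# Concurrent connections: Little's law, 1M RPS x 0.5 s average duration
concurrent_connections = int(peak_rps * 0.5)               # 500,000

# Connection table: ~500 bytes of state per connection
conn_table_mb = concurrent_connections * 500 / 1_000_000   # 250 MB

# Health checks: 1,000 backends probed every 5 seconds
health_checks_per_sec = 1_000 / 5                          # 200/s

print(ingress_gbps, egress_gbps, concurrent_connections,
      conn_table_mb, health_checks_per_sec)
```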

These numbers indicate we need a system capable of handling massive throughput with minimal memory overhead per connection.

3. Core APIs

A load balancer exposes both a data plane (handling actual traffic) and a control plane (configuration and management). Below are the core APIs.

1. Register Backend Server

Endpoint: POST /api/v1/backends

Adds a new backend server to the load balancer pool.

Request Parameters:
  • address _(required)_: IP address or hostname of the backend server.
  • port _(required)_: Port number the backend is listening on.
  • weight _(optional)_: Weight for weighted load balancing (default: 1).
  • health_check_path _(optional)_: HTTP path for health checks (default: /health).
Sample Response:
  • backend_id: Unique identifier for the registered backend.
  • status: Current status (healthy/unhealthy/unknown).
Error Cases:
  • 400 Bad Request: Invalid address or port.
  • 409 Conflict: Backend already registered.
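
For illustration, here is how a client might call this endpoint from Python; the control-plane host and port are placeholders, not part of the API spec above:

```python
import requests  # third-party HTTP client (pip install requests)

# Hypothetical control-plane address; substitute your deployment's endpoint.
resp = requests.post(
    "http://lb-control-plane:8080/api/v1/backends",
    json={
        "address": "10.0.1.15",          # required
        "port": 8080,                    # required
        "weight": 2,                     # optional, default 1
        "health_check_path": "/health",  # optional, default /health
    },
)
print(resp.status_code, resp.json())  # e.g. backend_id plus status "unknown"
```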

2. Remove Backend Server

Endpoint: DELETE /api/v1/backends/{backend_id}

Removes a backend server from the pool. Existing connections are gracefully drained.

Response:
  • 200 OK: Backend removed successfully.
  • 404 Not Found: Backend ID does not exist.

3. Get Backend Health Status

Endpoint: GET /api/v1/backends/{backend_id}/health

Returns the current health status and metrics for a specific backend.

Sample Response:
  • status: Current health status.
  • last_check: Timestamp of last health check.
  • response_time_ms: Average response time.
  • active_connections: Number of active connections.

4. Configure Load Balancing Algorithm

Endpoint: PUT /api/v1/config/algorithm

Sets the load balancing algorithm for traffic distribution.

Request Parameters:
  • algorithm _(required)_: One of round_robin, weighted_round_robin, least_connections, ip_hash, random.
  • sticky_sessions _(optional)_: Enable session persistence (default: false).
  • sticky_ttl_seconds _(optional)_: TTL for sticky session cookies.
Error Cases:
  • 400 Bad Request: Invalid algorithm name.

4. High-Level Design

At a high level, our load balancer must satisfy these core requirements:

  1. Traffic Distribution: Route incoming requests to healthy backend servers.
  2. Health Monitoring: Detect and isolate unhealthy servers.
  3. High Availability: Remain operational even if load balancer nodes fail.

The architecture can be broken down into a data plane (handles actual traffic) and a control plane (manages configuration and health).

4.1 Requirement 1: Traffic Distribution

The primary function is accepting client connections and forwarding them to backend servers.

Components Needed

Frontend Listener

The entry point for all client traffic. It accepts incoming connections on configured ports (e.g., 80 for HTTP, 443 for HTTPS).

Responsibilities:

  • Accept TCP connections from clients.
  • Parse protocol headers (for Layer 7).
  • Hand off connections to the routing engine.

Routing Engine

The brain of the load balancer. It decides which backend server should handle each request.

Responsibilities:

  • Maintain a list of available backend servers.
  • Apply the configured load balancing algorithm.
  • Track connection counts per backend (for least-connections algorithm).

Backend Pool

A logical group of backend servers that can handle the same type of requests.

Responsibilities:

  • Store backend server metadata (address, port, weight).
  • Track health status of each backend.
  • Support multiple pools for different services.

Flow: Routing a Request

  1. Client sends a request to the load balancer's public IP.
  2. The Frontend Listener accepts the connection.
  3. The Routing Engine selects a backend using the configured algorithm.
  4. The request is forwarded to the selected Backend Server.
  5. The backend processes the request and sends the response.
  6. The load balancer forwards the response back to the client.

4.2 Requirement 2: Health Monitoring

Without health checks, the load balancer would continue sending traffic to crashed or overloaded servers, causing user-facing errors.

Additional Components Needed

Health Checker

A background service that continuously monitors the health of all backend servers.

Responsibilities:

  • Send periodic health probes to each backend.
  • Track success/failure history.
  • Update backend status (healthy/unhealthy).
  • Notify the routing engine of status changes.

Health Check Types

| Type | How It Works | Use Case |
| --- | --- | --- |
| TCP Check | Attempts TCP connection | Basic connectivity |
| HTTP Check | Sends HTTP request, expects 2xx | Web applications |
| Custom Script | Runs user-defined check | Complex health logic |

Flow: Health Check Process

  1. Health Checker sends probes to each backend at configured intervals (e.g., every 5 seconds).
  2. Backends 1 and 3 respond successfully (marked healthy).
  3. Backend 2 fails to respond within timeout (marked unhealthy).
  4. Health Checker notifies the Routing Engine of the status change.
  5. Routing Engine removes Backend 2 from active rotation.
  6. Traffic is distributed only to healthy backends (1 and 3).

4.3 Requirement 3: High Availability

A single load balancer is a single point of failure. If it crashes, all traffic stops.

Additional Components Needed

Multiple LB Nodes

Deploy multiple load balancer instances that can handle traffic independently.

Virtual IP (VIP)

A floating IP address that can be moved between LB nodes. Clients connect to the VIP, not individual LB IPs.

Failover Manager

Coordinates which LB node is active and handles failover when the primary fails.

High Availability Patterns

Active-Passive (Failover)

  • One LB handles all traffic (Active).
  • A standby LB monitors the active via heartbeats.
  • If active fails, standby takes over the VIP within seconds.

Pros: Simple, no state synchronization needed. Cons: Standby resources are wasted during normal operation.

Active-Active

  • Multiple LB nodes handle traffic simultaneously.
  • DNS or upstream router distributes traffic across LB nodes.
  • If one LB fails, others continue handling traffic.

Pros: Better resource utilization, higher throughput. Cons: Requires state synchronization for sticky sessions.

4.4 Putting It All Together

Here is the complete architecture combining all requirements:

Core Components Summary

| Component | Purpose |
| --- | --- |
| Virtual IP / DNS | Single entry point for clients |
| LB Nodes | Accept and route traffic to backends |
| Session Store | Shared state for sticky sessions (Redis) |
| Health Checker | Monitor backend health |
| Config Manager | Manage LB configuration |
| Backend Pool | Group of application servers |

5. Database Design

A load balancer is primarily an in-memory, real-time system. It does not typically use a traditional database for the data plane. However, the control plane needs persistent storage for configuration.

5.1 Storage Considerations

| Data Type | Storage | Reason |
| --- | --- | --- |
| Active connections | In-memory (LB node) | Ultra-low latency required |
| Backend server list | In-memory + Config store | Fast lookups, persistent config |
| Health status | In-memory | Changes frequently, needs sub-second access |
| Session mappings | Redis/Memcached | Shared across LB nodes |
| Configuration | etcd/Consul/PostgreSQL | Persistent, versioned |
| Metrics/Logs | Time-series DB (InfluxDB, Prometheus) | Historical analysis |

5.2 Configuration Schema

Backend Servers Table

| Field | Type | Description |
| --- | --- | --- |
| backend_id | String (PK) | Unique identifier |
| pool_id | String (FK) | Backend pool this server belongs to |
| address | String | IP address or hostname |
| port | Integer | Port number |
| weight | Integer | Weight for weighted algorithms |
| max_connections | Integer | Connection limit |
| enabled | Boolean | Whether backend is enabled |
| created_at | Timestamp | Creation time |

Backend Pools Table

| Field | Type | Description |
| --- | --- | --- |
| pool_id | String (PK) | Unique identifier |
| name | String | Human-readable name |
| algorithm | Enum | Load balancing algorithm |
| health_check_path | String | HTTP path for health checks |
| health_check_interval | Integer | Seconds between checks |
| sticky_sessions | Boolean | Enable session persistence |
| sticky_ttl | Integer | Session cookie TTL |

Session Mappings (Redis)
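
Session mappings live in Redis so every LB node sees the same affinity state. A minimal sketch using redis-py, assuming an illustrative session:{id} key scheme with TTL-based expiry:

```python
import redis  # redis-py client (pip install redis)

r = redis.Redis(host="session-store", port=6379)  # placeholder host

def save_mapping(session_id: str, backend_id: str, ttl_seconds: int = 1800) -> None:
    # Map the session to its backend; the key expires with the sticky TTL.
    r.setex(f"session:{session_id}", ttl_seconds, backend_id)

def lookup_mapping(session_id: str) -> str | None:
    value = r.get(f"session:{session_id}")
    return value.decode() if value else None
```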

6. Design Deep Dive

Now that we have the high-level architecture in place, let's dive deeper into some critical design choices.

6.1 Load Balancing Algorithms

The choice of load balancing algorithm significantly impacts traffic distribution, backend utilization, and overall system performance.

A good algorithm should:

  • Distribute load evenly across healthy backends.
  • Minimize response time by avoiding overloaded servers.
  • Support various use cases (stateless, stateful, heterogeneous servers).

Let's explore the primary approaches.

Approach 1: Round Robin

The simplest algorithm. Requests are distributed sequentially across all available backends.

How It Works

Maintain a counter that increments with each request. Select the backend at index counter % number_of_backends.
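
A minimal sketch of this counter-based selection (single node, not thread-safe):

```python
class RoundRobin:
    def __init__(self, backends: list[str]):
        self.backends = list(backends)
        self.counter = 0

    def next(self) -> str:
        # Select the backend at index counter % number_of_backends.
        backend = self.backends[self.counter % len(self.backends)]
        self.counter += 1
        return backend

rr = RoundRobin(["b1", "b2", "b3"])
print([rr.next() for _ in range(6)])  # ['b1', 'b2', 'b3', 'b1', 'b2', 'b3']
```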

Pros

  • Simple to implement: Just a counter and modulo operation.
  • Fair distribution: Each backend gets equal traffic over time.
  • No state required: Works independently across LB nodes.

Cons

  • Ignores server capacity: A weak server gets the same load as a powerful one.
  • Ignores current load: Doesn't consider existing connections or response times.
  • Not ideal for variable request costs: A heavy request counts the same as a light one.

Best For: Homogeneous backends with similar capacity and stateless requests.

Approach 2: Weighted Round Robin

An extension of round robin that accounts for different server capacities.

How It Works

Each backend is assigned a weight proportional to its capacity. Servers with higher weights receive more requests.

Implementation

Maintain a weighted list or use algorithms like Smooth Weighted Round Robin to avoid bursts to high-weight servers.
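
A sketch of the Smooth Weighted Round Robin variant mentioned above (the algorithm nginx uses). Instead of sending a high-weight server all its turns in a burst, it spreads them across the cycle:

```python
class SmoothWeightedRR:
    def __init__(self, weights: dict[str, int]):   # e.g. {"a": 5, "b": 1}
        self.weights = dict(weights)
        self.current = {b: 0 for b in weights}
        self.total = sum(weights.values())

    def next(self) -> str:
        # Each backend's current weight grows by its configured weight...
        for backend, weight in self.weights.items():
            self.current[backend] += weight
        # ...the leader is chosen, then penalized by the total weight,
        # which interleaves its turns instead of bursting them.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen
```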

Pros

  • Respects server capacity: Powerful servers handle more traffic.
  • Simple configuration: Just assign weights based on server specs.

Cons

  • Static weights: Doesn't adapt to runtime conditions.
  • Manual tuning: Weights must be configured correctly.

Best For: Heterogeneous server pools with known capacity differences.

Approach 3: Least Connections

Routes each new request to the backend with the fewest active connections.

How It Works

Track the number of active connections per backend. When a new request arrives, select the backend with the minimum count.
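
A sketch of per-backend connection tracking on a single LB node; acquire/release would wrap the lifetime of each proxied connection:

```python
class LeastConnections:
    def __init__(self, backends: list[str]):
        self.active = {b: 0 for b in backends}   # active connections per backend

    def acquire(self) -> str:
        # Pick the backend with the fewest in-flight connections.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        # Called when the proxied connection closes.
        self.active[backend] -= 1
```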

Pros

  • Adapts to load: Naturally balances based on current state.
  • Handles slow requests: Servers stuck on slow requests get fewer new ones.

Cons

  • Requires connection tracking: Must maintain state across all LB nodes.
  • Cold start issue: New backends may get overwhelmed initially.

Best For: Workloads with varying request processing times.

Approach 4: Weighted Least Connections

Combines least connections with server weights.

How It Works

Select the backend with the lowest ratio of active_connections / weight.
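
The selection rule translates almost directly into code; a sketch reusing per-backend connection counts and configured weights:

```python
def pick_weighted_least_conn(active: dict[str, int], weights: dict[str, int]) -> str:
    # Lowest active_connections / weight wins; higher weight tolerates more load.
    return min(active, key=lambda b: active[b] / weights[b])

print(pick_weighted_least_conn({"a": 10, "b": 4}, {"a": 4, "b": 1}))  # "a" (2.5 < 4.0)
```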

Pros

  • Best of both worlds: Considers both capacity and current load.
  • Optimal utilization: Keeps all servers proportionally loaded.

Cons

  • More complex: Requires tracking connections and weights.

Best For: Production environments with heterogeneous servers.

Approach 5: IP Hash (Source Hashing)

Routes requests based on a hash of the client's IP address.

How It Works

The same client IP always routes to the same backend (assuming the backend pool doesn't change).
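
A sketch of source hashing; any stable hash works, as long as every LB node computes the same function (Python's built-in hash() is randomized per process, hence hashlib):

```python
import hashlib

def ip_hash_route(client_ip: str, backends: list[str]) -> str:
    # Stable hash of the client IP, reduced modulo the pool size.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

# The same IP always maps to the same backend while the pool is unchanged.
print(ip_hash_route("203.0.113.7", ["b1", "b2", "b3"]))
```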

Pros

  • Session affinity without cookies: Achieves stickiness at network level.
  • No shared state: Each LB can compute independently.

Cons

  • Uneven distribution: Some IP ranges may cluster on one backend.
  • Disruption on pool changes: Adding/removing backends reshuffles mappings.

Best For: Simple session persistence without application-level changes.

Approach 6: Consistent Hashing

An advanced form of hashing that minimizes disruption when backends are added or removed.

How It Works

  1. Backends are placed on a virtual hash ring based on their identifiers.
  2. Each request's key (e.g., client IP) is hashed to a point on the ring.
  3. The request routes to the first backend found clockwise on the ring.
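
A compact sketch of the ring with virtual nodes, using MD5 purely for placement (not security):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, backends: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self.ring: list[tuple[int, str]] = []   # sorted (position, backend)
        for backend in backends:
            self.add(backend)

    def add(self, backend: str) -> None:
        # Each backend appears at `vnodes` points to smooth the distribution.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{backend}#vn{i}"), backend))

    def remove(self, backend: str) -> None:
        # Only keys that hashed near this backend's points are remapped.
        self.ring = [(pos, b) for pos, b in self.ring if b != backend]

    def route(self, key: str) -> str:
        # First backend clockwise from the key's position (wrapping at the end).
        idx = bisect.bisect(self.ring, (_hash(key), "")) % len(self.ring)
        return self.ring[idx][1]
```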

Pros

  • Minimal disruption: Adding/removing a backend only affects a small portion of requests.
  • Better distribution: Virtual nodes spread load evenly.

Cons

  • More complex: Requires maintaining the hash ring structure.
  • Hotspot potential: Without virtual nodes, distribution can be uneven.

Best For: Systems where backends frequently scale up/down.

Summary and Recommendation

| Algorithm | Pros | Cons | Best For |
| --- | --- | --- | --- |
| Round Robin | Simple, stateless | Ignores capacity | Homogeneous backends |
| Weighted Round Robin | Respects capacity | Static weights | Known capacity differences |
| Least Connections | Adapts to load | Requires state | Variable request times |
| Weighted Least Conn | Optimal utilization | Complex | Production environments |
| IP Hash | Simple stickiness | Uneven distribution | Basic session persistence |
| Consistent Hash | Minimal disruption | Complex setup | Dynamic scaling |

Recommendation: For most production systems, Weighted Least Connections provides the best balance of adaptability and efficiency. Use Consistent Hashing when backends scale frequently or for cache-aware routing.

6.2 Health Checking Strategies

Health checks are the foundation of reliable load balancing. Without proper health monitoring, the load balancer would continue routing traffic to failed servers.

Health Check Parameters

| Parameter | Description | Typical Value |
| --- | --- | --- |
| Interval | Time between checks | 5-10 seconds |
| Timeout | Max wait for response | 2-3 seconds |
| Healthy Threshold | Consecutive passes to mark healthy | 2-3 |
| Unhealthy Threshold | Consecutive failures to mark unhealthy | 2-3 |
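
These parameters combine into a small state machine per backend. A sketch of the threshold logic, with the probe transport (TCP/HTTP/script) left abstract:

```python
class HealthState:
    """Flips a backend between healthy/unhealthy on consecutive probe results."""

    def __init__(self, healthy_threshold: int = 3, unhealthy_threshold: int = 3):
        self.healthy = True
        self.consecutive_ok = 0
        self.consecutive_fail = 0
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold

    def record(self, probe_ok: bool) -> bool:
        if probe_ok:
            self.consecutive_ok += 1
            self.consecutive_fail = 0
            if not self.healthy and self.consecutive_ok >= self.healthy_threshold:
                self.healthy = True   # recovered: return to rotation
        else:
            self.consecutive_fail += 1
            self.consecutive_ok = 0
            if self.healthy and self.consecutive_fail >= self.unhealthy_threshold:
                self.healthy = False  # eject from rotation, start draining
        return self.healthy
```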

Health Check Types

1. TCP Health Check

Simply attempts to establish a TCP connection.

Pros: Simple, low overhead, works for any TCP service. Cons: Doesn't verify application is actually working.

2. HTTP Health Check

Sends an HTTP request and validates the response.

Pros: Verifies application is responding correctly. Cons: Higher overhead, requires health endpoint.

3. Custom Script Check

Runs a user-defined script or command.

Pros: Maximum flexibility for complex checks. Cons: Higher complexity and security considerations.

Graceful Degradation

When a backend fails health checks:

  1. Connection Draining: Allow existing connections to complete.
  2. Remove from Pool: Stop routing new requests to the backend.
  3. Alert Operations: Notify monitoring systems.
  4. Auto-Recovery: Return backend to pool after passing health checks.

6.3 Session Persistence (Sticky Sessions)

Some applications require that all requests from a client go to the same backend server, typically when session state is stored locally on the server.

The Problem

Without stickiness, consecutive requests from the same client may be routed to different backends, and any session state stored locally on the first server (for example, a shopping cart) is lost between requests.

Approaches to Session Persistence

Approach 1: Cookie-Based Persistence

The load balancer injects a cookie containing the backend identifier.

  1. Client's first request → LB routes to Backend 1
  2. LB adds cookie: Set-Cookie: SERVERID=backend1
  3. Client's next request includes: Cookie: SERVERID=backend1
  4. LB reads cookie and routes to Backend 1
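
A sketch of the cookie logic at the LB for the flow above, using an illustrative SERVERID cookie name; a real deployment would sign or encrypt the value so it is tamper-proof:

```python
def route_sticky(headers: dict[str, str], choose_backend) -> tuple[str, str | None]:
    """Return (backend, set_cookie_value); set_cookie_value is None if affinity exists."""
    for part in headers.get("Cookie", "").split(";"):
        name, _, value = part.strip().partition("=")
        if name == "SERVERID" and value:
            return value, None                # steps 3-4: honor the existing cookie
    backend = choose_backend()                # step 1: first request, pick normally
    return backend, f"SERVERID={backend}"     # step 2: send back via Set-Cookie
```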

Pros:

  • Works across LB restarts.
  • No shared state needed between LB nodes.

Cons:

  • Requires HTTP-level inspection (Layer 7 only).
  • Cookie must be secure and tamper-proof.

Approach 2: Source IP Persistence

Route based on client IP address (similar to IP Hash).

Pros:

  • Works at Layer 4 (no HTTP parsing).
  • Simple to implement.

Cons:

  • Breaks with NAT (many clients share one IP).
  • No persistence across backend changes.

Approach 3: Application-Level Session Store

Move session state out of the backend entirely.

Pros:

  • Any backend can handle any request.
  • Best for horizontal scaling.
  • Load balancer remains simple.

Cons:

  • Requires application changes.
  • Adds dependency on session store.

Recommendation

Best Practice: Design applications to be stateless and store session data in a shared store (Redis, Memcached). This eliminates the need for sticky sessions and improves scalability.

Use cookie-based stickiness only for legacy applications that cannot be modified.

6.4 Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the OSI model, each with distinct capabilities and trade-offs.

Layer 4 (Transport Layer)

Routes based on IP address and TCP/UDP port only. Does not inspect packet contents.

Characteristics

| Aspect | Layer 4 |
| --- | --- |
| Speed | Very fast (hardware acceleration possible) |
| Intelligence | Limited (no content awareness) |
| Use Cases | Any TCP/UDP traffic |
| SSL Handling | Pass-through only |
| Sticky Sessions | IP-based only |

Layer 7 (Application Layer)

Inspects HTTP headers, URLs, cookies, and content to make routing decisions.

Characteristics

| Aspect | Layer 7 |
| --- | --- |
| Speed | Slower (content parsing) |
| Intelligence | High (content-based routing) |
| Use Cases | HTTP/HTTPS traffic |
| SSL Handling | Termination + inspection |
| Sticky Sessions | Cookie, header, or URL-based |

Layer 7 Routing Capabilities

Because it can inspect requests, a Layer 7 load balancer can route by URL path (e.g., /api vs /static), by Host header (different domains to different pools), and by HTTP headers or cookies, and it can modify requests and responses in flight.

When to Use Each

| Scenario | Recommended |
| --- | --- |
| Raw TCP traffic (databases, custom protocols) | Layer 4 |
| Maximum performance, simple routing | Layer 4 |
| Content-based routing | Layer 7 |
| SSL termination | Layer 7 |
| Cookie-based sticky sessions | Layer 7 |
| HTTP header inspection/modification | Layer 7 |

6.5 SSL/TLS Termination

SSL termination means the load balancer handles encryption/decryption, so backend servers communicate in plain HTTP.

SSL Termination Architecture

Clients connect to the load balancer over HTTPS; the LB decrypts the traffic, applies its routing logic, and forwards plain HTTP (or re-encrypted HTTPS) to the backends.

Benefits

  • Offloads CPU: Encryption is CPU-intensive. Centralizing it saves backend resources.
  • Simplified Certificate Management: Certificates managed in one place.
  • Content Inspection: LB can read HTTP headers for routing decisions.
  • Performance Optimization: TLS session reuse across clients.

Security Considerations

Traffic between LB and backends is unencrypted. Options:

  1. Trust the network: If LB and backends are in a private network, plain HTTP may be acceptable.
  2. End-to-end encryption: Use HTTPS to backends (re-encryption).
  3. Mutual TLS: Both LB and backends authenticate each other.

SSL Configuration Best Practices

  • Use TLS 1.2+ only (disable older protocols).
  • Enable HTTP/2 for multiplexed connections.
  • Implement OCSP stapling for faster certificate validation.
  • Configure cipher suites to prioritize security and performance.
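
As a concrete illustration of these practices, Python's standard ssl module can express most of them; the certificate paths and cipher string below are placeholders:

```python
import ssl

# Server-side TLS context for the load balancer's HTTPS listener.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2      # disable TLS 1.0/1.1 and older
ctx.load_cert_chain(certfile="lb.crt", keyfile="lb.key")
ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")    # prefer modern AEAD suites
ctx.set_alpn_protocols(["h2", "http/1.1"])        # advertise HTTP/2 via ALPN
```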

6.6 Handling Load Balancer Failures

Since the load balancer sits on the critical path, its failure means complete service outage. Designing for high availability is essential.

Failure Detection

Load balancer nodes monitor each other using:

  • Heartbeat messages: Periodic pings between nodes.
  • Health checks: Same mechanism used for backends.
  • Shared storage: Write timestamps to detect node liveness.

Failover Mechanisms

VRRP (Virtual Router Redundancy Protocol)

The industry-standard protocol for IP failover: LB nodes share a virtual IP, the active node advertises it, and a standby node claims the VIP when the active node's heartbeats stop.

Failover time: 1-3 seconds

DNS-Based Failover

Multiple LB IPs registered in DNS with health checks.

Failover time: DNS TTL (can be slow)

Anycast

Multiple LB nodes share the same IP address. BGP routing directs traffic to the nearest healthy node.

Failover time: Seconds (BGP convergence)

Stateless vs Stateful Failover

| Approach | Connection Handling | Complexity |
| --- | --- | --- |
| Stateless | Active connections dropped on failover | Simple |
| Stateful | Connections migrated to backup | Complex (requires state sync) |

Recommendation: Design for stateless failover. Modern applications handle connection drops gracefully with retries.
