A rate limiter is a system that controls the number of requests a user or client can make to an API or service within a specific time window. It helps protect services from abuse, prevents resource exhaustion, and ensures fair usage among clients.
Example: If a system allows a maximum of 100 requests per minute, any request beyond that limit within the same minute would either be throttled (delayed) or rejected outright, often with an HTTP 429 Too Many Requests response.
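The example above can be sketched as a minimal fixed-window counter. This is an illustrative sketch, not a production implementation: the limit, window size, and function names are assumptions, and a real service would use a shared store rather than an in-process dictionary.

```python
import time
from collections import defaultdict

# Assumed limits matching the example: 100 requests per 60-second window.
LIMIT = 100
WINDOW = 60  # seconds

# (user_id, window_start) -> request count; in-process for illustration only.
counters = defaultdict(int)

def allow_request(user_id, now=None):
    """Return True if the request is allowed, False if it should get a 429."""
    now = time.time() if now is None else now
    window_start = int(now // WINDOW)  # which fixed window this request falls in
    key = (user_id, window_start)
    if counters[key] >= LIMIT:
        return False  # caller responds with HTTP 429 Too Many Requests
    counters[key] += 1
    return True
```

Requests within the same window increment a shared counter; once it reaches the limit, further calls return `False` until the next window begins.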
In this article, we will dive into the system design of a distributed rate limiter and explore the 5 most commonly used rate limiting algorithms, with examples, pros, and cons.
Let’s begin by clarifying the requirements.
1. Requirements
Before diving into the architecture, let's outline the functional and non-functional requirements:
1.1 Functional Requirements
Per-User Rate Limiting: Enforce a fixed number of requests per user or API key within a defined time window (e.g., 100 requests per minute). Excess requests should be rejected with an HTTP 429 Too Many Requests.
Global Enforcement: Limits must be enforced consistently across all nodes in a distributed environment. Users shouldn't bypass limits by switching servers.
Multi-Window Support: Apply limits across multiple time granularities simultaneously (e.g., per second, per minute, per hour) to prevent abuse over short and long bursts.
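The multi-window requirement can be sketched by checking every configured window before admitting a request. The specific limits below (5/second, 100/minute, 2,000/hour) are assumptions for illustration; a request is rejected if any window is already exhausted.

```python
import time
from collections import defaultdict

# Assumed limits: window size in seconds -> max requests in that window.
LIMITS = {1: 5, 60: 100, 3600: 2000}

# (user_id, window_size, window_start) -> count; in-process for illustration.
counters = defaultdict(int)

def allow_request(user_id, now=None):
    """Admit the request only if it fits within every configured window."""
    now = time.time() if now is None else now
    keys = [(user_id, w, int(now // w)) for w in LIMITS]
    # Reject if any window is already at its limit.
    if any(counters[k] >= LIMITS[k[1]] for k in keys):
        return False
    # Otherwise count the request against all windows at once.
    for k in keys:
        counters[k] += 1
    return True
```

This way a burst that fits the per-minute budget can still be stopped by the per-second limit, and a slow steady drip that never trips the per-second limit is still capped per hour.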
1.2 Non-Functional Requirements
To be usable at scale, our distributed rate-limiter must meet several critical non-functional goals:
Scalability: The system should scale horizontally to handle massive request volumes and growing user counts.
Low Latency: Rate limit checks should be fast, ideally adding no more than a few milliseconds per request.
High Availability: The rate-limiter should continue working even under heavy load or node failures. There should be no single point of failure.
Strong Consistency: All nodes should have a consistent view of each user’s request counts. This prevents a client from bypassing limits by routing requests through different servers.
High Throughput: The system should support a large number of operations per second and serve many concurrent clients without significant performance degradation.
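The strong-consistency requirement is usually met by routing all counter updates through a shared store that increments and checks atomically (for example, Redis with `INCR` and `EXPIRE`). The sketch below is an assumption-laden stand-in: a lock simulates that atomicity in-process, so no two concurrent requests can both claim the last available slot.

```python
import threading
from collections import defaultdict

class SharedCounterStore:
    """Stand-in for a central store (e.g., Redis) that all nodes consult.

    A lock simulates the store's atomic increment-and-check; class and
    method names are illustrative, not from any particular library.
    """

    def __init__(self):
        self._counts = defaultdict(int)
        self._lock = threading.Lock()

    def incr_and_check(self, key, limit):
        # Increment and test the limit in one atomic step, so every node
        # sees the same count and the limit cannot be exceeded by races.
        with self._lock:
            self._counts[key] += 1
            return self._counts[key] <= limit
```

Because the increment and the comparison happen under one lock, two requests arriving simultaneously on different nodes cannot both observe "99 of 100 used" and both get through.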
2. High-Level Architecture