Design Distributed Rate Limiter

Ashish Pratap Singh

In this article, we will dive into the system design of a distributed rate limiter and explore the five most commonly used rate limiting algorithms, with examples, pros, and cons.

Let’s begin by clarifying the requirements.

1. Requirements

Before diving into the architecture, let’s outline the functional and non-functional requirements:

1.1 Functional Requirements

  • Per-User Rate Limiting: Enforce a fixed number of requests per user or API key within a defined time window (e.g., 100 requests per minute). Excess requests should be rejected with an HTTP 429 Too Many Requests response.
  • Global Enforcement: Limits must be enforced consistently across all nodes in a distributed environment. Users should not be able to bypass limits by switching servers.
  • Multi-Window Support: Apply limits across multiple time granularities simultaneously (e.g., per second, per minute, per hour) to prevent abuse from both short bursts and sustained high traffic.
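
To make the first and third requirements concrete, here is a minimal single-process sketch of a limiter that checks several windows at once and answers with a 200 or 429 status code. It assumes simple fixed-window counters, which is only one of several possible algorithms. The class name `MultiWindowLimiter`, the `LIMITS` values, and the injectable `clock` parameter are illustrative assumptions, not part of the original design.

```python
import time
from collections import defaultdict

# Hypothetical limits: (window length in seconds, max requests in that window).
LIMITS = [(1, 5), (60, 100), (3600, 1000)]


class MultiWindowLimiter:
    """In-memory fixed-window counters keyed by (user, window size, window index)."""

    def __init__(self, limits=LIMITS, clock=time.time):
        self.limits = limits
        self.clock = clock  # injectable so the behavior can be checked deterministically
        # Old windows are never evicted here; a real system would expire them.
        self.counts = defaultdict(int)

    def check(self, user_id):
        """Return 200 if the request is allowed, 429 if any window is already full."""
        now = self.clock()
        keys = [(user_id, window, int(now // window)) for window, _ in self.limits]
        # Reject if any window has reached its limit; rejected requests are not counted.
        for key, (_, max_requests) in zip(keys, self.limits):
            if self.counts[key] >= max_requests:
                return 429
        # Allowed: count the request against every window.
        for key in keys:
            self.counts[key] += 1
        return 200
```

Because every window must have room before a request is counted, a client that stays under the per-second limit can still be rejected once its per-minute or per-hour budget is used up. This sketch keeps its counters in local memory, so on its own it does not satisfy the global enforcement requirement.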

1.2 Non-Functional Requirements

To be usable at scale, our distributed rate limiter must meet several critical non-functional goals:

  • Scalability: The system should scale horizontally to handle massive request volumes and growing user counts.
  • Low Latency: Rate limit checks should be fast, ideally adding no more than a few milliseconds per request.
  • High Availability: The rate limiter should continue working even under heavy load or node failures. There should be no single point of failure.
  • Strong Consistency: All nodes should have a consistent view of each user’s request counts. This prevents a client from bypassing limits by routing requests through different servers.
  • High Throughput: The system should support a large number of operations per second and serve many concurrent clients without significant performance degradation.
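
The consistency goal can be illustrated with a small simulation. The sketch below compares two nodes that share one atomic counter store with two nodes that each keep private counters. `SharedCounterStore` and `Node` are hypothetical names, and the store is an in-process stand-in for whatever shared, atomic counter service a real deployment would use.

```python
import threading


class SharedCounterStore:
    """In-process stand-in for a shared, atomic counter service (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, key):
        # Atomic read-modify-write, so concurrent callers never lose an update.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


class Node:
    """One rate limiter node; nodes given the same store share one view of the counts."""

    def __init__(self, store, limit):
        self.store = store
        self.limit = limit

    def allow(self, user_id, window_id):
        # For simplicity, rejected requests are also counted in this sketch.
        return self.store.increment((user_id, window_id)) <= self.limit


# A client alternates between two nodes within the same window (limit: 3).
shared = SharedCounterStore()
nodes = [Node(shared, limit=3) for _ in range(2)]
shared_results = [nodes[i % 2].allow("user-1", window_id=0) for i in range(6)]

# Same traffic, but each node keeps its own private store.
isolated = [Node(SharedCounterStore(), limit=3) for _ in range(2)]
isolated_results = [isolated[i % 2].allow("user-1", window_id=0) for i in range(6)]
```

With the shared store, only the first three of the six requests are allowed. With private stores, each node sees just three requests, so all six pass and the client effectively doubles its limit by switching servers.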

2. High-Level Architecture
