A rate limiter is a system component or algorithm used to control the rate of operations performed by a user, client, or service over a given period. It helps prevent abuse, reduce load, and ensure fair usage of system resources.
Rate limiters are commonly applied to public APIs, login and authentication endpoints, and other shared services where unchecked traffic could degrade availability.
In this chapter, we will explore the low-level design of a rate limiter in detail.
Let's start by clarifying the requirements:
Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.
Here is an example of how a discussion between the candidate and the interviewer might unfold:
Candidate: Should the rate limiter work on a per-user basis, or should it be based on API key or IP address?
Interviewer: Let’s keep it simple and go with per-user rate limiting. You can assume users are uniquely identified by a user ID or token.
Candidate: Which rate limiting algorithm should we implement: fixed window, sliding window, or token bucket?
Interviewer: Use the fixed window algorithm for this version, and the token bucket if possible. We can consider more advanced approaches later.
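To make the distinction concrete, here is a minimal token-bucket sketch in Python. The class name, parameters, and method names are illustrative, not part of the requirements; the key idea is that tokens refill continuously over time, which allows short bursts up to the bucket's capacity.

```python
import time

class TokenBucket:
    """Minimal token bucket: tokens refill at `rate` per second,
    up to `capacity`; each request consumes one token."""

    def __init__(self, capacity, rate):
        self.capacity = capacity          # maximum burst size
        self.rate = rate                  # tokens added per second
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Unlike a fixed window, there is no hard reset boundary: a bucket with capacity 2 and rate 1 lets two requests through immediately, then roughly one per second afterward.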
Candidate: Should all users have the same rate limit, or can it vary per user?
Interviewer: Assume the same rate limit for all users (e.g., 100 requests per 60 seconds).
Candidate: What should happen when a user exceeds the rate limit? Should we silently drop the request or notify the user?
Interviewer: The system should clearly inform the user that they’ve exceeded the limit.
Candidate: Do we need to handle concurrency in case of multiple threads trying to access or update rate limits for the same user?
Interviewer: Yes, the implementation should be thread-safe and handle concurrent access reliably.
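Putting the agreed requirements together (per-user fixed window, 100 requests per 60 seconds, thread-safe, and an explicit rejection message), a minimal sketch might look like the following. The class and method names and the exact message text are assumptions for illustration:

```python
import threading
import time

class FixedWindowRateLimiter:
    """Per-user fixed window counter: at most `limit` requests per
    `window_seconds`; a user's counter resets when a new window starts."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.lock = threading.Lock()   # guards the shared per-user counters
        self.windows = {}              # user_id -> (window_start, count)

    def allow_request(self, user_id):
        now = time.monotonic()
        with self.lock:
            start, count = self.windows.get(user_id, (now, 0))
            if now - start >= self.window_seconds:
                start, count = now, 0  # new window: reset the counter
            if count < self.limit:
                self.windows[user_id] = (start, count + 1)
                return True, "OK"
            # Reject with a clear message rather than dropping silently.
            return False, "Rate limit exceeded: try again later"
```

A single lock around the lookup-and-update keeps the check-then-increment atomic, so two threads cannot both consume the same remaining slot for a user.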
After gathering the details, we can summarize the key system requirements.