A rate limiter is a system component or algorithm used to control the rate of operations performed by a user, client, or service over a given period. It helps prevent abuse, reduce load, and ensure fair usage of system resources.
Rate limiters are commonly applied to public APIs, login and authentication endpoints, and other shared services where unchecked traffic could degrade availability.
In this chapter, we will explore the low-level design of a rate limiter in detail.
Let's start by clarifying the requirements:
Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.
Here is an example of how a discussion between the candidate and the interviewer might unfold:
Candidate: Should the rate limiter work on a per-user basis, or should it be based on API key or IP address?
Interviewer: Let’s keep it simple and go with per-user rate limiting. You can assume users are uniquely identified by a user ID or token.
Candidate: Which rate limiting algorithm should we implement: fixed window, sliding window, or token bucket?
Interviewer: Use the fixed window algorithm for this version, and the token bucket if possible. We can consider more advanced approaches later.
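To make the distinction concrete, here is a minimal token-bucket sketch in Python. The class name, parameters, and method names are illustrative, not part of the requirements; the key idea is that tokens refill continuously over time, which allows short bursts up to the bucket's capacity.

```python
import time

class TokenBucket:
    """Minimal token bucket: tokens refill at `rate` per second,
    up to `capacity`; each request consumes one token."""

    def __init__(self, capacity, rate):
        self.capacity = capacity          # maximum burst size
        self.rate = rate                  # tokens added per second
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Unlike a fixed window, there is no hard reset boundary: a bucket with capacity 2 and rate 1 lets two requests through immediately, then roughly one per second afterward.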
Candidate: Should all users have the same rate limit, or can it vary per user?
Interviewer: Assume the same rate limit for all users (e.g., 100 requests per 60 seconds).
Candidate: What should happen when a user exceeds the rate limit? Should we silently drop the request or notify the user?
Interviewer: The system should clearly inform the user that they’ve exceeded the limit.
Candidate: Do we need to handle concurrency in case of multiple threads trying to access or update rate limits for the same user?
Interviewer: Yes, the implementation should be thread-safe and handle concurrent access reliably.
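Putting the agreed requirements together (per-user fixed window, 100 requests per 60 seconds, thread-safe, and an explicit rejection message), a minimal sketch might look like the following. The class and method names and the exact message text are assumptions for illustration:

```python
import threading
import time

class FixedWindowRateLimiter:
    """Per-user fixed window counter: at most `limit` requests per
    `window_seconds`; a user's counter resets when a new window starts."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.lock = threading.Lock()   # guards the shared per-user counters
        self.windows = {}              # user_id -> (window_start, count)

    def allow_request(self, user_id):
        now = time.monotonic()
        with self.lock:
            start, count = self.windows.get(user_id, (now, 0))
            if now - start >= self.window_seconds:
                start, count = now, 0  # new window: reset the counter
            if count < self.limit:
                self.windows[user_id] = (start, count + 1)
                return True, "OK"
            # Reject with a clear message rather than dropping silently.
            return False, "Rate limit exceeded: try again later"
```

A single lock around the lookup-and-update keeps the check-then-increment atomic, so two threads cannot both consume the same remaining slot for a user.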
After gathering the details, we can summarize the key system requirements.