In a distributed system where nodes (servers) are frequently added or removed, efficiently routing requests becomes challenging.
A common approach is to hash the request and assign it to a server using Hash(key) mod N, where N is the number of servers.
However, this method is highly dependent on the number of servers, and any change in N can lead to significant rehashing, causing a major redistribution of keys (requests).
Consistent hashing addresses this issue by ensuring that only a small subset of keys need to be reassigned when nodes are added or removed.
Popularized by Amazon’s Dynamo paper, it has now become a fundamental technique in distributed databases like DynamoDB, Cassandra and ScyllaDB.
In this article, we’ll explore what consistent hashing is, why it’s needed, how it works, and how to implement it in code.
Imagine you're building a high-traffic web application that serves millions of users daily. To handle the load efficiently, you distribute incoming requests across multiple backend servers using a hash-based load balancer.
Your system consists of 5 backend servers (S0, S1, S2, S3, S4), and requests are assigned using a hash function that maps each user's IP address to a specific server.
The process works like this: the load balancer computes Hash(IP address) and takes the result mod 5 (since we have 5 servers); the remainder picks the server. For example, if a user's IP hashes to 1234, then 1234 mod 5 = 4, so the request goes to S4.
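A rough sketch of that load balancer's logic might look like this (the MD5 hash and the sample IP are illustrative assumptions, not part of the original setup):

```python
import hashlib

def server_for(ip: str, num_servers: int = 5) -> str:
    """Pick a backend by hashing the client IP and taking it mod N."""
    h = int(hashlib.md5(ip.encode()).hexdigest(), 16)
    return f"S{h % num_servers}"

print(server_for("203.0.113.7"))  # same IP always maps to the same server
```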
This approach works as long as the number of servers remains constant. But what happens when you add or remove a server?
As traffic increases, you decide to scale up by adding a new backend server (S5). Now, the hash function must be modified to use mod 6 instead of mod 5, since we now have 6 servers.
This seemingly simple change completely disrupts the existing mapping, causing most users to be reassigned to different servers.
This results in massive rehashing, leading to high overhead and potential downtime.
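A quick simulation makes the disruption concrete (a sketch using an assumed MD5 hash over synthetic user keys):

```python
import hashlib

def assign(key: str, n: int) -> int:
    """Modulo-based assignment: hash the key, take it mod the server count."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user-{i}" for i in range(10_000)]
moved = sum(assign(k, 5) != assign(k, 6) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # typically ~83%
```

A key keeps its server only when its hash gives the same remainder mod 5 and mod 6, which happens for about one key in six; the rest all move.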
Now, let’s say one of the servers (S4) fails or is removed. The number of servers drops to 4, forcing the hash function to switch from mod 5 to mod 4.
Even though only one server was removed, most users are reassigned to different servers. This can cause widespread cache misses, dropped user sessions, and sudden load imbalances while the system rebalances.
Consistent hashing offers a more scalable and efficient solution by ensuring that only a small fraction of users are reassigned when scaling up or down.
It performs especially well in dynamic environments where the system scales up and down frequently.
Consistent hashing is a distributed hashing technique used to efficiently distribute data across multiple nodes (servers, caches, etc.).
It uses a circular hash space (hash ring) that is large and fixed in size, independent of the number of nodes.
Both nodes (servers, caches, or databases) and keys (data items) are mapped to positions on this hash ring using a hash function.
Unlike modulo-based hashing, where changes in the number of nodes cause large-scale remapping, consistent hashing ensures that only a small fraction of keys are reassigned when a node is added or removed, making it highly scalable and efficient.
In consistent hashing, when the number of nodes changes, only about k/n keys need to be reassigned, where k is the total number of keys and n is the total number of nodes.
Instead of distributing keys based on Hash(key) mod N, consistent hashing places both servers and keys on a circular hash ring.
The hash space ranges from 0 to 2^32 - 1 (assuming a 32-bit hash function).
Each server is assigned a position on the ring by computing Hash(server_id). Given our servers (S0, S1, S2, S3, S4), the hash function distributes them at different positions around the ring.
Each key is also placed on the ring using Hash(key), and is served by the first server encountered moving clockwise from the key's position.
Note: In case a key’s hash falls directly on a node’s position, it belongs to that node.
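Here is a minimal sketch of those ring mechanics (MD5 and the key user-42 are illustrative assumptions):

```python
import bisect
import hashlib

RING_SIZE = 2**32  # 32-bit hash space: positions 0 .. 2^32 - 1

def position(name: str) -> int:
    """Map a server id or key to a position on the ring."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % RING_SIZE

servers = ["S0", "S1", "S2", "S3", "S4"]
ring = sorted((position(s), s) for s in servers)

def lookup(key: str) -> str:
    """Walk clockwise from the key's position to the first server."""
    # bisect_left means a key landing exactly on a server's position
    # belongs to that server, matching the note above.
    i = bisect.bisect_left(ring, (position(key), ""))
    return ring[i % len(ring)][1]  # wrap around past the top of the ring

print(lookup("user-42"))
```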
Suppose we add a new server (S5) to the system, and its hash places it between S1 and S2 on the ring. S5 takes over all keys (requests) that fall between S1 and S5, which were previously handled by S2. User D’s requests, which were originally assigned to S2, will now be redirected to S5.
This demonstrates how consistent hashing efficiently redistributes keys with minimal disruption, ensuring that only a small subset of keys is reassigned when new servers are added; the short simulation after the removal case below quantifies both scenarios.
When a server, such as S4, fails or is removed from the system, the keys previously assigned to S4 are reassigned to the next available server on the ring (S3). Only the keys that belonged to S4 need to move, while all other keys remain unaffected. This results in minimal data movement, unlike traditional hashing, where removing a node would require reassigning most keys.
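Extending the earlier ring sketch, we can measure both scenarios against the original five-server ring (the server names, key counts, and MD5 hash are illustrative assumptions):

```python
import bisect
import hashlib

def position(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

def lookup(key: str, ring: list) -> str:
    i = bisect.bisect_left(ring, (position(key), ""))
    return ring[i % len(ring)][1]

keys = [f"user-{i}" for i in range(10_000)]
base = sorted((position(s), s) for s in ["S0", "S1", "S2", "S3", "S4"])
before = {k: lookup(k, base) for k in keys}

with_s5 = sorted(base + [(position("S5"), "S5")])      # S5 joins the ring
moved_add = sum(before[k] != lookup(k, with_s5) for k in keys)

without_s4 = [(p, s) for (p, s) in base if s != "S4"]  # S4 fails or leaves
moved_rm = sum(before[k] != lookup(k, without_s4) for k in keys)

print(f"adding S5 moved {moved_add / len(keys):.1%} of keys")
print(f"removing S4 moved {moved_rm / len(keys):.1%} of keys")
```

In each case only the keys in one arc of the ring move: the arc S5 claims when it joins, or the arc S4 gives up when it leaves.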
In basic consistent hashing, each server is assigned a single position on the hash ring. However, this can lead to uneven data distribution, especially when there are only a few servers or their hash positions happen to cluster together, leaving some servers responsible for much larger arcs than others.
Virtual nodes (VNodes) are a technique used in consistent hashing to improve load balancing and fault tolerance by distributing data more evenly across servers.
Instead of assigning one position per server, each physical server is assigned multiple positions (virtual nodes) on the hash ring.
Assume we have three physical servers (S1, S2, S3). Without virtual nodes, their positions might be:
S1 → Position 10
S2 → Position 50
S3 → Position 90
If S1 fails, all its keys must be reassigned to S2, which can create an overload.
With virtual nodes, each server is hashed multiple times, once per virtual node label (for example, S1-0, S1-1, and S1-2 might land at positions 10, 35, and 75). Now, instead of just one point, S1 is represented at multiple positions spread around the ring, making the distribution more even.
If S1 fails, its keys are more evenly redistributed among S2 and S3, rather than all going to S2.
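A toy sketch shows the effect (the mod-100 ring size and labels like S1-0 are illustrative assumptions to keep the numbers readable):

```python
import hashlib

def position(name: str) -> int:
    # Toy ring with positions 0-99, for illustration only.
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 100

# Each physical server appears under several virtual-node labels,
# so its load is split across several arcs of the ring.
for server in ["S1", "S2", "S3"]:
    vnodes = sorted(position(f"{server}-{i}") for i in range(3))
    print(server, "is at positions", vnodes)
```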
Putting this together in code, the implementation is built around three pieces of state:
Hash Ring (self.ring): stores hash value → server mappings, using virtual nodes (replicas) for better load balancing.
Sorted Keys (self.sorted_keys): maintains a sorted list of hash values for efficient lookups.
Servers (self.servers): tracks active physical servers.
Four operations cover the lifecycle:
Constructor (__init__): calls add_server() for each initial server, hashing it multiple times (based on num_replicas) to ensure even distribution.
Hashing Function (_hash): maps a string (a virtual node label or a key) to a numeric position on the ring.
Adding a Server (add_server): creates one label per virtual node (server-0, server-1, etc.), adds each label’s hash to self.ring, and maintains a sorted order in self.sorted_keys for fast lookup.
Removing a Server (remove_server): deletes all of the server’s virtual node hashes from self.ring and self.sorted_keys.
Getting a Server (get_server): finds the first node clockwise from the key’s hash using bisect.bisect(), wrapping around to the first node if necessary.
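Here is a minimal sketch of such a class, following the description above; MD5 as the hash function and the default of three replicas per server are assumptions, not requirements:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers=None, num_replicas=3):
        self.ring = {}          # hash value -> physical server
        self.sorted_keys = []   # sorted hash values for binary search
        self.servers = set()    # active physical servers
        self.num_replicas = num_replicas
        for server in servers or []:
            self.add_server(server)

    def _hash(self, key):
        # MD5 is an assumption; any uniformly distributed hash works.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server):
        self.servers.add(server)
        for i in range(self.num_replicas):
            h = self._hash(f"{server}-{i}")    # virtual node label, e.g. "S1-0"
            self.ring[h] = server
            bisect.insort(self.sorted_keys, h)

    def remove_server(self, server):
        self.servers.discard(server)
        for i in range(self.num_replicas):
            h = self._hash(f"{server}-{i}")
            del self.ring[h]
            self.sorted_keys.remove(h)

    def get_server(self, key):
        if not self.ring:
            return None
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, h)  # first node clockwise of h
        if idx == len(self.sorted_keys):          # wrap around the ring
            idx = 0
        return self.ring[self.sorted_keys[idx]]
```

Usage looks like this (the server names and key are hypothetical):

```python
ring = ConsistentHashRing(["S0", "S1", "S2", "S3", "S4"])
print(ring.get_server("user-42"))  # e.g. "S2"
ring.add_server("S5")              # only keys in S5's arcs move
ring.remove_server("S4")           # only S4's former keys move
```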