Last Updated: December 25, 2025
A unique ID generator is a system that produces identifiers that are globally unique across distributed services. These IDs are used to identify entities like users, orders, transactions, or messages in large-scale systems.
Examples include Twitter’s Snowflake IDs, UUIDs, and database auto-increment keys.
Let’s begin by clarifying the requirements.
A system design interview should always start with a conversation to scope the problem. Here’s an example of how a candidate–interviewer discussion might flow:
Interviewer: "Let's design a distributed unique ID generator."
Candidate: "Great. First, I want to clarify the properties these IDs should have. Is global uniqueness the only hard requirement, or do they also need to be sortable?"
Interviewer: "Good question. Let's say they should be roughly sortable by the time of creation."
Candidate: "Okay. And what about the format? Should they be numeric or can they be strings?"
Interviewer: "Let's stick to numeric, preferably a 64-bit integer, so it fits in a standard bigint column."
Candidate: "Got it. In terms of non-functional requirements, I'll assume the service needs to be highly available, low-latency, and horizontally scalable to handle millions of requests per second."
An ID is said to be K-sortable if it is roughly ordered by creation time: newer IDs are usually greater than older ones, though slight out-of-order cases may occur due to distributed generation.
After gathering the details, we can summarize the key system requirements.
Now that we’ve clarified the requirements, let’s explore several approaches to generating unique IDs, evaluating how each performs across scalability, availability, ordering, and complexity dimensions.
The simplest way to generate unique IDs is by using a centralized database sequence or an auto-incrementing primary key.
In this setup, a single database server maintains a counter. Each time a client requests an ID, the database increments the counter and returns the new value, guaranteeing strict order and uniqueness.
This is commonly implemented with an AUTO_INCREMENT or IDENTITY column in SQL.
Every time a new record is inserted, the database automatically assigns the next sequential id value.
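As a minimal sketch of this behavior, the snippet below uses an in-memory SQLite database as a stand-in for the central database server. The `orders` table, its columns, and the `insert_order` helper are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the single central database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)"
)

def insert_order(item: str) -> int:
    """Insert a row and return the ID the database assigned to it."""
    cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    conn.commit()
    return cur.lastrowid

# Each insert receives the next sequential value from the database counter.
ids = [insert_order(name) for name in ("book", "pen", "lamp")]
print(ids)
```

Every caller has to go through the same connection to the same database to obtain an ID, which is exactly the dependency the following paragraphs discuss.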
This creates a simple but problematic architecture where all services depend on a single database for ID generation.
While simple, this centralized approach comes with significant trade-offs.
This approach works fine for small, monolithic systems but fails to meet the needs of large-scale, distributed environments. It introduces a bottleneck and a single point of failure, violating both scalability and availability requirements.
To overcome the limitations of a single database, you can adapt the auto-increment strategy for a multi-node environment using two common techniques.
In this model, each database node is pre-assigned a unique and non-overlapping range of IDs. This allows each node to generate IDs independently without communicating with others.
For example, in a three-node setup:
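A rough sketch of range-based allocation is shown here. The node names, the range boundaries, and the `RangeAllocator` class are assumptions made for illustration, not values taken from a real deployment.

```python
class RangeAllocator:
    """Hands out IDs from a fixed, pre-assigned, inclusive range."""

    def __init__(self, start: int, end: int) -> None:
        self.next_id = start
        self.end = end

    def next(self) -> int:
        if self.next_id > self.end:
            # The node cannot continue until an operator assigns a new range.
            raise RuntimeError("range exhausted; a new range must be assigned")
        value = self.next_id
        self.next_id += 1
        return value

# Hypothetical, non-overlapping ranges for three nodes.
nodes = {
    "node-a": RangeAllocator(1, 1_000_000),
    "node-b": RangeAllocator(1_000_001, 2_000_000),
    "node-c": RangeAllocator(2_000_001, 3_000_000),
}

print(nodes["node-a"].next(), nodes["node-b"].next(), nodes["node-c"].next())
```

Because the ranges never overlap, no node needs to talk to another one, but IDs from different nodes are not ordered by creation time.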
In this approach, each node generates IDs using a common step size equal to the number of nodes. Each node is also given a unique starting offset. This creates an "interleaving" pattern of IDs.
For example, if the step size is 3:
The node with offset 1 generates IDs: 1, 4, 7, 10, ...
The node with offset 2 generates IDs: 2, 5, 8, 11, ...
The node with offset 3 generates IDs: 3, 6, 9, 12, ...
This method ensures uniqueness but makes it difficult to add new nodes, as it would require changing the step size and offsets across the entire cluster.
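The interleaving pattern can be reproduced with a few lines of Python. This is only a sketch; `interleaved_ids` is a hypothetical helper, and the step size of 3 matches the example above.

```python
from itertools import count, islice

def interleaved_ids(offset: int, step: int):
    """Yield offset, offset + step, offset + 2 * step, ... without end."""
    return count(start=offset, step=step)

STEP = 3  # equal to the number of nodes in the cluster
sequences = {
    offset: list(islice(interleaved_ids(offset, STEP), 4))
    for offset in (1, 2, 3)
}
for offset, ids in sequences.items():
    print(f"offset {offset}: {ids}")
```

Adding a fourth node would require a step of 4 on every node, which is why resizing the cluster is awkward with this scheme.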
A UUID is a 128-bit value designed to be unique across space and time without requiring any coordination between systems.
Unlike centralized or database-based approaches, UUIDs can be generated independently on any machine while still maintaining near-zero probability of collision.
There are multiple versions of UUIDs, but the two most commonly discussed are UUID v1 and UUID v4.
UUID v1 combines a timestamp (down to 100-nanosecond precision) and the machine's MAC address.
This combination ensures global uniqueness since MAC addresses are unique per device, and timestamps prevent collisions within the same machine.
6ba7b810-9dad-11d1 (Represents a specific point in time)
00c04fd430c8 (Derived from the machine's MAC address)
UUID v1 embeds the timestamp, so the creation time can be recovered from it, but the standard layout places the low-order time bits first, so the values do not sort chronologically as-is. It also exposes the MAC address and generation time, posing potential privacy risks.
This is the most common version today. It's generated using 122 bits of cryptographically secure randomness. Only 6 bits are used for static version and variant information.
The 4 at the start of the group 41d4 indicates it's a v4 UUID.
The probability of two UUID v4 values colliding is so small that it's considered negligible for all practical purposes.
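To make both points concrete, the sketch below generates a v4 UUID with Python's standard `uuid` module and estimates the collision probability with the usual birthday-bound approximation. The `collision_probability` helper is a hypothetical name, and the formula assumes all 122 random bits are uniformly distributed.

```python
import math
import uuid

u = uuid.uuid4()
print(u)             # a random 128-bit identifier in 8-4-4-4-12 hex form
print(u.version)     # the version field stored in the UUID
print(str(u)[14])    # first hex digit of the third group carries the version

def collision_probability(n: int, random_bits: int = 122) -> float:
    """Birthday-bound estimate of at least one collision among n random IDs."""
    x = n * (n - 1) / (2 * 2**random_bits)
    return -math.expm1(-x)  # 1 - exp(-x), computed stably for tiny x

# Estimated chance of any collision after generating one billion UUIDs.
p_billion = collision_probability(10**9)
print(p_billion)
```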
UUIDs are an excellent choice for logging, trace IDs, correlation identifiers, or temporary tokens where decentralization and ease of generation matter most.
However, for a system that requires sortable, numeric, and compact 64-bit IDs (like those used in databases or distributed message queues), UUIDs are suboptimal. They break index locality and can degrade write performance as the dataset grows.
UUID v7 is a newer standard (defined in RFC 9562) that combines the time-ordering of Snowflake with the simplicity and decentralization of traditional UUIDs.
The version field is set to 7.
The leading timestamp ensures that IDs are chronologically sortable (K-sortable). The large random portion makes collisions extremely unlikely, even when millions of nodes are generating IDs in the same millisecond.
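The following is a hand-rolled sketch of the UUID v7 bit layout from RFC 9562: a 48-bit Unix millisecond timestamp, a 4-bit version, 12 random bits, a 2-bit variant, and 62 more random bits. The function name `uuid7_sketch` is hypothetical, and production code should prefer a maintained library over this illustration.

```python
import os
import time
import uuid
from typing import Optional

def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID with the v7 layout: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 kept
    rand_a = (rand >> 62) & 0xFFF                  # 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # 48-bit timestamp (top bits)
    value |= 0x7 << 76                             # version = 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)

older = uuid7_sketch(1_700_000_000_000)
newer = uuid7_sketch(1_700_000_000_001)
print(older, newer, older < newer)
```

Because the timestamp occupies the most significant bits, IDs created in a later millisecond always compare greater; ordering within the same millisecond is left to chance in this sketch.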
UUID v7 is poised to become the new standard for primary keys in distributed systems, but it's important to understand its trade-offs.
The Ticket Server is a clever evolution of the centralized database model. Instead of hitting the database for every single ID, application servers fetch a block (or range) of IDs in one go and then dispense them from local memory.
This method was famously used by Flickr and later adopted by several large-scale systems before distributed ID generation algorithms (like Snowflake) became popular.
The architecture introduces a dedicated, lightweight service (the Ticket Server) that acts as the single source of truth for ID blocks.
The Ticket Server returns the reserved block (e.g., [1001, 2000]) to the Application Server.
This architecture dramatically reduces the load on the database, as it's now only contacted once per block instead of once per ID.
The sequence diagram below shows the detailed interaction:
To atomically reserve a block, the Ticket Server can run a single atomic SQL command:
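One way to sketch the reservation step is shown here with an in-memory SQLite database standing in for the Ticket Server's store. Note that this sketch uses a short write-locked transaction (read, then update) rather than a single statement, and that the `tickets` table, the `orders` counter name, and `reserve_block` are all hypothetical.

```python
import sqlite3
from typing import Tuple

# isolation_level=None puts the connection in autocommit mode so the
# transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE tickets (name TEXT PRIMARY KEY, next_id INTEGER NOT NULL)"
)
conn.execute("INSERT INTO tickets (name, next_id) VALUES ('orders', 1001)")

def reserve_block(name: str, size: int) -> Tuple[int, int]:
    """Atomically reserve `size` IDs and return the inclusive (start, end)."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        (start,) = conn.execute(
            "SELECT next_id FROM tickets WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "UPDATE tickets SET next_id = next_id + ? WHERE name = ?",
            (size, name),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return start, start + size - 1

first = reserve_block("orders", 1000)
second = reserve_block("orders", 1000)
print(first, second)
```

An application server that receives a block can hand out those IDs from memory and only calls `reserve_block` again once the block is used up. IDs left in a block when a server restarts are simply skipped.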
The Ticket Server is a solid, battle-tested approach that offers a pragmatic balance between simplicity and performance. It’s significantly better than a direct database-based generator and is still used in many production systems today.
However, it’s not fully decentralized. It depends on a centralized (though optimized) service for coordination.
The Snowflake algorithm, originally developed by Twitter (now X), is a decentralized and highly scalable approach for generating globally unique, time-ordered 64-bit IDs.
It is widely considered the gold standard for distributed ID generation, balancing uniqueness, ordering, performance, and fault tolerance without relying on a single coordination point at runtime.
Each generated 64-bit ID is composed of multiple parts, each serving a specific purpose. The design cleverly encodes time, machine identity, and per-millisecond sequencing into a single compact number.
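Sign bit (1 bit): unused and kept at 0, so the ID stays positive in a signed 64-bit integer.
Timestamp (41 bits): milliseconds elapsed since a custom epoch (e.g., 2020-01-01 00:00:00 UTC). 2^41 milliseconds ≈ 69 years.
Datacenter ID (5 bits): 2^5 = 32 data centers.
Machine ID (5 bits): 2^5 = 32 machines per data center.
Combined worker ID (10 bits): 2^10 = 1024 unique workers globally.
Sequence (12 bits): 2^12 = 4096 unique IDs per millisecond per machine.
This allows a single machine to generate millions of IDs per second with virtually no contention. Each field is bit-shifted and combined into a single 64-bit integer using bitwise operations.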
The result is a compact, globally unique, time-ordered ID.
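The bit-shifting can be sketched as follows, assuming the 41/5/5/12 field widths and a custom epoch of 2020-01-01 00:00:00 UTC. The `compose` and `decompose` helpers are hypothetical names used only for this illustration.

```python
EPOCH_MS = 1_577_836_800_000  # 2020-01-01 00:00:00 UTC in Unix milliseconds

SEQUENCE_BITS = 12
MACHINE_BITS = 5
DATACENTER_BITS = 5

MACHINE_SHIFT = SEQUENCE_BITS                        # 12
DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS      # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS # 22

def compose(timestamp_ms: int, datacenter_id: int, machine_id: int, sequence: int) -> int:
    """Pack the four fields into one integer, timestamp in the high bits."""
    return (
        ((timestamp_ms - EPOCH_MS) << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (machine_id << MACHINE_SHIFT)
        | sequence
    )

def decompose(snowflake_id: int) -> tuple:
    """Recover (timestamp_ms, datacenter_id, machine_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake_id >> MACHINE_SHIFT) & ((1 << MACHINE_BITS) - 1)
    datacenter_id = (snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1)
    timestamp_ms = (snowflake_id >> TIMESTAMP_SHIFT) + EPOCH_MS
    return timestamp_ms, datacenter_id, machine_id, sequence

example = compose(EPOCH_MS + 1, datacenter_id=3, machine_id=7, sequence=42)
print(example, decompose(example))
```

Since the timestamp sits in the most significant bits, any ID from a later millisecond is numerically larger than every ID from an earlier one, regardless of worker or sequence.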
The Snowflake-like service meets all the key requirements:
This makes it the most balanced and production-ready approach for distributed ID generation. Many large-scale systems, including Twitter, Instagram, Discord, and Netflix, have used variations of this design.
Now that we understand the Snowflake algorithm conceptually, let’s design the architecture and implementation details for a Snowflake-based ID generation service that can scale horizontally and operate reliably under real-world distributed conditions.
At a high level, the ID Generator Service consists of a cluster of stateless nodes. Each node can independently generate unique IDs once it acquires a unique worker ID during startup.
The most critical requirement for this design is ensuring that no two nodes share the same worker ID (10 bits total: 5 bits for datacenter + 5 bits for machine).
If two nodes accidentally share the same worker ID, they could produce duplicate IDs, violating the system’s core guarantee.
We use a coordination service like ZooKeeper or etcd to assign unique worker IDs dynamically.
Each node registers by creating a sequential entry under the path prefix /workers/id-.
/workers/id-0000000001
/workers/id-0000000002
/workers/id-0000000003
This mechanism guarantees automatic registration, uniqueness, and fault recovery without human intervention.
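The idea of leasing worker IDs can be illustrated with an in-memory stand-in for the coordination service. This is not ZooKeeper or etcd client code; the `WorkerRegistry` class and its methods are hypothetical, and a real deployment would rely on the coordination service's own session or lease expiry instead of an explicit `release` call.

```python
import threading

class WorkerRegistry:
    """In-memory stand-in for a coordination service that leases worker IDs."""

    MAX_WORKERS = 1024  # a 10-bit worker ID allows values 0..1023

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._leases = {}  # worker_id -> node name

    def acquire(self, node: str) -> int:
        """Lease the lowest free worker ID to `node`."""
        with self._lock:
            for worker_id in range(self.MAX_WORKERS):
                if worker_id not in self._leases:
                    self._leases[worker_id] = node
                    return worker_id
        raise RuntimeError("no free worker IDs")

    def release(self, worker_id: int) -> None:
        """Free a lease, e.g. when a node shuts down or its session expires."""
        with self._lock:
            self._leases.pop(worker_id, None)

registry = WorkerRegistry()
id_a = registry.acquire("node-a")
id_b = registry.acquire("node-b")
registry.release(id_a)              # node-a goes away
id_c = registry.acquire("node-c")   # the freed ID can be reused
print(id_a, id_b, id_c)
```

The lock guarantees that two nodes registering at the same moment never receive the same ID, which is the property the real coordination service provides across machines.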
The timestamp component of the Snowflake ID makes the system sensitive to clock drift across servers.
If a node’s system clock moves backward, it could generate IDs that overlap with previously issued ones.
All nodes should run Network Time Protocol (NTP) daemons that synchronize their clocks with a trusted central source (e.g., a stratum 1 time server).
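The generator should track the last timestamp used. If the current system time is less than the last recorded timestamp, it must either reject the request with an error or wait (spin) until the clock catches up to the last recorded timestamp.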
In practice, rejecting is preferred in safety-critical systems, while spin-wait is acceptable in low-latency internal tools.
Below is a simplified thread-safe implementation of the core Snowflake ID generator logic.
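The sketch below is one possible version of such a generator, not a reference implementation. It assumes a 10-bit worker ID, a 12-bit sequence, and a 2020-01-01 epoch, rejects requests when the clock moves backward, and accepts an injectable `clock` function (a hypothetical parameter added to make the logic testable).

```python
import threading
import time

class SnowflakeGenerator:
    EPOCH_MS = 1_577_836_800_000      # 2020-01-01 00:00:00 UTC
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int, clock=None) -> None:
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()   # serializes access across threads
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Reject strategy: refuse to issue IDs while the clock is behind.
                raise RuntimeError(
                    f"clock moved backwards by {self._last_ms - now} ms"
                )
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )

generator = SnowflakeGenerator(worker_id=7)
print(generator.next_id(), generator.next_id())
```

The lock keeps the timestamp check and sequence update atomic within one process; uniqueness across processes still depends on each one holding a distinct worker ID.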
We designed a robust, distributed unique ID generator based on the Snowflake algorithm. This approach is highly performant and scalable because it requires no runtime coordination between nodes.