Last Updated: May 26, 2026
Analytics systems often need distinct counts across shards, time buckets, tenants, and experiments.
HyperLogLog estimates cardinality using a compact, mergeable sketch. It can combine summaries from many producers without moving raw identifiers.
HyperLogLog trades exactness for fixed memory. It cannot answer membership queries, list elements, or delete individual items.
The exact approach is a set:
This is exact, but memory grows with the number of unique items.
The real memory cost is usually much higher than the raw ID size. A set stores keys, hash-table metadata, load-factor slack, object headers, allocator overhead, and sometimes string contents. A billion distinct identifiers can easily mean many gigabytes.
Now multiply that by dimensions such as page, tenant, country, device type, hour, experiment variant, and retention period.
Exact sets become expensive quickly.
| Question | Exact Set Cost | HyperLogLog Cost |
|---|---|---|
| Daily unique users | Grows with unique users | Fixed by sketch precision |
| Weekly unique users | Requires deduping all days | Merge daily sketches |
| Unique users per page | One set per page | One sketch per page |
| Global unique users across shards | Shuffle IDs or sets | Merge sketches |
HyperLogLog is useful when the exact identities are not needed after counting.