AlgoMaster Logo

HyperLogLog

Last Updated: May 26, 2026

Ashish

Ashish Pratap Singh

Low Priority
5 min read

Analytics systems often need distinct counts across shards, time buckets, tenants, and experiments.

HyperLogLog estimates cardinality using a compact, mergeable sketch. It can combine summaries from many producers without moving raw identifiers.

HyperLogLog trades exactness for fixed memory. It cannot answer membership queries, list elements, or delete individual items.

1. The Counting Problem

The exact approach is a set:

This is exact, but memory grows with the number of unique items.

The real memory cost is usually much higher than the raw ID size. A set stores keys, hash-table metadata, load-factor slack, object headers, allocator overhead, and sometimes string contents. A billion distinct identifiers can easily mean many gigabytes.

Now multiply that by dimensions such as page, tenant, country, device type, hour, experiment variant, and retention period.

Exact sets become expensive quickly.

QuestionExact Set CostHyperLogLog Cost
Daily unique usersGrows with unique usersFixed by sketch precision
Weekly unique usersRequires deduping all daysMerge daily sketches
Unique users per pageOne set per pageOne sketch per page
Global unique users across shardsShuffle IDs or setsMerge sketches

HyperLogLog is useful when the exact identities are not needed after counting.

2. The Core Idea

Premium Content

This content is for premium members only.