Last Updated: May 26, 2026
Streaming systems often need frequency estimates for large, unbounded, or distributed event streams.
A Count-Min Sketch estimates item counts with a fixed-size matrix of counters. Compatible sketches can be merged across shards by adding corresponding counters.
Count-Min Sketch never underestimates in the standard non-negative update model, but hash collisions can cause overestimation. It estimates counts for queried items; it does not enumerate items by itself.
The exact approach is a hash map:
This is exact and simple, but every distinct item carries more than just a counter. The system also stores the key, hash-table metadata, allocator overhead, and any replication or checkpointing cost.
In a stream with millions or billions of distinct keys, exact maps become expensive. They are also hard to merge efficiently if every shard has a large local map.
Count-Min Sketch is useful when the stream is large, approximate counts are acceptable, memory must be bounded, and sketches need to merge across partitions. It is not useful when you need exact counts, item enumeration, or accurate counts for rare items.