Last Updated: January 30, 2026
Bloom filters are ubiquitous in high-throughput systems. Redis offers them through the RedisBloom module for fast membership tests. Cassandra and HBase use them to avoid expensive disk lookups for keys that are not present. Chrome historically used one for Safe Browsing URL checks. Network devices use them for tasks like packet deduplication. These systems process millions of operations per second, and any lock contention is unacceptable.
The good news: Bloom filters have properties that make them surprisingly easy to parallelize. Bits only ever transition from 0 to 1, never back. Setting a bit that's already set is harmless. These properties unlock elegant lock-free implementations.
This chapter explores three approaches to concurrent Bloom filters, from simple locking to lock-free designs that scale linearly with core count.
What we're building: a thread-safe Bloom filter that allows multiple threads to add elements and check membership concurrently. Recall that a Bloom filter is a probabilistic data structure that can definitively say "not present" but only probabilistically say "might be present." False positives are possible; false negatives are not.
| Operation | Description | Expected Complexity |
|---|---|---|
add(item) | Add an item to the filter | O(k) where k = number of hash functions |
mightContain(item) | Check if item might be present | O(k) |
clear() | Reset the filter to empty state | O(m) where m = bit array size |
approximateCount() | Estimate number of items added | O(1) |
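As a reference point, the operations above might map to a Java interface like the following. This is a hypothetical sketch; the interface name and generic signature are illustrative, not from any standard library:

```java
/**
 * A minimal sketch of the contract the implementations below aim to satisfy.
 * Names are illustrative; nothing here comes from a standard library.
 */
public interface ConcurrentBloomFilter<T> {
    /** Adds an item; safe to call from multiple threads. O(k). */
    void add(T item);

    /** Returns false only if the item was definitely never added. O(k). */
    boolean mightContain(T item);

    /** Resets all bits; must run exclusively. O(m). */
    void clear();

    /** Rough count of items added; exactness is not guaranteed. O(1). */
    long approximateCount();
}
```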
The concurrency requirements:

- Multiple threads can call add() simultaneously without losing any additions
- Multiple threads can call mightContain() simultaneously without blocking
- mightContain() never returns false for an item that was added (no false negatives)
- clear() is exclusive: no concurrent adds or queries during clear

Before adding concurrency, let's understand how Bloom filters work and why their structure enables efficient parallelization.
A Bloom filter consists of two components:

- A bit array of m bits, all initially 0
- k independent hash functions, each mapping an element to a position in [0, m)
To add an element: compute its k hash values, map each to a bit position, and set all k bits to 1.

To check membership: compute the same k positions. If every bit is 1, report "might be present"; if any bit is 0, the element was definitely never added.
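Before any concurrency enters the picture, the core mechanics fit in a few lines. Here is a minimal single-threaded sketch; the class name and the double-hashing scheme used to derive k positions are illustrative choices, not from a specific library:

```java
import java.util.BitSet;

/** Single-threaded baseline; the concurrent variants below build on this shape. */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int m;  // number of bits
    private final int k;  // number of hash functions

    SimpleBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Kirsch-Mitzenmacher double hashing: derive k positions from two base hashes.
    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;  // illustrative second hash
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(Object item) {
        for (int i = 0; i < k; i++) bits.set(position(item, i));
    }

    boolean mightContain(Object item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(item, i))) return false;  // definitely absent
        }
        return true;  // might be present
    }
}
```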
After inserting n elements into a Bloom filter with m bits and k hash functions, the probability that a specific bit is still 0 is:

$$P(\text{bit is } 0) = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}$$

The false positive probability (all k bits are 1 by chance) is:

$$p \approx \left(1 - e^{-kn/m}\right)^{k}$$

Given m bits and n expected elements, the optimal number of hash functions is:

$$k_{\text{opt}} = \frac{m}{n} \ln 2$$

With optimal k, the false positive rate is approximately:

$$p \approx \left(\frac{1}{2}\right)^{k_{\text{opt}}} \approx 0.6185^{\,m/n}$$
Practical rule of thumb: 10 bits per element gives roughly 1% false positive rate.
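A quick sanity check of that rule, worked from the formulas above: with $m/n = 10$,

$$k_{\text{opt}} = 10 \ln 2 \approx 6.93 \;\Rightarrow\; k = 7, \qquad p \approx \left(1 - e^{-7/10}\right)^{7} \approx 0.0082$$

so roughly 0.8%, in line with the 1% rule of thumb.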
As a concrete example, consider adding "apple" to a Bloom filter: three hash functions produce positions 2, 5, and 9, and those three bits are set to 1.
Making a Bloom filter thread-safe is easier than it is for most data structures, but there are still important considerations.
An add() operation sets k bits. Do these k bit-sets need to be atomic as a group? Consider the race: Thread 1 is adding "apple" and has set two of its three bits when Thread 2 queries "apple". Thread 2 finds the third bit still 0 and returns "not present".

Key insight: this is actually fine for Bloom filters. The query returned "not present" during a concurrent insert, which is a valid linearization where the query happened before the insert completed. Since we never return false negatives for completed inserts, correctness is preserved.
When Thread 1 sets a bit, when does Thread 2 see it? Without proper memory ordering, Thread 2 might read a stale cached value.
Solution: Use atomic operations with appropriate memory ordering, or accept eventual visibility (often acceptable for Bloom filters).
clear() resets all bits to 0 while other threads might be adding or querying. This can cause serious issues. Suppose Thread 1 is adding "apple" (bits 2, 5, and 9): it sets bits 2 and 5, then clear() zeroes the whole array, then Thread 1 sets bit 9. Now only bit 9 is set, and a later query for "apple" finds bits 2 and 5 still 0.

This is a false negative: "apple" was added (the add completed after the clear) but won't be found, because two of its bits were cleared mid-insertion.
Solution: clear() must be exclusive. Use a read-write lock where adds/queries take read lock and clear takes write lock.
| Property | Requirement | Why It Matters |
|---|---|---|
| No False Negatives | Mandatory | Core Bloom filter guarantee |
| Linearizability | Not required | Probabilistic nature tolerates slight inconsistency |
| Immediate Visibility | Not required | Eventually visible is usually acceptable |
| Progress Guarantee | Lock-free preferred | High throughput systems need non-blocking adds |
The probabilistic nature of Bloom filters means we can tolerate relaxed consistency. A query might miss a concurrent insert (query linearized before insert), but this doesn't violate the no-false-negatives guarantee for completed inserts.
The simplest approach wraps the entire Bloom filter with a read-write lock.
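A minimal sketch of this approach, reusing the illustrative double-hashing helper from earlier (the class name is mine, not from the text):

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Coarse-grained locking: one ReadWriteLock guards the whole filter. */
class LockedBloomFilter {
    private final BitSet bits;
    private final int m, k;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    LockedBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(Object item) {
        lock.writeLock().lock();          // exclusive: BitSet.set is not thread-safe
        try {
            for (int i = 0; i < k; i++) bits.set(position(item, i));
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean mightContain(Object item) {
        lock.readLock().lock();           // shared: safe because writers are excluded
        try {
            for (int i = 0; i < k; i++) {
                if (!bits.get(position(item, i))) return false;
            }
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    void clear() {
        lock.writeLock().lock();          // exclusive, so clear never interleaves with adds
        try {
            bits.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```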
| Property | Status | Explanation |
|---|---|---|
| Thread-safe | Yes | All access serialized by lock |
| Deadlock-free | Yes | Single lock, no circular dependencies |
| Read scalability | Limited | Queries share a read lock but still contend on the lock's shared state |
| Add scalability | Poor | All adds serialize |
| Clear safety | Yes | Write lock is exclusive |
Problem with Java's BitSet: BitSet.set() and BitSet.get() are not thread-safe. Multiple threads calling set() concurrently can lose updates or corrupt internal state, so every add() must hold the exclusive write lock. Queries can safely share the read lock only because that write lock excludes all concurrent mutation.
Divide the bit array into segments, each with its own lock. Adds and queries only lock the segments they touch.
When an add operation needs to set bits in multiple segments, we must acquire all needed locks. To prevent deadlock, we always acquire locks in segment order (lowest to highest).
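A sketch of the segment-locking idea, again with the illustrative double-hashing helper; the segment layout and class name are my own choices, not from a real library:

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantLock;

/** Striped locking: the bit array is split into segments, each with its own lock. */
class SegmentedBloomFilter {
    private final BitSet[] segments;       // segment s holds bits [s*segSize, (s+1)*segSize)
    private final ReentrantLock[] locks;   // locks[s] guards segments[s]
    private final int segSize, m, k;

    SegmentedBloomFilter(int m, int k, int numSegments) {
        this.m = m;
        this.k = k;
        this.segSize = (m + numSegments - 1) / numSegments;
        this.segments = new BitSet[numSegments];
        this.locks = new ReentrantLock[numSegments];
        for (int s = 0; s < numSegments; s++) {
            segments[s] = new BitSet(segSize);
            locks[s] = new ReentrantLock();
        }
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(Object item) {
        int[] pos = new int[k];
        for (int i = 0; i < k; i++) pos[i] = position(item, i);

        // Deadlock avoidance: acquire the needed segment locks in ascending order.
        int[] segs = Arrays.stream(pos).map(p -> p / segSize).distinct().sorted().toArray();
        for (int s : segs) locks[s].lock();
        try {
            for (int p : pos) segments[p / segSize].set(p % segSize);
        } finally {
            for (int s : segs) locks[s].unlock();
        }
    }

    boolean mightContain(Object item) {
        for (int i = 0; i < k; i++) {
            int p = position(item, i);
            ReentrantLock lock = locks[p / segSize];
            lock.lock();                   // one segment at a time; group atomicity not needed
            try {
                if (!segments[p / segSize].get(p % segSize)) return false;
            } finally {
                lock.unlock();
            }
        }
        return true;
    }
}
```

A clear() for this design would need to hold every segment lock (or a separate exclusive clear lock, as the table below assumes) so that it never interleaves with an in-flight add.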
| Property | Status | Explanation |
|---|---|---|
| Thread-safe | Yes | Segment locks + clear lock protect all operations |
| Deadlock-free | Yes | Always acquire segment locks in order |
| Read scalability | Good | Different elements often hit different segments |
| Add scalability | Good | Parallel adds on different segments |
| Complexity | Medium | Lock ordering logic, segment management |
This approach eliminates locks entirely for add() and mightContain() operations using atomic bit manipulation.
The crucial observation that enables lock-free Bloom filters:
Setting a bit to 1 when it's already 1 is harmless.
In other data structures, concurrent modifications cause lost updates: two threads incrementing a shared counter can both read 5 and both write 6, silently dropping one increment.

But in a Bloom filter, two threads setting the same bit both write 1, and the result is identical no matter how the writes interleave.

The bit is set regardless of who "wins." There are no lost updates because 1 OR 1 = 1.
The ABA problem occurs when a value changes from A → B → A, and a CAS operation incorrectly succeeds because it only sees the final A. Bloom filters are immune because bits only ever transition 0 → 1, never back to 0 (except during clear(), which is exclusive).
We store bits in an array of long values, where each long holds 64 bits. This allows efficient atomic operations on groups of bits.
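A sketch of that core, assuming nothing beyond java.util.concurrent.atomic (the class and method names are mine):

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Lock-free bit array: 64 bits per long, set via compare-and-set. */
class AtomicBitArray {
    private final AtomicLongArray words;

    AtomicBitArray(int numBits) {
        this.words = new AtomicLongArray((numBits + 63) / 64);
    }

    /** Sets the bit; returns true if this call changed it from 0 to 1. */
    boolean set(int bitIndex) {
        int word = bitIndex >>> 6;          // bitIndex / 64
        long mask = 1L << (bitIndex & 63);  // bit within the word
        while (true) {
            long old = words.get(word);
            if ((old & mask) != 0) return false;  // already set: nothing to do
            if (words.compareAndSet(word, old, old | mask)) return true;
            // CAS failed: another thread changed the word; reread and retry.
            // Bits only go 0 -> 1, so each retry has strictly fewer bits left to race on.
        }
    }

    boolean get(int bitIndex) {
        long word = words.get(bitIndex >>> 6);
        return (word & (1L << (bitIndex & 63))) != 0;
    }
}
```

add() calls set() for each of its k positions and mightContain() calls get(); neither ever blocks.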
For Bloom filters, we can use relaxed memory ordering for most operations:
| Operation | Memory Order | Rationale |
|---|---|---|
| Read for mightContain | acquire | See writes from other threads |
| CAS in setBit | release on success | Make bit visible to readers |
| count increment | relaxed | Approximate count, exact value not critical |
| clear() writes | relaxed | Protected by exclusive lock |
The key insight: eventual visibility is acceptable. If a reader doesn't immediately see a just-set bit, the query returns "not present" for an in-progress insert. This is a valid linearization.
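On Java 9 and later, AtomicLongArray exposes these weaker orderings directly. Below is a hedged sketch of the same bit-set loop with explicit orderings; this is one reasonable mapping of the table above, not the only correct one:

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Same bit-set loop, with the (weaker) memory orderings from the table. */
class RelaxedAtomicBitArray {
    private final AtomicLongArray words;

    RelaxedAtomicBitArray(int numBits) {
        this.words = new AtomicLongArray((numBits + 63) / 64);
    }

    boolean set(int bitIndex) {
        int word = bitIndex >>> 6;
        long mask = 1L << (bitIndex & 63);
        while (true) {
            long old = words.getAcquire(word);
            if ((old & mask) != 0) return false;
            // Release on success publishes the new bit to acquire-readers;
            // the "weak" CAS may fail spuriously, so it stays inside the retry loop.
            if (words.weakCompareAndSetRelease(word, old, old | mask)) return true;
        }
    }

    /** Acquire read: sees bits published by the release-CAS above. */
    boolean get(int bitIndex) {
        long word = words.getAcquire(bitIndex >>> 6);
        return (word & (1L << (bitIndex & 63))) != 0;
    }
}
```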
| Property | Status | Explanation |
|---|---|---|
| Thread-safe | Yes | CAS ensures atomicity, no ABA problem |
| Lock-free (add) | Yes | CAS loop always makes progress |
| Wait-free (query) | Yes | No loops, just reads |
| Deadlock-free | Yes | No locks for normal operations |
| Memory overhead | Low | Just AtomicLongArray |
| Aspect | Synchronized | Segmented | Lock-Free |
|---|---|---|---|
| Add throughput | Low | Medium | High |
| Query throughput | Low | Medium | Very High |
| Add latency (avg) | Low | Low | Very Low |
| Add latency (p99) | High (contention) | Medium | Low |
| Query latency | Low | Low | Minimal |
| Memory overhead | None | Locks per segment | None |
| Implementation complexity | Low | Medium | Medium |
| clear() support | Easy | Easy | Requires lock |
For most high-performance use cases, the lock-free implementation is the best choice:

- Highest add and query throughput in the comparison above
- Wait-free queries and lock-free adds, so no thread ever blocks another on the hot path
- No locks to cause deadlock or priority inversion during normal operation
Use synchronized or segmented approaches only when:

- Implementation simplicity matters more than raw throughput
- clear() is frequent and must stay trivially safe
- Contention is low enough that lock overhead never shows up in profiles
This section provides a production-ready lock-free Bloom filter with all features.
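What follows is one way such an implementation could look: a sketch combining the lock-free bit array with a read-write lock that gates clear() (adds and queries take it in shared mode, per the earlier solution), a LongAdder-based approximate count, and a small demo main. The class name, sizing formulas, and demo keys are all illustrative.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Lock-free adds and queries; clear() is made exclusive with a read-write
 * lock that add()/mightContain() take in shared (read) mode.
 */
public class LockFreeBloomFilter {
    private final AtomicLongArray words;
    private final int m, k;
    private final LongAdder count = new LongAdder();
    private final ReadWriteLock clearLock = new ReentrantReadWriteLock();

    public LockFreeBloomFilter(int expectedElements, double fpRate) {
        // Standard sizing: m = -n ln p / (ln 2)^2, k = (m/n) ln 2.
        int bits = (int) Math.ceil(
                -expectedElements * Math.log(fpRate) / (Math.log(2) * Math.log(2)));
        this.m = Math.max(64, bits);
        this.k = Math.max(1, (int) Math.round((double) m / expectedElements * Math.log(2)));
        this.words = new AtomicLongArray((m + 63) / 64);
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;  // illustrative double hashing
        return Math.floorMod(h1 + i * h2, m);
    }

    private boolean setBit(int bit) {
        int w = bit >>> 6;
        long mask = 1L << (bit & 63);
        while (true) {
            long old = words.get(w);
            if ((old & mask) != 0) return false;
            if (words.compareAndSet(w, old, old | mask)) return true;
        }
    }

    private boolean getBit(int bit) {
        return (words.get(bit >>> 6) & (1L << (bit & 63))) != 0;
    }

    public void add(Object item) {
        clearLock.readLock().lock();      // shared: excludes only clear(), not other adds
        try {
            boolean changed = false;
            for (int i = 0; i < k; i++) changed |= setBit(position(item, i));
            if (changed) count.increment();   // rough count: duplicates don't inflate it
        } finally {
            clearLock.readLock().unlock();
        }
    }

    public boolean mightContain(Object item) {
        clearLock.readLock().lock();
        try {
            for (int i = 0; i < k; i++) {
                if (!getBit(position(item, i))) return false;
            }
            return true;
        } finally {
            clearLock.readLock().unlock();
        }
    }

    public void clear() {
        clearLock.writeLock().lock();     // exclusive: no adds or queries during clear
        try {
            for (int w = 0; w < words.length(); w++) words.set(w, 0L);
            count.reset();
        } finally {
            clearLock.writeLock().unlock();
        }
    }

    public long approximateCount() {
        return count.sum();
    }

    public static void main(String[] args) {
        LockFreeBloomFilter filter = new LockFreeBloomFilter(1000, 0.01);
        for (String s : new String[] {"apple", "banana", "cherry"}) filter.add(s);
        System.out.println("mightContain(apple)  = " + filter.mightContain("apple"));
        System.out.println("mightContain(banana) = " + filter.mightContain("banana"));
        System.out.println("mightContain(durian) = " + filter.mightContain("durian"));
        System.out.println("approximateCount     = " + filter.approximateCount());
    }
}
```

Note the trade-off: gating add() and mightContain() with the shared read lock keeps clear() safe at the cost of strict lock-freedom on the hot path, but that lock is uncontended except while a clear is actually running.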
Expected Output:
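For the demo main in the sketch above (and assuming, as is overwhelmingly likely at this filter size, that "durian" is not a false positive), the run would print:

```
mightContain(apple)  = true
mightContain(banana) = true
mightContain(durian) = false
approximateCount     = 3
```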