
Design Concurrent Bloom Filter

Last Updated: January 30, 2026


Ashish Pratap Singh

Bloom filters are ubiquitous in high-throughput systems. Redis uses them for cache prefetching. Cassandra and HBase use them to avoid expensive disk lookups. Chrome used them for safe browsing checks. Network routers use them for packet deduplication. These systems process millions of operations per second, and any lock contention is unacceptable.

The good news: Bloom filters have properties that make them surprisingly easy to parallelize. Bits only ever transition from 0 to 1, never back. Setting a bit that's already set is harmless. These properties unlock elegant lock-free implementations.

This chapter explores three approaches to concurrent Bloom filters, from simple locking to lock-free designs that scale linearly with core count.

Problem Statement

What We're Building

A thread-safe Bloom filter that allows multiple threads to add elements and check membership concurrently. A Bloom filter is a probabilistic data structure that can definitively say "not present" but only probabilistically say "might be present." False positives are possible; false negatives are not.

Required Operations

Operation          | Description                     | Expected Complexity
add(item)          | Add an item to the filter       | O(k), where k = number of hash functions
mightContain(item) | Check if item might be present  | O(k)
clear()            | Reset the filter to empty state | O(m), where m = bit array size
approximateCount() | Estimate number of items added  | O(1)

Thread-Safety Requirements

  • Multiple threads can call add() simultaneously without losing any additions
  • Multiple threads can call mightContain() simultaneously without blocking
  • An item added by one thread becomes visible to other threads (eventually)
  • mightContain() never returns false for an item that was added (no false negatives)
  • clear() is exclusive: no concurrent adds or queries may run during a clear
  • False positive rate remains within expected bounds under concurrent access

Data Structure Fundamentals

Before adding concurrency, let's understand how Bloom filters work and why their structure enables efficient parallelization.

Core Concepts

A Bloom filter consists of two components:

  1. Bit array: A fixed-size array of m bits, all initially set to 0
  2. Hash functions: k independent hash functions that map elements to positions in the bit array

To add an element:

  1. Compute k hash values for the element
  2. Set the corresponding k bits in the array to 1

To check membership:

  1. Compute k hash values for the element
  2. If ALL k bits are 1, return "might contain"
  3. If ANY bit is 0, return "definitely not present"

False Positive Analysis

After inserting n elements into a Bloom filter with m bits and k hash functions, the probability that a specific bit is still 0 is:

  P(bit = 0) = (1 − 1/m)^(kn) ≈ e^(−kn/m)

The false positive probability (all k bits are 1 by chance) is:

  P(false positive) ≈ (1 − e^(−kn/m))^k

Optimal Hash Function Count

Given m bits and n expected elements, the optimal number of hash functions is:

  k = (m/n) · ln 2

With optimal k, the false positive rate is approximately:

  P(false positive) ≈ (1/2)^k ≈ (0.6185)^(m/n)

Practical rule of thumb: 10 bits per element gives roughly 1% false positive rate.
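As a sanity check on these formulas, here is a small sketch that computes m and k for a target false positive rate, using the standard sizing formula m = −n·ln(p)/(ln 2)². The class and method names are illustrative, not from the original:

```java
public class BloomSizing {
    // m = -n * ln(p) / (ln 2)^2 : bits needed for n elements at target FP rate p
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = (m/n) * ln 2 : optimal number of hash functions for m bits, n elements
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        long m = optimalBits(n, 0.01);
        int k = optimalHashes(m, n);
        // Roughly 9.6 bits per element and k = 7, matching the "10 bits ~ 1%" rule
        System.out.println(m + " bits (" + (double) m / n + " per element), k = " + k);
    }
}
```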

How a Bloom Filter Works

The diagram shows adding "apple" to a Bloom filter. Three hash functions produce positions 2, 5, and 9, and those bits are set to 1.

Single-Threaded Implementation
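Before tackling concurrency, here is a minimal single-threaded sketch. The double-hashing scheme (deriving k positions from two base hashes as h1 + i·h2, the Kirsch–Mitzenmacher trick) is one common choice, not necessarily the original's:

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int m;  // number of bits
    private final int k;  // number of hash functions

    public SimpleBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Derive the i-th position from two base hashes: h_i = h1 + i * h2
    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;  // second, decorrelated hash
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(Object item) {
        for (int i = 0; i < k; i++) bits.set(position(item, i));
    }

    public boolean mightContain(Object item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(item, i))) return false;  // any 0 bit: definitely absent
        }
        return true;  // all k bits set: might be present
    }
}
```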

Concurrency Challenges

Making a Bloom filter thread-safe is easier than most data structures, but there are still important considerations.

Challenge 1: Setting Multiple Bits Atomically

An add() operation sets k bits. Do these k bit-sets need to be atomic as a group? Suppose Thread 1 has set two of "apple"'s three bits when Thread 2 queries "apple": Thread 2 finds a 0 bit and returns "not present".

Key insight: this is actually fine for Bloom filters. The query ran during a concurrent, incomplete insert, so "not present" corresponds to a valid linearization in which the query happened before the insert completed. Since we never return false negatives for completed inserts, correctness is preserved.

Challenge 2: Read Visibility

When Thread 1 sets a bit, when does Thread 2 see it? Without proper memory ordering, Thread 2 might read a stale cached value.

Solution: Use atomic operations with appropriate memory ordering, or accept eventual visibility (often acceptable for Bloom filters).

Challenge 3: The clear() Operation

clear() resets all bits to 0 while other threads might be adding or querying. Consider this interleaving: Thread 1 is adding "apple" and has set two of its three bits when clear() zeroes the entire array; Thread 1 then sets the third bit and returns.

This is a false negative: "apple" was added, but a later query won't find it because two of its bits were wiped mid-insertion.

Solution: clear() must be exclusive. Use a read-write lock where adds/queries take read lock and clear takes write lock.

Consistency Model

Property             | Requirement         | Why It Matters
No False Negatives   | Mandatory           | Core Bloom filter guarantee
Linearizability      | Not required        | Probabilistic nature tolerates slight inconsistency
Immediate Visibility | Not required        | Eventually visible is usually acceptable
Progress Guarantee   | Lock-free preferred | High-throughput systems need non-blocking adds

The probabilistic nature of Bloom filters means we can tolerate relaxed consistency. A query might miss a concurrent insert (query linearized before insert), but this doesn't violate the no-false-negatives guarantee for completed inserts.

Approach 1: Synchronized Wrapper

The simplest approach wraps the entire Bloom filter with a read-write lock.

Implementation
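A sketch of the wrapper, reusing the same illustrative double-hash scheme. Adds and clear() take the write lock because BitSet mutation is not thread-safe; queries can share the read lock, but still serialize against every add:

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SynchronizedBloomFilter {
    private final BitSet bits;
    private final int m, k;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long count = 0;  // approximate item count, guarded by the write lock

    public SynchronizedBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(Object item) {
        lock.writeLock().lock();  // BitSet.set() is not thread-safe: exclusive access
        try {
            for (int i = 0; i < k; i++) bits.set(position(item, i));
            count++;
        } finally { lock.writeLock().unlock(); }
    }

    public boolean mightContain(Object item) {
        lock.readLock().lock();   // read lock excludes writers, so plain get() is safe
        try {
            for (int i = 0; i < k; i++)
                if (!bits.get(position(item, i))) return false;
            return true;
        } finally { lock.readLock().unlock(); }
    }

    public void clear() {
        lock.writeLock().lock();
        try { bits.clear(); count = 0; } finally { lock.writeLock().unlock(); }
    }

    public long approximateCount() {
        lock.readLock().lock();
        try { return count; } finally { lock.readLock().unlock(); }
    }
}
```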

Analysis

Property         | Status | Explanation
Thread-safe      | Yes    | All access serialized by the lock
Deadlock-free    | Yes    | Single lock, no circular dependencies
Read scalability | Poor   | Queries share a read lock but block behind every add
Add scalability  | Poor   | All adds serialize on the write lock
Clear safety     | Yes    | Write lock is exclusive

Problem with Java's BitSet: BitSet.set() and BitSet.get() are not thread-safe on their own. Multiple threads calling set() concurrently can corrupt internal state. This is why every add must hold the write lock; queries can share the read lock only because it excludes all writers.

Performance Characteristics

Pros

  • Simple and obviously correct
  • No memory overhead beyond the lock
  • Easy to reason about correctness

Cons

  • Zero parallelism for adds
  • Zero parallelism for queries (due to BitSet thread-safety issues)
  • Lock contention becomes severe under high load

When to Use:

  • Low-throughput applications
  • When correctness verification is important
  • Prototyping before optimizing

Approach 2: Segmented Locking

Divide the bit array into segments, each with its own lock. Adds and queries only lock the segments they touch.

Strategy: Segment-Based Locking

Handling Cross-Segment Operations

When an add operation needs to set bits in multiple segments, we must acquire all needed locks. To prevent deadlock, we always acquire locks in segment order (lowest to highest).

Implementation
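A sketch of the segmented design. Segment indices are collected into a sorted set so locks are always acquired lowest-to-highest; hashing is the same illustrative double-hash scheme as before:

```java
import java.util.BitSet;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SegmentedBloomFilter {
    private final int m, k, segmentSize;
    private final BitSet[] segments;
    private final ReentrantReadWriteLock[] locks;

    public SegmentedBloomFilter(int m, int k, int numSegments) {
        this.m = m;
        this.k = k;
        this.segmentSize = (m + numSegments - 1) / numSegments;
        this.segments = new BitSet[numSegments];
        this.locks = new ReentrantReadWriteLock[numSegments];
        for (int s = 0; s < numSegments; s++) {
            segments[s] = new BitSet(segmentSize);
            locks[s] = new ReentrantReadWriteLock();
        }
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(Object item) {
        int[] pos = new int[k];
        TreeSet<Integer> touched = new TreeSet<>();  // sorted: fixed lock order
        for (int i = 0; i < k; i++) {
            pos[i] = position(item, i);
            touched.add(pos[i] / segmentSize);
        }
        for (int s : touched) locks[s].writeLock().lock();  // lowest-to-highest: no deadlock
        try {
            for (int p : pos) segments[p / segmentSize].set(p % segmentSize);
        } finally {
            for (int s : touched) locks[s].writeLock().unlock();
        }
    }

    public boolean mightContain(Object item) {
        int[] pos = new int[k];
        TreeSet<Integer> touched = new TreeSet<>();
        for (int i = 0; i < k; i++) {
            pos[i] = position(item, i);
            touched.add(pos[i] / segmentSize);
        }
        for (int s : touched) locks[s].readLock().lock();
        try {
            for (int p : pos)
                if (!segments[p / segmentSize].get(p % segmentSize)) return false;
            return true;
        } finally {
            for (int s : touched) locks[s].readLock().unlock();
        }
    }
}
```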

Analysis

Property         | Status | Explanation
Thread-safe      | Yes    | Segment locks + clear lock protect all operations
Deadlock-free    | Yes    | Segment locks always acquired in order
Read scalability | Good   | Different elements often hit different segments
Add scalability  | Good   | Parallel adds on different segments
Complexity       | Medium | Lock-ordering logic, segment management

Pros

  • Much better parallelism than global lock
  • Independent elements don't contend
  • Read-write locks allow concurrent queries within a segment

Cons

  • Lock overhead for acquiring multiple segment locks
  • If k hash functions spread across many segments, locks become a bottleneck
  • Memory overhead for multiple locks

When to Use:

  • Moderate concurrency requirements
  • When elements tend to hash to similar regions
  • When simplicity of global lock isn't sufficient

Approach 3: Lock-Free with AtomicLongArray

This approach eliminates locks entirely for add() and mightContain() operations using atomic bit manipulation.

Key Insight: Idempotent Bit Setting

The crucial observation that enables lock-free Bloom filters:

Setting a bit to 1 when it's already 1 is harmless.

In other data structures, concurrent modifications cause lost updates:

  • HashMap: Two threads inserting can lose one entry
  • Counter: Two increments without synchronization can become one

But in a Bloom filter:

  • Thread 1 sets bit 5 to 1
  • Thread 2 sets bit 5 to 1
  • Result: bit 5 is 1

The bit is set regardless of who "wins." There are no lost updates because 1 OR 1 = 1.

No ABA Problem

The ABA problem occurs when a value changes from A → B → A, and a CAS operation incorrectly succeeds because it only sees the final A. Bloom filters are immune because bits only ever transition 0 → 1, never back to 0 (except during clear(), which is exclusive).

Implementation Using AtomicLongArray

We store bits in an array of long values, where each long holds 64 bits. This allows efficient atomic operations on groups of bits.

Implementation
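A sketch of the lock-free design. setBit() is the CAS loop at the heart of the approach: read the 64-bit word, bail out if the bit is already set, otherwise CAS in the OR-ed value and retry on failure. Hashing is the illustrative double-hash scheme again; clear() is omitted here since it needs the exclusive lock discussed earlier:

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;

public class LockFreeBloomFilter {
    private final AtomicLongArray words;  // each long holds 64 bits
    private final int m, k;
    private final LongAdder count = new LongAdder();

    public LockFreeBloomFilter(int m, int k) {
        this.words = new AtomicLongArray((m + 63) / 64);
        this.m = m;
        this.k = k;
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    // CAS loop: returns true if this call flipped the bit from 0 to 1
    private boolean setBit(int bit) {
        int word = bit >>> 6;          // bit / 64
        long mask = 1L << (bit & 63);  // bit % 64
        while (true) {
            long old = words.get(word);
            if ((old & mask) != 0) return false;  // already set: idempotent, nothing to do
            if (words.compareAndSet(word, old, old | mask)) return true;
            // CAS failed: another thread changed this word; reread and retry
        }
    }

    private boolean getBit(int bit) {
        return (words.get(bit >>> 6) & (1L << (bit & 63))) != 0;  // wait-free volatile read
    }

    public void add(Object item) {
        for (int i = 0; i < k; i++) setBit(position(item, i));
        count.increment();  // LongAdder: contention-friendly approximate counter
    }

    public boolean mightContain(Object item) {
        for (int i = 0; i < k; i++)
            if (!getBit(position(item, i))) return false;
        return true;
    }

    public long approximateCount() { return count.sum(); }
}
```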

CAS Loop Visualization

Memory Ordering Considerations

For Bloom filters, we can use relaxed memory ordering for most operations:

Operation             | Memory Order       | Rationale
Read for mightContain | acquire            | See writes from other threads
CAS in setBit         | release on success | Make the bit visible to readers
count increment       | relaxed            | Approximate count; exact value not critical
clear() writes        | relaxed            | Protected by exclusive lock

The key insight: eventual visibility is acceptable. If a reader doesn't immediately see a just-set bit, the query returns "not present" for an in-progress insert. This is a valid linearization.
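In Java, these orderings correspond to the access modes added to AtomicLongArray in Java 9 (the default get/compareAndSet are stronger volatile operations, which is also correct, just more conservative than necessary). An illustrative sketch of a bit array using an acquire read paired with a release CAS:

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class RelaxedBits {
    private final AtomicLongArray words;

    public RelaxedBits(int numBits) {
        words = new AtomicLongArray((numBits + 63) / 64);
    }

    boolean getBit(int bit) {
        // Acquire read: pairs with the release CAS below, so set bits become visible
        return (words.getAcquire(bit >>> 6) & (1L << (bit & 63))) != 0;
    }

    void setBit(int bit) {
        int w = bit >>> 6;
        long mask = 1L << (bit & 63);
        long old = words.getPlain(w);  // relaxed read just to seed the loop
        while ((old & mask) == 0) {
            // compareAndExchange returns the witness value: on failure it is the
            // current word, so we can retry without an extra read
            long witness = words.compareAndExchangeRelease(w, old, old | mask);
            if (witness == old) return;  // success: bit published with release semantics
            old = witness;               // lost the race; retry from the witness value
        }
    }
}
```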

Analysis

Property          | Status | Explanation
Thread-safe       | Yes    | CAS ensures atomicity; no ABA problem
Lock-free (add)   | Yes    | A failed CAS means another thread made progress
Wait-free (query) | Yes    | No loops, just reads
Deadlock-free     | Yes    | No locks for normal operations
Memory overhead   | Low    | Just an AtomicLongArray

Pros

  • Maximum throughput for add and query
  • Queries are wait-free (no CAS needed)
  • Scales linearly with cores
  • Simple implementation

Cons

  • Still needs a lock for the clear() operation
  • Slightly higher latency per add (CAS can retry)
  • Requires understanding of memory ordering

When to Use:

  • High-throughput systems (millions of ops/sec)
  • When clear() is rare or non-existent
  • When lock contention is measured bottleneck

Solution Comparison

Aspect                    | Synchronized      | Segmented         | Lock-Free
Add throughput            | Low               | Medium            | High
Query throughput          | Low               | Medium            | Very high
Add latency (avg)         | Low               | Low               | Very low
Add latency (p99)         | High (contention) | Medium            | Low
Query latency             | Low               | Low               | Minimal
Memory overhead           | None              | Locks per segment | None
Implementation complexity | Low               | Medium            | Medium
clear() support           | Easy              | Easy              | Requires lock

Decision Flowchart

Recommendation

For most high-performance use cases, the lock-free implementation is the best choice:

  1. Queries are completely wait-free
  2. Adds use lightweight CAS operations
  3. No lock contention under load
  4. Simple to implement correctly (idempotent bit-setting)

Use synchronized or segmented approaches only when:

  • Throughput requirements are modest (< 100K ops/sec)
  • Frequent clear() operations are needed
  • Team is unfamiliar with lock-free programming

Complete Implementation

This section provides a production-ready lock-free Bloom filter with all features.

Demo

Expected Output:
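A self-contained sketch of such a filter, combining Approach 3's atomic bit array with the read-write clear lock described earlier (adds and queries share the read lock; clear() takes the write lock), plus a small demo main. Class and helper names are illustrative, and the double-hashing scheme is one common choice, not necessarily the original's:

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ConcurrentBloomFilter {
    private volatile AtomicLongArray words;  // swapped out wholesale by clear()
    private final int m, k;
    private final LongAdder count = new LongAdder();
    // Inverted read-write lock: adds/queries share the READ lock (safe thanks to
    // atomic bit operations); clear() takes the WRITE lock for exclusivity.
    private final ReentrantReadWriteLock clearLock = new ReentrantReadWriteLock();

    public ConcurrentBloomFilter(int m, int k) {
        this.words = new AtomicLongArray((m + 63) / 64);
        this.m = m;
        this.k = k;
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;  // illustrative second hash
        return Math.floorMod(h1 + i * h2, m);
    }

    private static void setBit(AtomicLongArray a, int bit) {
        int w = bit >>> 6;
        long mask = 1L << (bit & 63);
        long old;
        do {
            old = a.get(w);
            if ((old & mask) != 0) return;  // idempotent: already set
        } while (!a.compareAndSet(w, old, old | mask));
    }

    public void add(Object item) {
        clearLock.readLock().lock();
        try {
            AtomicLongArray a = words;
            for (int i = 0; i < k; i++) setBit(a, position(item, i));
            count.increment();
        } finally { clearLock.readLock().unlock(); }
    }

    public boolean mightContain(Object item) {
        clearLock.readLock().lock();
        try {
            AtomicLongArray a = words;
            for (int i = 0; i < k; i++) {
                int bit = position(item, i);
                if ((a.get(bit >>> 6) & (1L << (bit & 63))) == 0) return false;
            }
            return true;
        } finally { clearLock.readLock().unlock(); }
    }

    public void clear() {
        clearLock.writeLock().lock();  // exclusive: no adds/queries in flight
        try {
            words = new AtomicLongArray((m + 63) / 64);
            count.reset();
        } finally { clearLock.writeLock().unlock(); }
    }

    public long approximateCount() { return count.sum(); }

    public static void main(String[] args) {
        ConcurrentBloomFilter f = new ConcurrentBloomFilter(1 << 20, 7);
        f.add("apple");
        f.add("banana");
        System.out.println("apple?  " + f.mightContain("apple"));   // always true
        System.out.println("grape?  " + f.mightContain("grape"));   // almost certainly false
        System.out.println("count ~ " + f.approximateCount());
        f.clear();
        System.out.println("after clear, apple? " + f.mightContain("apple"));  // false
    }
}
```

Of the demo's output, only some lines are guaranteed: "apple" must be found after being added, the count after two adds is 2, and every query against a freshly cleared filter returns false. The "grape" query is almost always false at this filter size, but a false positive is possible by design.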