
Design Concurrent Bloom Filter

Last Updated: January 30, 2026


Ashish Pratap Singh

Bloom filters are ubiquitous in high-throughput systems. Redis uses them for cache prefetching. Cassandra and HBase use them to avoid expensive disk lookups. Chrome used them for safe browsing checks. Network routers use them for packet deduplication. These systems process millions of operations per second, and any lock contention is unacceptable.

The good news: Bloom filters have properties that make them surprisingly easy to parallelize. Bits only ever transition from 0 to 1, never back. Setting a bit that's already set is harmless. These properties unlock elegant lock-free implementations.

This chapter explores three approaches to concurrent Bloom filters, from simple locking to lock-free designs that scale linearly with core count.

Problem Statement

What We're Building

A thread-safe Bloom filter that allows multiple threads to add elements and check membership concurrently. A Bloom filter is a probabilistic data structure that can definitively say "not present" but only probabilistically say "might be present." False positives are possible; false negatives are not.

Required Operations

Operation          | Description                     | Expected Complexity
add(item)          | Add an item to the filter       | O(k), where k = number of hash functions
mightContain(item) | Check if item might be present  | O(k)
clear()            | Reset the filter to empty state | O(m), where m = bit array size
approximateCount() | Estimate number of items added  | O(1)

Thread-Safety Requirements

  • Multiple threads can call add() simultaneously without losing any additions
  • Multiple threads can call mightContain() simultaneously without blocking
  • An item added by one thread becomes visible to other threads (eventually)
  • mightContain() never returns false for an item that was added (no false negatives)
  • clear() is exclusive: no concurrent adds or queries may run during a clear
  • False positive rate remains within expected bounds under concurrent access

Data Structure Fundamentals

Before adding concurrency, let's understand how Bloom filters work and why their structure enables efficient parallelization.

Core Concepts

A Bloom filter consists of two components:

  1. Bit array: A fixed-size array of m bits, all initially set to 0
  2. Hash functions: k independent hash functions that map elements to positions in the bit array

To add an element:

  1. Compute k hash values for the element
  2. Set the corresponding k bits in the array to 1

To check membership:

  1. Compute k hash values for the element
  2. If ALL k bits are 1, return "might contain"
  3. If ANY bit is 0, return "definitely not present"

False Positive Analysis

After inserting n elements into a Bloom filter with m bits and k hash functions, the probability that a specific bit is still 0 is:

  P(bit = 0) = (1 − 1/m)^(kn) ≈ e^(−kn/m)

The false positive probability (all k bits are 1 by chance) is:

  P(false positive) ≈ (1 − e^(−kn/m))^k

Optimal Hash Function Count

Given m bits and n expected elements, the optimal number of hash functions is:

  k = (m/n) · ln 2

With optimal k, the false positive rate is approximately:

  P(false positive) ≈ (1/2)^k ≈ (0.6185)^(m/n)

Practical rule of thumb: 10 bits per element gives roughly 1% false positive rate.
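As a sanity check on these formulas, here is a small sketch that computes m and k for a target false positive rate, using the standard sizing formula m = −n·ln(p)/(ln 2)². The class and method names are illustrative, not from the original:

```java
public class BloomSizing {
    // m = -n * ln(p) / (ln 2)^2 : bits needed for n elements at target FP rate p
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = (m/n) * ln 2 : optimal number of hash functions for m bits, n elements
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        long m = optimalBits(n, 0.01);
        int k = optimalHashes(m, n);
        // Roughly 9.6 bits per element and k = 7, matching the "10 bits ~ 1%" rule
        System.out.println(m + " bits (" + (double) m / n + " per element), k = " + k);
    }
}
```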

How a Bloom Filter Works

The diagram shows adding "apple" to a Bloom filter. Three hash functions produce positions 2, 5, and 9, and those bits are set to 1.

Single-Threaded Implementation
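Before tackling concurrency, here is a minimal single-threaded sketch. The double-hashing scheme (deriving k positions from two base hashes as h1 + i·h2, the Kirsch–Mitzenmacher trick) is one common choice, not necessarily the original's:

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int m;  // number of bits
    private final int k;  // number of hash functions

    public SimpleBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Derive the i-th position from two base hashes: h_i = h1 + i * h2
    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;  // second, decorrelated hash
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(Object item) {
        for (int i = 0; i < k; i++) bits.set(position(item, i));
    }

    public boolean mightContain(Object item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(item, i))) return false;  // any 0 bit: definitely absent
        }
        return true;  // all k bits set: might be present
    }
}
```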

Concurrency Challenges

Making a Bloom filter thread-safe is easier than most data structures, but there are still important considerations.

Challenge 1: Setting Multiple Bits Atomically

An add() operation sets k bits. Do these k bit-sets need to be atomic as a group? Suppose Thread 1 has set two of "apple"'s three bits when Thread 2 queries "apple": Thread 2 finds a 0 bit and returns "not present".

Key insight: this is actually fine for Bloom filters. The query ran during a concurrent, incomplete insert, so "not present" corresponds to a valid linearization in which the query happened before the insert completed. Since we never return false negatives for completed inserts, correctness is preserved.

Challenge 2: Read Visibility

When Thread 1 sets a bit, when does Thread 2 see it? Without proper memory ordering, Thread 2 might read a stale cached value.

Solution: Use atomic operations with appropriate memory ordering, or accept eventual visibility (often acceptable for Bloom filters).

Challenge 3: The clear() Operation

clear() resets all bits to 0 while other threads might be adding or querying. Consider this interleaving: Thread 1 is adding "apple" and has set two of its three bits when clear() zeroes the entire array; Thread 1 then sets the third bit and returns.

This is a false negative: "apple" was added, but a later query won't find it because two of its bits were wiped mid-insertion.

Solution: clear() must be exclusive. Use a read-write lock where adds/queries take read lock and clear takes write lock.

Consistency Model

Property             | Requirement         | Why It Matters
No False Negatives   | Mandatory           | Core Bloom filter guarantee
Linearizability      | Not required        | Probabilistic nature tolerates slight inconsistency
Immediate Visibility | Not required        | Eventually visible is usually acceptable
Progress Guarantee   | Lock-free preferred | High-throughput systems need non-blocking adds

The probabilistic nature of Bloom filters means we can tolerate relaxed consistency. A query might miss a concurrent insert (query linearized before insert), but this doesn't violate the no-false-negatives guarantee for completed inserts.

Approach 1: Synchronized Wrapper

The simplest approach wraps the entire Bloom filter with a read-write lock.

Implementation
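A sketch of the wrapper, reusing the same illustrative double-hash scheme. Adds and clear() take the write lock because BitSet mutation is not thread-safe; queries can share the read lock, but still serialize against every add:

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SynchronizedBloomFilter {
    private final BitSet bits;
    private final int m, k;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long count = 0;  // approximate item count, guarded by the write lock

    public SynchronizedBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(Object item) {
        lock.writeLock().lock();  // BitSet.set() is not thread-safe: exclusive access
        try {
            for (int i = 0; i < k; i++) bits.set(position(item, i));
            count++;
        } finally { lock.writeLock().unlock(); }
    }

    public boolean mightContain(Object item) {
        lock.readLock().lock();   // read lock excludes writers, so plain get() is safe
        try {
            for (int i = 0; i < k; i++)
                if (!bits.get(position(item, i))) return false;
            return true;
        } finally { lock.readLock().unlock(); }
    }

    public void clear() {
        lock.writeLock().lock();
        try { bits.clear(); count = 0; } finally { lock.writeLock().unlock(); }
    }

    public long approximateCount() {
        lock.readLock().lock();
        try { return count; } finally { lock.readLock().unlock(); }
    }
}
```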

Analysis

Property         | Status | Explanation
Thread-safe      | Yes    | All access serialized by the lock
Deadlock-free    | Yes    | Single lock, no circular dependencies
Read scalability | Poor   | Queries share a read lock but block behind every add
Add scalability  | Poor   | All adds serialize on the write lock
Clear safety     | Yes    | Write lock is exclusive

Problem with Java's BitSet: BitSet.set() and BitSet.get() are not thread-safe on their own. Multiple threads calling set() concurrently can corrupt internal state. This is why every add must hold the write lock; queries can share the read lock only because it excludes all writers.

Performance Characteristics

Pros

  • Simple and obviously correct
  • No memory overhead beyond the lock
  • Easy to reason about correctness

Cons

  • Zero parallelism for adds
  • Zero parallelism for queries (due to BitSet thread-safety issues)
  • Lock contention becomes severe under high load

When to Use:

  • Low-throughput applications
  • When correctness verification is important
  • Prototyping before optimizing

Approach 2: Segmented Locking

Divide the bit array into segments, each with its own lock. Adds and queries only lock the segments they touch.

Strategy: Segment-Based Locking

Handling Cross-Segment Operations

When an add operation needs to set bits in multiple segments, we must acquire all needed locks. To prevent deadlock, we always acquire locks in segment order (lowest to highest).

Implementation
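A sketch of the segmented design. Segment indices are collected into a sorted set so locks are always acquired lowest-to-highest; hashing is the same illustrative double-hash scheme as before:

```java
import java.util.BitSet;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SegmentedBloomFilter {
    private final int m, k, segmentSize;
    private final BitSet[] segments;
    private final ReentrantReadWriteLock[] locks;

    public SegmentedBloomFilter(int m, int k, int numSegments) {
        this.m = m;
        this.k = k;
        this.segmentSize = (m + numSegments - 1) / numSegments;
        this.segments = new BitSet[numSegments];
        this.locks = new ReentrantReadWriteLock[numSegments];
        for (int s = 0; s < numSegments; s++) {
            segments[s] = new BitSet(segmentSize);
            locks[s] = new ReentrantReadWriteLock();
        }
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(Object item) {
        int[] pos = new int[k];
        TreeSet<Integer> touched = new TreeSet<>();  // sorted: fixed lock order
        for (int i = 0; i < k; i++) {
            pos[i] = position(item, i);
            touched.add(pos[i] / segmentSize);
        }
        for (int s : touched) locks[s].writeLock().lock();  // lowest-to-highest: no deadlock
        try {
            for (int p : pos) segments[p / segmentSize].set(p % segmentSize);
        } finally {
            for (int s : touched) locks[s].writeLock().unlock();
        }
    }

    public boolean mightContain(Object item) {
        int[] pos = new int[k];
        TreeSet<Integer> touched = new TreeSet<>();
        for (int i = 0; i < k; i++) {
            pos[i] = position(item, i);
            touched.add(pos[i] / segmentSize);
        }
        for (int s : touched) locks[s].readLock().lock();
        try {
            for (int p : pos)
                if (!segments[p / segmentSize].get(p % segmentSize)) return false;
            return true;
        } finally {
            for (int s : touched) locks[s].readLock().unlock();
        }
    }
}
```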

Analysis

Property         | Status | Explanation
Thread-safe      | Yes    | Segment locks + clear lock protect all operations
Deadlock-free    | Yes    | Segment locks always acquired in order
Read scalability | Good   | Different elements often hit different segments
Add scalability  | Good   | Parallel adds on different segments
Complexity       | Medium | Lock-ordering logic, segment management

Pros

  • Much better parallelism than global lock
  • Independent elements don't contend
  • Read-write locks allow concurrent queries within a segment

Cons

  • Lock overhead for acquiring multiple segment locks
  • If k hash functions spread across many segments, locks become a bottleneck
  • Memory overhead for multiple locks

When to Use:

  • Moderate concurrency requirements
  • When elements tend to hash to similar regions
  • When simplicity of global lock isn't sufficient

Approach 3: Lock-Free with AtomicLongArray

This approach eliminates locks entirely for add() and mightContain() operations using atomic bit manipulation.

Key Insight: Idempotent Bit Setting

The crucial observation that enables lock-free Bloom filters:

Setting a bit to 1 when it's already 1 is harmless.

In other data structures, concurrent modifications cause lost updates:

  • HashMap: Two threads inserting can lose one entry
  • Counter: Two increments without synchronization can become one

But in a Bloom filter:

  • Thread 1 sets bit 5 to 1
  • Thread 2 sets bit 5 to 1
  • Result: bit 5 is 1

The bit is set regardless of who "wins." There are no lost updates because 1 OR 1 = 1.

No ABA Problem

The ABA problem occurs when a value changes from A → B → A, and a CAS operation incorrectly succeeds because it only sees the final A. Bloom filters are immune because bits only ever transition 0 → 1, never back to 0 (except during clear(), which is exclusive).

Implementation Using AtomicLongArray

We store bits in an array of long values, where each long holds 64 bits. This allows efficient atomic operations on groups of bits.

Implementation
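A sketch of the lock-free design. setBit() is the CAS loop at the heart of the approach: read the 64-bit word, bail out if the bit is already set, otherwise CAS in the OR-ed value and retry on failure. Hashing is the illustrative double-hash scheme again; clear() is omitted here since it needs the exclusive lock discussed earlier:

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;

public class LockFreeBloomFilter {
    private final AtomicLongArray words;  // each long holds 64 bits
    private final int m, k;
    private final LongAdder count = new LongAdder();

    public LockFreeBloomFilter(int m, int k) {
        this.words = new AtomicLongArray((m + 63) / 64);
        this.m = m;
        this.k = k;
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, m);
    }

    // CAS loop: returns true if this call flipped the bit from 0 to 1
    private boolean setBit(int bit) {
        int word = bit >>> 6;          // bit / 64
        long mask = 1L << (bit & 63);  // bit % 64
        while (true) {
            long old = words.get(word);
            if ((old & mask) != 0) return false;  // already set: idempotent, nothing to do
            if (words.compareAndSet(word, old, old | mask)) return true;
            // CAS failed: another thread changed this word; reread and retry
        }
    }

    private boolean getBit(int bit) {
        return (words.get(bit >>> 6) & (1L << (bit & 63))) != 0;  // wait-free volatile read
    }

    public void add(Object item) {
        for (int i = 0; i < k; i++) setBit(position(item, i));
        count.increment();  // LongAdder: contention-friendly approximate counter
    }

    public boolean mightContain(Object item) {
        for (int i = 0; i < k; i++)
            if (!getBit(position(item, i))) return false;
        return true;
    }

    public long approximateCount() { return count.sum(); }
}
```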

CAS Loop Visualization

Memory Ordering Considerations

For Bloom filters, we can use relaxed memory ordering for most operations:

Operation             | Memory Order       | Rationale
Read for mightContain | acquire            | See writes from other threads
CAS in setBit         | release on success | Make the bit visible to readers
count increment       | relaxed            | Approximate count; exact value not critical
clear() writes        | relaxed            | Protected by exclusive lock

The key insight: eventual visibility is acceptable. If a reader doesn't immediately see a just-set bit, the query returns "not present" for an in-progress insert. This is a valid linearization.
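In Java, these orderings correspond to the access modes added to AtomicLongArray in Java 9 (the default get/compareAndSet are stronger volatile operations, which is also correct, just more conservative than necessary). An illustrative sketch of a bit array using an acquire read paired with a release CAS:

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class RelaxedBits {
    private final AtomicLongArray words;

    public RelaxedBits(int numBits) {
        words = new AtomicLongArray((numBits + 63) / 64);
    }

    boolean getBit(int bit) {
        // Acquire read: pairs with the release CAS below, so set bits become visible
        return (words.getAcquire(bit >>> 6) & (1L << (bit & 63))) != 0;
    }

    void setBit(int bit) {
        int w = bit >>> 6;
        long mask = 1L << (bit & 63);
        long old = words.getPlain(w);  // relaxed read just to seed the loop
        while ((old & mask) == 0) {
            // compareAndExchange returns the witness value: on failure it is the
            // current word, so we can retry without an extra read
            long witness = words.compareAndExchangeRelease(w, old, old | mask);
            if (witness == old) return;  // success: bit published with release semantics
            old = witness;               // lost the race; retry from the witness value
        }
    }
}
```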

Analysis

Property          | Status | Explanation
Thread-safe       | Yes    | CAS ensures atomicity; no ABA problem
Lock-free (add)   | Yes    | A failed CAS means another thread made progress
Wait-free (query) | Yes    | No loops, just reads
Deadlock-free     | Yes    | No locks for normal operations
Memory overhead   | Low    | Just an AtomicLongArray

Pros

  • Maximum throughput for add and query
  • Queries are wait-free (no CAS needed)
  • Scales linearly with cores
  • Simple implementation

Cons

  • Still needs a lock for the clear() operation
  • Slightly higher latency per add (CAS can retry)
  • Requires understanding of memory ordering

When to Use:

  • High-throughput systems (millions of ops/sec)
  • When clear() is rare or non-existent
  • When lock contention is measured bottleneck

Solution Comparison

Aspect                    | Synchronized      | Segmented         | Lock-Free
Add throughput            | Low               | Medium            | High
Query throughput          | Low               | Medium            | Very high
Add latency (avg)         | Low               | Low               | Very low
Add latency (p99)         | High (contention) | Medium            | Low
Query latency             | Low               | Low               | Minimal
Memory overhead           | None              | Locks per segment | None
Implementation complexity | Low               | Medium            | Medium
clear() support           | Easy              | Easy              | Requires lock

Decision Flowchart

Recommendation

For most high-performance use cases, the lock-free implementation is the best choice:

  1. Queries are completely wait-free
  2. Adds use lightweight CAS operations
  3. No lock contention under load
  4. Simple to implement correctly (idempotent bit-setting)

Use synchronized or segmented approaches only when:

  • Throughput requirements are modest (< 100K ops/sec)
  • Frequent clear() operations are needed
  • Team is unfamiliar with lock-free programming

Complete Implementation

This section provides a production-ready lock-free Bloom filter with all features.

Demo

Expected Output:
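A self-contained sketch of such a filter, combining Approach 3's atomic bit array with the read-write clear lock described earlier (adds and queries share the read lock; clear() takes the write lock), plus a small demo main. Class and helper names are illustrative, and the double-hashing scheme is one common choice, not necessarily the original's:

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ConcurrentBloomFilter {
    private volatile AtomicLongArray words;  // swapped out wholesale by clear()
    private final int m, k;
    private final LongAdder count = new LongAdder();
    // Inverted read-write lock: adds/queries share the READ lock (safe thanks to
    // atomic bit operations); clear() takes the WRITE lock for exclusivity.
    private final ReentrantReadWriteLock clearLock = new ReentrantReadWriteLock();

    public ConcurrentBloomFilter(int m, int k) {
        this.words = new AtomicLongArray((m + 63) / 64);
        this.m = m;
        this.k = k;
    }

    private int position(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;  // illustrative second hash
        return Math.floorMod(h1 + i * h2, m);
    }

    private static void setBit(AtomicLongArray a, int bit) {
        int w = bit >>> 6;
        long mask = 1L << (bit & 63);
        long old;
        do {
            old = a.get(w);
            if ((old & mask) != 0) return;  // idempotent: already set
        } while (!a.compareAndSet(w, old, old | mask));
    }

    public void add(Object item) {
        clearLock.readLock().lock();
        try {
            AtomicLongArray a = words;
            for (int i = 0; i < k; i++) setBit(a, position(item, i));
            count.increment();
        } finally { clearLock.readLock().unlock(); }
    }

    public boolean mightContain(Object item) {
        clearLock.readLock().lock();
        try {
            AtomicLongArray a = words;
            for (int i = 0; i < k; i++) {
                int bit = position(item, i);
                if ((a.get(bit >>> 6) & (1L << (bit & 63))) == 0) return false;
            }
            return true;
        } finally { clearLock.readLock().unlock(); }
    }

    public void clear() {
        clearLock.writeLock().lock();  // exclusive: no adds/queries in flight
        try {
            words = new AtomicLongArray((m + 63) / 64);
            count.reset();
        } finally { clearLock.writeLock().unlock(); }
    }

    public long approximateCount() { return count.sum(); }

    public static void main(String[] args) {
        ConcurrentBloomFilter f = new ConcurrentBloomFilter(1 << 20, 7);
        f.add("apple");
        f.add("banana");
        System.out.println("apple?  " + f.mightContain("apple"));   // always true
        System.out.println("grape?  " + f.mightContain("grape"));   // almost certainly false
        System.out.println("count ~ " + f.approximateCount());
        f.clear();
        System.out.println("after clear, apple? " + f.mightContain("apple"));  // false
    }
}
```

Of the demo's output, only some lines are guaranteed: "apple" must be found after being added, the count after two adds is 2, and every query against a freshly cleared filter returns false. The "grape" query is almost always false at this filter size, but a false positive is possible by design.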