AlgoMaster Logo

Design Bloom Filter

Last Updated: January 22, 2026

Ashish

Ashish Pratap Singh

easy

In this chapter, we will explore the low-level design of bloom filter in detail.

Lets start by clarifying the requirements:

1. Clarifying Requirements

Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.

Here is an example of how a discussion between the candidate and the interviewer might unfold:

Based on the discussion, here’s a summary of the functional requirements:

1.1 Functional Requirements

  1. Support add(element): Add an element to the set.
  2. Support mightContain(element): Check if an element might be in the set.
  3. The filter should operate on strings as the input data type.
  4. Support configurable values of k (number of hash functions) and m (size of the bit array).
  5. Support multiple hash function implementations (FNV1a, DJB2)

1.2 Non-Functional Requirements

  1. Space Efficiency: The Bloom Filter should use significantly less memory than storing the actual elements
  2. High Performance: Both add and mightContain operations should be optimized for speed, ideally executing in O(1) time.
  3. Thread-Safety: The implementation should be safe to use in a concurrent environment.
  4. Idempotency: Adding the same element multiple times should be idempotent

Now that we understand what we're building, let's identify the building blocks of our system.

2. Identifying Core Entities

How do you go from a list of requirements to actual classes? The key is to look for nouns in the requirements that have distinct attributes or behaviors.

The design of a Bloom Filter is different from typical object-oriented systems. Instead of modeling real-world entities, we focus on choosing the right abstractions for a probabilistic data structure.

Let's walk through our requirements and identify what needs to exist in our system.

1. Bit Array (Data Store)

"Support add(element) and mightContain(element)"

The fundamental requirement is space efficiency. Since we don't store actual elements, we need a compact representation. This leads us to a bit array, where each element is a single bit (0 or 1).

  • Initially, all bits are set to 0
  • When an element is added, multiple hash functions map it to k bit positions, and those bits are set to 1
  • When checking for presence, the same k bits are examined. If all are 1, the element might be in the set. If any bit is 0, it is definitely not present

2. Hash Functions

"Support multiple hash function implementations"

To populate and check the bit array, we need a way to map each element to bit positions. This is the role of our hash functions.

A Bloom Filter uses k independent hash functions, each mapping a string to a bit array index between 0 and m-1. The same hash functions must be used in both add() and mightContain() operations.

This suggests two abstractions:

  • HashType: An enum defining available hash function types (FNV1A, DJB2)
  • HashStrategy: An interface defining the contract for hash functions
  • Concrete strategies: FNV1aHashStrategy, DJB2HashStrategy

3. BloomFilter

"The filter should be configurable and thread-safe"

Finally, we need a public-facing class that ties everything together. The BloomFilter class:

  • Owns the bit array
  • Holds references to hash strategies
  • Exposes add() and mightContain() methods
  • Handles thread-safety for concurrent access

2.4 Factory and Builder

"Support configurable bit array size (m) and number of hash functions (k)"

Creating a Bloom Filter requires several parameters that must be coordinated. We use:

  • HashStrategyFactory: Creates hash strategy instances from enum types
  • BloomFilter.Builder: Provides a fluent API for configuration with validation

2.5 Entity Overview

Here's how these entities relate to each other:

We've identified several types of entities:

Enums define fixed sets of values for type safety.

Interfaces define contracts that implementations must follow.

Strategy Implementations provide the actual hash function logic.

Factory encapsulates strategy creation logic.

Core Classes contain the main Bloom Filter logic.

Scroll
EntityTypeResponsibility
HashTypeEnumIdentifies available hash algorithms
HashStrategyInterfaceContract for hash function implementations
FNV1aHashStrategyStrategyFNV-1a 64-bit hash implementation
DJB2HashStrategyStrategyDJB2 hash implementation
HashStrategyFactoryFactoryCreates hash strategies from enum types
BloomFilterCore ClassMain class exposing add() and mightContain()
BloomFilter.BuilderBuilderFluent API for BloomFilter construction

With our entities identified, let's define their attributes, behaviors, and relationships.

3. Designing Classes and Relationships

Now that we know what entities we need, let's flesh out their details. For each class, we'll define what data it holds (attributes) and what it can do (methods). Then we'll look at how these classes connect to each other.

3.1 Class Definitions

We'll work bottom-up: simple types first, then interfaces, then implementations, then the main class.

Enums

HashType identifies the available hash function algorithms.

ValueDescription
FNV1AFowler-Noll-Vo hash function (64-bit variant)
DJB2Daniel J. Bernstein hash function

Why use an enum instead of strings?

Because HashType.FNV1A is self-documenting and type-safe. You can't accidentally pass "FNV1a" (wrong case) or "FNV-1a" (wrong format). The compiler catches invalid types at compile time.

Interfaces

HashStrategy

Defines the contract for hash functions.

Scroll
MethodReturn TypeDescription
hash(data)longComputes a hash value for the input string

The interface is intentionally minimal. It takes a string and returns a long. The Bloom Filter will take the absolute value and modulo by the bit array size to get the actual index.

Strategy Implementations

FNV1aHashStrategy

Implements the FNV-1a 64-bit hash algorithm.

Scroll
AttributeTypeDescription
FNV_PRIMElong (static)FNV prime: 0x100000001b3
FNV_OFFSET_BASISlong (static)FNV offset: 0xcbf29ce484222325

FNV-1a is fast, simple, and has good distribution for small strings. The algorithm XORs each byte with the hash, then multiplies by the prime.

DJB2HashStrategy

Implements the DJB2 hash algorithm.

DJB2 uses the formula hash = hash * 33 + c for each character. It's known for simplicity and relatively low collision rates.

Factory

HashStrategyFactory

Creates hash strategy instances from enum types.

Scroll
MethodReturn TypeDescription
create(type)HashStrategyCreates a strategy instance for the given type

The factory decouples strategy instantiation from client code. Clients don't need to know class names like FNV1aHashStrategy. They just pass HashType.FNV1A.

Core Class

BloomFilter

This is the main class that exposes the public API (add and mightContain) and orchestrates the behavior of the internal components.

Scroll
AttributeTypeDescription
bitSetBitSetThe bit array storing element presence
bitSetSizeintSize of the bit array (m)
numHashFunctionsintNumber of hash functions to use (k)
hashStrategiesList\<HashStrategy\>Hash function implementations
MethodDescription
add(item)Adds an element by setting k bits
mightContain(item)Returns true if element might be present

BloomFilter.Builder

Provides fluent configuration.

The Builder validates all parameters and ensures no invalid BloomFilter is ever created.

3.2 Class Relationships

Understanding how these classes interact clarifies system behavior and responsibilities.

Composition

  • BloomFilter owns BitSet: The BloomFilter creates and manages the bit array. When the filter is garbage collected, the BitSet goes with it.

Association

  • BloomFilter uses HashStrategy: The filter uses strategies but doesn't own them. The same strategy could theoretically be shared across filters.

Implementation

  • FNV1aHashStrategy, DJB2HashStrategy implement HashStrategy: Both provide the same contract but with different algorithms.

3.3 Key Design Patterns

We apply a few classic design principles and patterns to keep the system modular, flexible, and testable.

Strategy Pattern (Hash Functions)

The Problem: A Bloom Filter needs multiple hash functions, and different use cases might prefer different algorithms. If we hardcode hash logic directly in the BloomFilter, we can't swap algorithms without modifying the filter class. Adding a new hash function would require changing existing code.

The Solution: The Strategy pattern encapsulates each hash algorithm in its own class. The BloomFilter holds a list of HashStrategy implementations and uses them interchangeably.

Why This Pattern: We could use a simple switch statement, but the Strategy pattern gives us:

  • Testability: Each hash function can be unit tested in isolation
  • Extensibility: Adding new hash functions means adding a new class, not modifying existing code
  • Single Responsibility: Each strategy handles exactly one hash algorithm

Factory Pattern (HashStrategyFactory)

The Problem: Creating hash strategies requires knowing concrete class names like FNV1aHashStrategy. Clients shouldn't need to know these implementation details. If we rename a class or change its package, all client code breaks.

The Solution: The Factory pattern provides a single creation point. Clients pass an enum value, and the factory returns the appropriate implementation.

Why This Pattern: A factory is simpler than full dependency injection but still provides decoupling. It's appropriate here because:

  • Hash strategy creation is simple (no complex dependencies)
  • We have a fixed, known set of implementations
  • Clients benefit from not knowing concrete class names

Builder Pattern

The Problem: Creating a BloomFilter requires several parameters that must be coordinated: bit array size, number of hash functions, and the actual strategies. A constructor with all these parameters would be hard to read and easy to misuse.

The Solution: The Builder pattern provides a fluent API for step-by-step configuration. The build() method validates everything before creating the filter.

Why This Pattern: The Builder is appropriate when:

  • Object construction is complex (multiple parameters)
  • Parameters have validation rules that span multiple fields
  • You want clear, self-documenting construction code

3.4 Full Class Diagram

4. Code Implementation

Now let's translate our design into working code. We'll build bottom-up: foundational types first, then interfaces, then implementations, then the main class. This order matters because each layer depends on the ones below it.

4.1 Enums

We start with the enum that other classes depend on.

Simple and clean. Just two values representing our supported hash algorithms.

4.2 Interfaces

Now we define the contract that our hash function classes will implement.

The interface is minimal by design. A hash function takes a string and returns a long. The BloomFilter handles converting this to a bit array index. This separation of concerns keeps the strategies focused on hashing only.

4.3 Strategy Implementations

Each strategy encapsulates a specific hash algorithm.

The FNV-1a algorithm is elegant in its simplicity. For each byte, we XOR it with the current hash, then multiply by the prime. The magic numbers come from the FNV specification and are chosen to provide good distribution.

DJB2 uses the classic formula hash * 33 + c. The bit shift (hash << 5) + hash is an efficient way to compute hash * 33. This algorithm is known for its simplicity and decent performance.

Each strategy is independently testable. You can unit test FNV1aHashStrategy without creating a BloomFilter. Just create the strategy, call hash(), and verify the output.

4.4 Factory

The factory decouples strategy creation from client code.

The factory is a static method because it has no state. Clients call HashStrategyFactory.create(HashType.FNV1A) without needing to instantiate the factory.

The default case might seem unnecessary since we handle all enum values. It's defensive programming, a safety net if someone adds a new enum value but forgets to update the factory.

4.5 BloomFilter Class

This is the main class where everything comes together. Let's build it piece by piece.

The constructor is private. All BloomFilter instances must be created through the Builder. This ensures validation always happens.

Notice all fields are final. Once a BloomFilter is created, its configuration never changes. The only mutable state is the bit array itself, and we control access to that through synchronized methods.

The add method is synchronized for thread safety. Multiple threads can call add concurrently without corrupting the bit array.

For each hash function, we:

  1. Compute the hash value
  2. Take the absolute value (to handle negative hashes)
  3. Modulo by bit array size to get an index in range
  4. Set that bit to true

The mightContain method uses the same index calculation. If any bit is false, the element was definitely never added (no false negatives). If all bits are true, the element might have been added, or we might have a false positive from hash collisions.

Now let's look at the Builder:

Each with* method validates its input and returns this for chaining. The build method performs cross-field validation before constructing the filter.

The critical validation is strategies.size() < numHashFunctions. If you configure 5 hash functions but only provide 2 strategies, that's a bug. We catch it at build time, not when add throws an IndexOutOfBoundsException.

4.6 Demo Class

Let's see the complete system in action.

Add Operation Sequence Diagram

The following diagram illustrates what happens when an element is added:

5. Run and Test

Loading editor...

6. Quiz

Design Bloom Filter Quiz

1 / 20
Multiple Choice

What is the primary reason for using a bit array in the design of a Bloom Filter?