add(element)
and mightContain(element)
operations.A Bloom Filter is a probabilistic data structure used to test whether an element is a member of a set. It is designed to be very fast and extremely space-efficient, especially when working with large volumes of data where memory is constrained.
However, it comes with one trade-off:
In this chapter, we will explore the low-level design of bloom filter in detail.
Lets start by clarifying the requirements:
Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.
Here is an example of how a discussion between the candidate and the interviewer might unfold:
Candidate: Should the Bloom Filter support generic types (e.g., any object), or should we limit it to specific data types like strings or integers
Interviewer: Let’s assume for this design that we’re working with strings.
Candidate: What operations should the Bloom Filter support? Just add
and mightContain
, or do we also need to support deletion
Interviewer: Just add
and mightContain
. Deletion is not required in a standard Bloom Filter.
Candidate: What kind of false result is acceptable? Are we okay with false positives?
Interviewer: Yes, that’s expected. A Bloom Filter can return false positives, but it should never return false negatives.
Candidate: Should the number of hash functions (k
) and size of the bit array (m
) be configurable or fixed?
Interviewer: Ideally, they should be configurable at initialization. The values may vary depending on the expected number of elements and acceptable false positive rate.
Candidate: What should be the behavior if the same element is added multiple times?
Interviewer: That’s fine. Bloom Filters are idempotent. Adding the same element again should have no effect on correctness.
Candidate: Should the design support concurrency? For example, multiple threads calling add
and mightContain
?
Interviewer: Yes. Assume the Bloom Filter will be accessed from a multi-threaded environment.
Based on the discussion, here’s a summary of the functional requirements:
k
(number of hash functions) and m
(size of the bit array).add
and mightContain
operations should be optimized for speed, ideally executing in O(1) time.The design of a Bloom Filter is fundamentally different from typical object-oriented systems that model real-world entities. Instead of focusing on domain objects, we focus on choosing the right low-level data structures and abstractions that enable fast, memory-efficient, and thread-safe operations.
Our goal is to support two operations add(element)
and mightContain(element)
both in O(1) time, using minimal memory, while being safe for use in multi-threaded environments.
To meet these requirements, we must combine three key components:
Let’s break down these components and understand how they interact.
The first and most fundamental requirement is space efficiency. Since we are explicitly told not to store the actual elements, we need a compact representation of which elements have been seen.
This leads us to a bit array — a one-dimensional array where each element is a single bit (0 or 1).
k
bit positions, and each of those bits is set to 1.k
bits are examined. If all are 1, the element might be in the set. If any bit is 0, it is definitely not.In Java, we would typically use java.util.BitSet
for this purpose, as it provides a compact, memory-efficient implementation and utilities for bit manipulation.
To populate and check the bit array, we need a way to map each element to one or more bit positions. This is the role of our hash functions.
A standard Bloom Filter uses k
independent hash functions, each of which maps a string input to a bit array index between 0
and m - 1
.
add()
and mightContain()
operations.In our Bloom Filter, we will maintain an array or list of k
HashFunction
implementations, each producing a different hash value for the same input.
Finally, we need a public-facing class that ties everything together and exposes a clean API to the user. This is the BloomFilter
class.
It acts as the coordinator or facade, hiding the internal complexity of hashing and bit manipulation while providing two easy-to-use methods:
add(String element)
mightContain(String element): boolean
m
bits representing the filter’s state. Supports fast setting and checking of bits.k
independent hash functions used to map string elements to indices in the bit array.add()
and mightContain()
and manages internal data structures.Together, these core components allow us to build a Bloom Filter that is compact, and fast, all while delivering the required functionality and performance.
Now that we've identified the core components of a Bloom Filter, it's time to convert those ideas into a structured class design. This includes defining the responsibilities and attributes of each class, establishing relationships between them, applying design patterns for clean extensibility, and finally visualizing the system in a class diagram.
We begin by defining the key classes involved in the system, starting with data structures and utilities, and culminating in the main interface.
BloomFilter
This is the main class that exposes the public API (add
and mightContain
) and orchestrates the behavior of the internal components.
bitArray: BitSet
– The core data store that tracks presence via bits.hashFunctions: List<HashFunction>
– A list of k
independent hash functions.size: int
– The total size m
of the bit array.BloomFilter(int size, List<HashFunction> hashFunctions)
– Constructor to initialize size and hash functions.void add(String element)
– Applies all hash functions to the input and sets the corresponding bits.boolean mightContain(String element)
– Applies all hash functions and checks if all relevant bits are set.HashFunction
(Interface)Defines the contract for hash functions used by the Bloom Filter.
Method:
int hash(String input, int seed, int bound)
– Computes a hash of the input string, using a seed, and returns a result in the range [0, bound)
.This allows support for multiple independent hash functions by using different seeds.
Understanding how these classes interact clarifies system behavior and responsibilities.
BloomFilter
has-a BitSet
BloomFilter
has-a List<HashFunction>
The BloomFilter
owns and manages these components. Their lifecycle is tied to the lifecycle of the filter.
BloomFilter
uses HashFunction
to compute bit positions.HashFunction
is an abstraction that allows the filter to be configured with different implementations.We apply a few classic design principles and patterns to keep the system modular, flexible, and testable.
The HashFunction
interface and its implementations follow the Strategy Pattern. This allows us to inject different hashing strategies into the filter at runtime, based on the use case or performance characteristics.
DefaultHashFunction
with MurmurHashFunction
, Djb2Hash
without changing the BloomFilter
class.Creating a Bloom Filter requires complex calculations to determine the optimal bit array size (m) and number of hash functions (k) based on user-friendly inputs (expected elements and desired false positive rate). We provide a public static factory method BloomFilter.create(...) that encapsulates this complexity, offering a much cleaner API to the client than a constructor that requires m and k directly.
The BloomFilter
class acts as a facade, exposing a simplified interface (add
and mightContain
) while hiding all internal details related to hashing, bit manipulation, and concurrency handling.
This simplifies the usage for clients and makes the internal system easier to evolve without breaking external code.
HashType
EnumDefines the set of supported hash function types. This allows runtime flexibility in choosing hash algorithms and enables extensibility by adding new types later.
HashStrategy
Interface and ImplementationsDefines a contract for all hashing algorithms used in the Bloom Filter. Any strategy must implement a consistent way to hash strings into long integers.
HashStrategyFactory
Implements the Factory Pattern to provide the appropriate hashing strategy based on the enum value. Decouples strategy instantiation from client code.
BloomFilter
ClassThis is the core Bloom Filter class.
bitSet
: The binary array (bitmap) representing membership.numHashFunctions
: Controls how many different hashes are computed for each item.hashStrategies
: A list of HashStrategy
instances used to simulate multiple independent hash functions.add(String item)
Inserts an element into the Bloom filter by setting k
bits in the bit array. Each hash function produces a position; that bit is set to 1
.
mightContain(String item)
Checks for probable presence by verifying all bits that would have been set for the element.If any bit is not set, the element is definitely not present (no false negatives). If all bits are set, the item might be present (possible false positives).
Constructing a BloomFilter requires several parameters that must be set correctly. The Builder pattern provides a fluent, readable API for this configuration. It also allows us to perform validation within the build() method, ensuring that no BloomFilter object is ever created in an invalid state.
BloomFilterDemo
The BloomFilterDemo class demonstrates how a client would use the system, highlighting its core properties: the guarantee of no false negatives and the probability of false positives.
This driver code demonstrates a complete workflow: