add(element)
and mightContain(element)
operations.A Bloom Filter is a probabilistic data structure used to test whether an element is a member of a set. It is designed to be very fast and extremely space-efficient, especially when working with large volumes of data where memory is constrained.
However, it comes with one trade-off:
In this chapter, we will explore the low-level design of bloom filter in detail.
Lets start by clarifying the requirements:
Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.
Here is an example of how a discussion between the candidate and the interviewer might unfold:
Candidate: Should the Bloom Filter support generic types (e.g., any object), or should we limit it to specific data types like strings or integers
Interviewer: Let’s assume for this design that we’re working with strings.
Candidate: What operations should the Bloom Filter support? Just add
and mightContain
, or do we also need to support deletion
Interviewer: Just add
and mightContain
. Deletion is not required in a standard Bloom Filter.
Candidate: What kind of false result is acceptable? Are we okay with false positives?
Interviewer: Yes, that’s expected. A Bloom Filter can return false positives, but it should never return false negatives.
Candidate: Should the number of hash functions (k
) and size of the bit array (m
) be configurable or fixed?
Interviewer: Ideally, they should be configurable at initialization. The values may vary depending on the expected number of elements and acceptable false positive rate.
Candidate: What should be the behavior if the same element is added multiple times?
Interviewer: That’s fine. Bloom Filters are idempotent. Adding the same element again should have no effect on correctness.
Candidate: Should the design support concurrency? For example, multiple threads calling add
and mightContain
?
Interviewer: Yes. Assume the Bloom Filter will be accessed from a multi-threaded environment.
Based on the discussion, here’s a summary of the functional requirements:
k
(number of hash functions) and m
(size of the bit array).add
and mightContain
operations should be optimized for speed, ideally executing in O(1) time.