Question

Design a memory efficient Bloom Filter that supports `add(element)` and `mightContain(element)` operations.

A Bloom Filter is a probabilistic data structure used to test whether an element is a member of a set. It is designed to be very fast and extremely space-efficient, especially when working with large volumes of data where memory is constrained.

However, it comes with one trade-off:

It may produce false positives (saying an element is in the set when it actually isn’t).
But it never produces false negatives (if it says an element is not present, it’s guaranteed to be absent).

In this chapter, we will explore the low-level design of bloom filter in detail.

Lets start by clarifying the requirements:

1. Clarifying Requirements

Before starting the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.

Here is an example of how a discussion between the candidate and the interviewer might unfold:

Discussion

Candidate: Should the Bloom Filter support generic types (e.g., any object), or should we limit it to specific data types like strings or integers

Interviewer: Let’s assume for this design that we’re working with strings.

Candidate: What operations should the Bloom Filter support? Just add and mightContain, or do we also need to support deletion

Interviewer: Just add and mightContain. Deletion is not required in a standard Bloom Filter.

Candidate: What kind of false result is acceptable? Are we okay with false positives?

Interviewer: Yes, that’s expected. A Bloom Filter can return false positives, but it should never return false negatives.

Candidate: Should the number of hash functions (k) and size of the bit array (m) be configurable or fixed?

Interviewer: Ideally, they should be configurable at initialization. The values may vary depending on the expected number of elements and acceptable false positive rate.

Candidate: What should be the behavior if the same element is added multiple times?

Interviewer: That’s fine. Bloom Filters are idempotent. Adding the same element again should have no effect on correctness.

Candidate: Should the design support concurrency? For example, multiple threads calling add and mightContain?

Interviewer: Yes. Assume the Bloom Filter will be accessed from a multi-threaded environment.

Based on the discussion, here’s a summary of the functional requirements:

1.1 Functional Requirements

Support add(element): Add an element to the set.
Support mightContain(element): Check if an element might be in the set.
The filter should operate on strings as the input data type.
Support configurable values of k (number of hash functions) and m (size of the bit array).

1.2 Non-Functional Requirements

Space Efficiency: The Bloom Filter should use significantly less memory than storing the actual elements
High Performance: Both add and mightContain operations should be optimized for speed, ideally executing in O(1) time.
Thread-Safety: The implementation should be safe to use in a concurrent environment.

2. Identifying Core Entities

Premium Content

This content is for premium members only.

← Previous: Design LRU Cache Next: Design Task Management Sy... →

Design Bloom Filter

Ashish Pratap Singh

Design a memory efficient Bloom Filter that supports `add(element)` and `mightContain(element)` operations.

1. Clarifying Requirements

1.1 Functional Requirements

1.2 Non-Functional Requirements

2. Identifying Core Entities

Premium Content

Get Premium

Design Bloom Filter

Ashish Pratap Singh

Design a memory efficient Bloom Filter that supports add(element) and mightContain(element) operations.

1. Clarifying Requirements

1.1 Functional Requirements

1.2 Non-Functional Requirements

2. Identifying Core Entities

Premium Content

Get Premium

Design a memory efficient Bloom Filter that supports `add(element)` and `mightContain(element)` operations.