
Design HashMap

Last Updated: March 21, 2026

Easy
5 min read

Understanding the Problem

We need to build three operations: insert (or update), lookup, and delete. The core challenge is mapping arbitrary keys to storage locations efficiently.

The simplest idea is to just use an array where the index is the key. That works when the key range is small and known. But real hash maps handle arbitrary keys efficiently by using a hash function to compress keys into a smaller range of indices, then dealing with collisions when two keys map to the same index.

The key design decisions are: how big should the underlying array be, what hash function to use, and how to handle collisions. These choices determine the trade-off between memory usage and operation speed.

Key Constraints:

  • 0 <= key, value <= 10^6 → Keys are non-negative integers up to one million. This is small enough that a direct-address array is technically feasible, but wasteful.
  • At most 10^4 calls → With only 10,000 operations, even O(n) per operation would pass. But we should still design for efficiency since this is a design problem testing your understanding of hash tables.
  • The whole point of this problem is to implement an efficient hash map, so we should aim for O(1) average time per operation.

Approach 1: Brute Force (Direct Address Array)

Intuition

The problem tells us keys range from 0 to 10^6. The most straightforward approach is to allocate an array large enough to hold every possible key, and use the key itself as the index. No hashing, no collisions, no complexity.

This is called a direct-address table. It's the simplest possible mapping: array[key] = value. We initialize every slot to -1 (indicating "no mapping"), and put/get/remove become direct array accesses.

Algorithm

  1. Create an integer array of size 10^6 + 1, initialized to -1
  2. put(key, value): Set array[key] = value
  3. get(key): Return array[key] (returns -1 if never set)
  4. remove(key): Set array[key] = -1

Code
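A minimal Python sketch of the direct-address table described above. The put/get/remove interface comes from the algorithm; the class name MyHashMap is illustrative:

```python
class MyHashMap:
    def __init__(self):
        # One slot per possible key (0..10^6); -1 means "no mapping".
        self.table = [-1] * (10**6 + 1)

    def put(self, key: int, value: int) -> None:
        # Insert or update: the key itself is the index.
        self.table[key] = value

    def get(self, key: int) -> int:
        # Returns -1 if the key was never set (or was removed).
        return self.table[key]

    def remove(self, key: int) -> None:
        self.table[key] = -1
```

Every operation is a single array access, so all three run in O(1) worst-case time, at the cost of allocating all 10^6 + 1 slots up front.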

This approach works but wastes ~4MB of memory even if we only store a handful of entries, and it doesn't generalize to larger key ranges. What if we used a smaller array and a hash function to map keys into it?

Approach 2: Array of Linked Lists (Chaining)

Intuition

Instead of an array that covers every possible key, we use a much smaller array (say, size 769) and a hash function to map each key to a bucket index: bucket = key % 769. Multiple keys can map to the same bucket, and that's called a collision.

To handle collisions, each bucket stores a linked list of (key, value) pairs. When we put a new key, we hash it to find the bucket, then walk the list to either update an existing entry or append a new one. Get and remove work the same way: hash to find the bucket, then search the list.

Why 769? It's a prime number, which helps distribute keys more evenly across buckets. With a power-of-2 size (like 1024), key % size keeps only the key's low bits, so keys sharing a common factor (all even keys, all multiples of 8, and so on) cluster into a fraction of the buckets. A prime modulus breaks up these patterns.
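To see the clustering effect concretely, count how many distinct buckets a stream of stride-8 keys actually touches under each modulus (1024 and 769 taken from the paragraph above):

```python
# Keys with a stride of 8: under a power-of-2 modulus they can only
# land on multiples of 8, i.e. 1/8 of the buckets.
keys = [8 * k for k in range(1000)]

power_of_two = len({k % 1024 for k in keys})  # distinct buckets used
prime = len({k % 769 for k in keys})

print(power_of_two)  # 128 of 1024 buckets (12.5%)
print(prime)         # 769 of 769 buckets (all of them)
```

Because 769 shares no factor with 8, the sequence 8k % 769 cycles through every residue, while 8k % 1024 is stuck on multiples of 8.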

This is essentially how java.util.HashMap works at its core (though production implementations add dynamic resizing, better hash functions, and tree-based buckets for degenerate cases). Python's dict, by contrast, handles collisions with open addressing — the strategy we'll see in Approach 3.

Algorithm

  1. Create an array of size 769, where each slot holds the head of a linked list
  2. Define a hash function: hash(key) = key % 769
  3. put(key, value):
    • Compute bucket = hash(key)
    • Walk the linked list at bucket
    • If a node with this key exists, update its value
    • Otherwise, insert a new node at the head of the list
  4. get(key):
    • Compute bucket = hash(key)
    • Walk the linked list at bucket
    • If a node with this key exists, return its value
    • Otherwise, return -1
  5. remove(key):
    • Compute bucket = hash(key)
    • Walk the linked list at bucket
    • If a node with this key exists, remove it from the list

Code
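A sketch of the chaining approach in Python, following the algorithm above. The ListNode helper and class name are illustrative; the interface is the same put/get/remove:

```python
class ListNode:
    """A node in a singly linked bucket list."""
    def __init__(self, key: int, value: int, next=None):
        self.key = key
        self.value = value
        self.next = next

class MyHashMap:
    def __init__(self):
        self.size = 769  # prime bucket count, as discussed above
        self.buckets = [None] * self.size

    def _hash(self, key: int) -> int:
        return key % self.size

    def put(self, key: int, value: int) -> None:
        i = self._hash(key)
        node = self.buckets[i]
        while node:
            if node.key == key:      # key already present: update in place
                node.value = value
                return
            node = node.next
        # Not found: insert a new node at the head of the bucket's list.
        self.buckets[i] = ListNode(key, value, self.buckets[i])

    def get(self, key: int) -> int:
        node = self.buckets[self._hash(key)]
        while node:
            if node.key == key:
                return node.value
            node = node.next
        return -1

    def remove(self, key: int) -> None:
        i = self._hash(key)
        prev, node = None, self.buckets[i]
        while node:
            if node.key == key:
                if prev:                      # unlink a middle/tail node
                    prev.next = node.next
                else:                         # unlink the head
                    self.buckets[i] = node.next
                return
            prev, node = node, node.next
```

With ~10^4 entries spread over 769 buckets, the average chain holds about 13 nodes, so each operation is O(n/769) ≈ O(1) on average.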

Chaining works well, but linked lists have poor cache locality since each node is allocated separately on the heap. What if we stored everything in a contiguous array and handled collisions by probing for the next open slot?

Approach 3: Open Addressing (Linear Probing)

Intuition

Instead of using linked lists to handle collisions, open addressing stores all entries directly in the array. When a collision happens (the target slot is already occupied by a different key), we simply check the next slot, then the next, and so on until we find an empty one. This is called linear probing.

The tricky part is deletion. We can't just set a slot to "empty" when we remove a key, because that would break the probe chain. If key A is at index 5 and key B (which collided with A) is at index 6, removing A and marking index 5 as empty means we'd stop searching at index 5 when looking for B, and never find it. The solution is to use a tombstone marker that says "this slot is empty but the chain continues."

We need a larger array than the chaining approach to keep the load factor low. With a load factor below 0.7 or so, linear probing performs well. Since we have at most 10^4 entries, an array of size 20011 (a prime number roughly double the max entries) gives a comfortable load factor.

Algorithm

  1. Create an array of size 20011, with each slot initialized to a sentinel "empty" state
  2. Define a hash function: hash(key) = key % 20011
  3. put(key, value):
    • Start at index = hash(key)
    • Probe forward until we find the key (update it) or reach an empty slot (the key is absent)
    • Remember the first tombstone seen along the way; if the key turns out to be absent, insert into that tombstone (otherwise into the empty slot) — inserting into a tombstone immediately could create a duplicate if the key sits later in the chain
  4. get(key):
    • Start at index = hash(key)
    • Probe forward until we find the key (return value) or an empty slot (key doesn't exist)
    • Skip over tombstones during the search
  5. remove(key):
    • Start at index = hash(key)
    • Probe forward until we find the key
    • Mark the slot as a tombstone

Code
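A sketch of linear probing with tombstones in Python, under the same put/get/remove interface. The EMPTY and TOMBSTONE sentinels are illustrative choices:

```python
EMPTY = None          # slot was never used: probing can stop here
TOMBSTONE = object()  # slot was deleted: the probe chain continues past it

class MyHashMap:
    def __init__(self):
        self.size = 20011               # prime, ~2x the max entry count
        self.table = [EMPTY] * self.size  # live slots hold (key, value)

    def _hash(self, key: int) -> int:
        return key % self.size

    def put(self, key: int, value: int) -> None:
        i = self._hash(key)
        first_tombstone = -1
        while self.table[i] is not EMPTY:
            if self.table[i] is TOMBSTONE:
                if first_tombstone == -1:
                    first_tombstone = i       # remember, but keep probing
            elif self.table[i][0] == key:
                self.table[i] = (key, value)  # key exists: update in place
                return
            i = (i + 1) % self.size
        # Key is absent: reuse the first tombstone if we passed one,
        # otherwise take the empty slot that ended the probe.
        self.table[first_tombstone if first_tombstone != -1 else i] = (key, value)

    def get(self, key: int) -> int:
        i = self._hash(key)
        while self.table[i] is not EMPTY:     # tombstones don't stop the search
            if self.table[i] is not TOMBSTONE and self.table[i][0] == key:
                return self.table[i][1]
            i = (i + 1) % self.size
        return -1

    def remove(self, key: int) -> None:
        i = self._hash(key)
        while self.table[i] is not EMPTY:
            if self.table[i] is not TOMBSTONE and self.table[i][0] == key:
                self.table[i] = TOMBSTONE     # keep the chain intact
                return
            i = (i + 1) % self.size
```

Note that put keeps probing past the first tombstone until it either finds the key or hits a truly empty slot; only then does it fall back to the remembered tombstone. Inserting into the tombstone immediately could leave a second copy of the key further down the chain.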