Last Updated: March 21, 2026
We need to build three operations: insert (or update), lookup, and delete. The core challenge is mapping arbitrary keys to storage locations efficiently.
The simplest idea is to just use an array where the index is the key. That works when the key range is small and known. But real hash maps handle arbitrary keys efficiently by using a hash function to compress keys into a smaller range of indices, then dealing with collisions when two keys map to the same index.
The key design decisions are: how big should the underlying array be, what hash function to use, and how to handle collisions. These choices determine the trade-off between memory usage and operation speed.
0 <= key, value <= 10^6 → Keys and values are non-negative integers up to one million. This range is small enough that a direct-address array is technically feasible, but wasteful.

At most 10^4 calls → With only 10,000 operations, even O(n) per operation would pass. But we should still design for efficiency, since this is a design problem testing your understanding of hash tables.

The problem tells us keys range from 0 to 10^6. The most straightforward approach is to allocate an array large enough to hold every possible key and use the key itself as the index. No hashing, no collisions, no complexity.
This is called a direct-address table. It's the simplest possible mapping: array[key] = value. We initialize every slot to -1 (indicating "no mapping"), and put/get/remove become direct array accesses.
put(key, value): Set array[key] = value
get(key): Return array[key] (returns -1 if never set)
remove(key): Set array[key] = -1

This approach works, but it wastes ~4MB of memory even if we only store a handful of entries, and it doesn't generalize to larger key ranges. What if we used a smaller array and a hash function to map keys into it?
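The direct-address table described above can be sketched in a few lines. The class and method names follow the usual put/get/remove interface from the problem statement; the sentinel -1 works because values are guaranteed non-negative.

```python
class MyHashMap:
    """Direct-address table: the key itself is the array index.
    Assumes keys and values are integers in [0, 10^6]."""

    def __init__(self):
        # One slot per possible key; -1 means "no mapping".
        self.table = [-1] * (10**6 + 1)

    def put(self, key: int, value: int) -> None:
        self.table[key] = value

    def get(self, key: int) -> int:
        return self.table[key]

    def remove(self, key: int) -> None:
        self.table[key] = -1
```

Every operation is a single array access, O(1) with no collisions to handle — the cost is the fixed ~4MB allocation up front.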
Instead of an array that covers every possible key, we use a much smaller array (say, size 769) and a hash function to map each key to a bucket index: bucket = key % 769. Multiple keys can map to the same bucket, and that's called a collision.
To handle collisions, each bucket stores a linked list of (key, value) pairs. When we put a new key, we hash it to find the bucket, then walk the list to either update an existing entry or append a new one. Get and remove work the same way: hash to find the bucket, then search the list.
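A chaining map along these lines might look like the sketch below. For brevity it uses Python lists as buckets rather than hand-rolled linked nodes; the search-then-update-or-append logic is the same either way.

```python
class ChainedHashMap:
    """Separate chaining: each bucket holds a list of (key, value) pairs.
    769 buckets, matching the prime used in the text."""

    def __init__(self, num_buckets: int = 769):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key: int) -> list:
        # Hash the key to its bucket.
        return self.buckets[key % len(self.buckets)]

    def put(self, key: int, value: int) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key exists: update in place
                return
        bucket.append((key, value))       # new key: append to the chain

    def get(self, key: int) -> int:
        for k, v in self._bucket(key):
            if k == key:
                return v
        return -1                         # key not in its chain

    def remove(self, key: int) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                return
```

Note that keys 1 and 770 both hash to bucket 1 (770 % 769 = 1), so they end up in the same chain and are distinguished by the key comparison during the scan.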
Why 769? It's a prime number, which helps distribute keys more evenly across buckets. A power of 2 (like 1024) would cause keys that are multiples of the same factor to cluster in the same buckets. Primes reduce this clustering effect.
This is essentially how java.util.HashMap works at its core (though the production implementation adds dynamic resizing, a better hash function, and tree-based buckets for degenerate cases). Python's dict, by contrast, resolves collisions with open addressing rather than chaining — an approach we'll cover shortly.
The hash function distributes keys across buckets roughly uniformly. With 769 buckets and at most 10^4 entries, the average chain length is about 10000/769 ≈ 13 entries. Walking a chain of 13 nodes is extremely fast.
The choice of a prime bucket size matters. If we used, say, 1000 buckets, then keys 0, 1000, 2000, 3000 would all land in bucket 0. With a prime like 769, this kind of systematic clustering is much less likely because common multiples don't align with the bucket count.
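The clustering effect described above is easy to demonstrate. The snippet below hashes 100 keys with a common stride of 1000 under both bucket counts and counts how many distinct buckets get used:

```python
# Keys sharing a common stride of 1000 — the clustering case from the text.
keys = [i * 1000 for i in range(100)]

# With 1000 buckets, every key collapses into bucket 0...
buckets_1000 = {k % 1000 for k in keys}

# ...while a prime bucket count like 769 spreads them out:
# 1000 % 769 = 231, and multiples of 231 cycle through distinct
# residues mod 769 because 769 is prime.
buckets_769 = {k % 769 for k in keys}

print(len(buckets_1000))  # 1   distinct bucket used
print(len(buckets_769))   # 100 distinct buckets used
```

All 100 keys pile into a single chain with 1000 buckets, but occupy 100 different buckets with 769.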
hash(key) = key % 769
put(key, value): bucket = hash(key); scan the bucket's list — update the pair if the key exists, otherwise append (key, value)
get(key): bucket = hash(key); scan the bucket's list — return the value if the key is found, otherwise -1
remove(key): bucket = hash(key); scan the bucket's list — delete the pair if the key is found

Chaining works well, but linked lists have poor cache locality since each node is allocated separately on the heap. What if we stored everything in a contiguous array and handled collisions by probing for the next open slot?
Instead of using linked lists to handle collisions, open addressing stores all entries directly in the array. When a collision happens (the target slot is already occupied by a different key), we simply check the next slot, then the next, and so on until we find an empty one. This is called linear probing.
The tricky part is deletion. We can't just set a slot to "empty" when we remove a key, because that would break the probe chain. If key A is at index 5 and key B (which collided with A) is at index 6, removing A and marking index 5 as empty means we'd stop searching at index 5 when looking for B, and never find it. The solution is to use a tombstone marker that says "this slot is empty but the chain continues."
We need a larger array than the chaining approach to keep the load factor low. With a load factor below 0.7 or so, linear probing performs well. Since we have at most 10^4 entries, an array of size 20011 (a prime number roughly double the max entries) gives a comfortable load factor.
Linear probing keeps all data in a contiguous array, which is cache-friendly. When we probe for the next slot, it's likely in the same cache line as the current slot, making traversal fast on modern CPUs.
The tombstone mechanism preserves probe chains. When we remove key A and later search for key B (which was placed after A due to a collision), we skip over the tombstone at A's old position and keep searching. Without tombstones, we'd stop at A's slot and falsely conclude that B doesn't exist.
The load factor (number of entries / array size) determines performance. With 10^4 entries in a 20011-slot array, the load factor is about 0.5, which keeps the average probe length very short (around 1.5 probes per lookup).
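The "around 1.5 probes" figure comes from the classical analysis of linear probing, which estimates the expected probe counts as a function of the load factor $\alpha$:

$$
E[\text{probes, successful search}] \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right),
\qquad
E[\text{probes, unsuccessful search}] \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)
$$

At $\alpha = 0.5$ these give roughly 1.5 and 2.5 probes respectively — and they also show why load factors near 1 are disastrous: the unsuccessful-search cost grows like $1/(1-\alpha)^2$.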
hash(key) = key % 20011
put(key, value): index = hash(key); probe forward until we find the key (update it) or a never-used slot (insert, reusing the first tombstone seen along the way)
get(key): index = hash(key); probe forward, skipping tombstones and other keys, until we find the key (return its value) or hit a never-used slot (return -1)
remove(key): index = hash(key); probe forward until we find the key, then mark its slot with a tombstone rather than emptying it
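Putting the pieces together, a linear-probing map with tombstones might look like this sketch. Two distinct sentinel objects separate "never used" (probe chain ends) from "deleted" (keep probing):

```python
class ProbingHashMap:
    """Open addressing with linear probing and tombstone deletion.
    Size 20011 (prime) keeps the load factor ~0.5 for up to 10^4 entries."""

    EMPTY = object()      # slot never used: a probe chain ends here
    TOMBSTONE = object()  # slot deleted: the chain continues past it

    def __init__(self, size: int = 20011):
        self.size = size
        self.keys = [self.EMPTY] * size
        self.values = [0] * size

    def put(self, key: int, value: int) -> None:
        i = key % self.size
        first_tombstone = -1
        while self.keys[i] is not self.EMPTY:
            if self.keys[i] == key:
                self.values[i] = value        # existing key: update
                return
            if self.keys[i] is self.TOMBSTONE and first_tombstone < 0:
                first_tombstone = i           # remember reusable slot
            i = (i + 1) % self.size
        if first_tombstone >= 0:
            i = first_tombstone               # reuse the deleted slot
        self.keys[i] = key
        self.values[i] = value

    def _find(self, key: int) -> int:
        i = key % self.size
        while self.keys[i] is not self.EMPTY:
            if self.keys[i] == key:
                return i
            i = (i + 1) % self.size           # skip tombstones and other keys
        return -1

    def get(self, key: int) -> int:
        i = self._find(key)
        return self.values[i] if i >= 0 else -1

    def remove(self, key: int) -> None:
        i = self._find(key)
        if i >= 0:
            self.keys[i] = self.TOMBSTONE     # preserve probe chains
```

Removing a key leaves a tombstone, so a later lookup for a key placed after it still probes past the deleted slot instead of stopping early; put reclaims the first tombstone it passes, so the table doesn't fill with dead slots.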