Hash Maps: Lesson 2

Building a Hash Map

Build a hash map from scratch — the underlying array, handling collisions with chaining, resizing when it gets full, and the complete put/get/delete operations.

What Is It?

Now that you understand hashing, let's build the actual data structure. A hash map stores key-value pairs. You put in a key and a value, and later you can get that value back instantly using the key. Three core operations: PUT, GET, and DELETE — all fast on average.

Inside, a hash map is just an array of buckets. Each bucket holds a small linked list of key-value pairs. This technique is called chaining.

When you PUT a key-value pair, the hash function tells you which bucket to use. If nothing is in that bucket yet, you start a new list. If another key is already there (a collision), you add to that list.

That's the whole thing: an array where each slot holds a linked list, plus a hash function to pick the slot.

Analogy

The Apartment Building

Picture an apartment building with 8 floors. Each floor has a row of mailboxes hooked together on the wall.

When a new tenant moves in, the building manager runs their name through a formula that gives a floor number. The tenant's mailbox gets added to that floor's row, next to any boxes already there.

To deliver mail to "Alice", the mail carrier calculates Alice's floor (say, floor 3), goes straight to floor 3, and walks down the row checking name tags until she finds Alice's box. If floor 3 only has 2 or 3 boxes, this is nearly instant. But if 50 tenants all got assigned to floor 3 (bad formula!), the carrier walks a very long row.

When the building gets too crowded — too many tenants per floor — the manager builds a new building with 16 floors and moves everyone. Each tenant gets a new floor number based on the new building size. This is resizing.

How It Works

The Node and Bucket Structure

Each entry is a small node with a key, a value, and a pointer to the next node in the chain:

pseudocode
structure Node:
    key
    value
    next = null

The hash map holds an array of these chains, plus a count and a capacity:

pseudocode
structure HashMap:
    capacity = 8
    size = 0
    buckets = array of size capacity, all null

Here is what a hash map with 8 buckets and 5 entries looks like:

buckets
┌─────────┐
│ 0: null │
├─────────┤
│ 1: ─────┼──> ["bob": 25] -> ["eve": 30] -> null
├─────────┤
│ 2: ─────┼──> ["alice": 22] -> null
├─────────┤
│ 3: null │
├─────────┤
│ 4: null │
├─────────┤
│ 5: ─────┼──> ["carol": 28] -> null
├─────────┤
│ 6: null │
├─────────┤
│ 7: ─────┼──> ["dave": 35] -> null
└─────────┘

Bob and Eve both hashed to bucket 1, so they share a chain.
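The two structures above can be sketched in Python. This is a minimal sketch, not a full implementation; it assumes Python's built-in hash() as the hash function (any function that maps keys to integers would do):

```python
class Node:
    """One entry in a bucket's chain: a key, a value, and the next node."""
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None

class HashMap:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.size = 0
        self.buckets = [None] * capacity  # each slot is the head of a chain (or None)

    def _index(self, key):
        # Map the key's hash onto a valid bucket index.
        return hash(key) % self.capacity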

PUT(key, value)

Insert a new key-value pair, or update the value if the key already exists:

pseudocode
function put(key, value):
    index = hash(key) mod capacity
    node = buckets[index]
    // Walk the chain — maybe the key already exists
    while node is not null:
        if node.key == key:
            node.value = value  // update existing key
            return
        node = node.next
    // Key not found — add a new node at the front of the chain
    newNode = new Node(key, value)
    newNode.next = buckets[index]
    buckets[index] = newNode
    size = size + 1
    // If too full, resize
    if size / capacity >= 0.75:
        resize()

Why add at the front instead of the end? Adding at the front is instant — just point the new node to the current head and update the bucket. Adding at the end would require walking to the last node first.
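As a concrete sketch, here is PUT written in Python as a standalone function over a bucket array. The Node class and the True/False return value (so a caller can track size) are illustrative choices, not part of the pseudocode above:

```python
class Node:
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

def put(buckets, key, value):
    """Insert or update key in a chained bucket array.

    Returns True if a new node was added, False if an existing key was updated.
    """
    index = hash(key) % len(buckets)
    node = buckets[index]
    while node is not None:          # walk the chain: the key may already exist
        if node.key == key:
            node.value = value       # update in place, no new node
            return False
        node = node.next
    # key not found: prepend a new node, which is O(1)
    buckets[index] = Node(key, value, buckets[index])
    return True
```

The boolean return is what lets the caller increment size and check the load factor only on genuine insertions.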

GET(key)

Find and return the value for a given key:

pseudocode
function get(key):
    index = hash(key) mod capacity
    node = buckets[index]
    while node is not null:
        if node.key == key:
            return node.value
        node = node.next
    return null  // key not found
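The same walk in Python, as a hedged sketch over a bucket array (the Node class is illustrative; using a single bucket in the test below forces every key into one chain):

```python
class Node:
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

def get(buckets, key):
    """Return the value stored under key, or None if the key is absent."""
    node = buckets[hash(key) % len(buckets)]
    while node is not None:       # scan the chain at the key's bucket
        if node.key == key:
            return node.value
        node = node.next
    return None
```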

DELETE(key)

Remove a key-value pair:

pseudocode
function delete(key):
    index = hash(key) mod capacity
    node = buckets[index]
    prev = null
    while node is not null:
        if node.key == key:
            if prev is null:
                buckets[index] = node.next  // removing the first node
            else:
                prev.next = node.next  // skip over the node
            size = size - 1
            return true
        prev = node
        node = node.next
    return false  // key not found

The prev pointer lets us re-link the chain around the node we are removing.
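A Python sketch of the same logic (illustrative, with the same Node class as before; a single bucket makes every key collide, which exercises both the head case and the middle-of-chain case):

```python
class Node:
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

def delete(buckets, key):
    """Remove key from a chained bucket array; True if it was found."""
    index = hash(key) % len(buckets)
    node, prev = buckets[index], None
    while node is not None:
        if node.key == key:
            if prev is None:
                buckets[index] = node.next  # removing the first node in the chain
            else:
                prev.next = node.next       # splice the chain around the node
            return True
        prev, node = node, node.next
    return False
```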

RESIZE

When the table gets too full (load factor reaches 0.75), double the capacity and rehash everything:

pseudocode
function resize():
    oldBuckets = buckets
    capacity = capacity * 2
    buckets = array of size capacity, all null
    size = 0
    for i = 0 to length(oldBuckets) - 1:
        node = oldBuckets[i]
        while node is not null:
            put(node.key, node.value)  // rehash into new buckets
            node = node.next

Why do we have to rehash everything? Because the index for each key depends on capacity. When capacity changes, every key might land in a different bucket.

pseudocode
BEFORE RESIZE (capacity = 4):
    hash("alice") mod 4 = 2
    hash("bob")   mod 4 = 3
    hash("carol") mod 4 = 2

AFTER RESIZE (capacity = 8):
    hash("alice") mod 8 = 2
    hash("bob")   mod 8 = 7
    hash("carol") mod 8 = 6

Bob moved from bucket 3 to bucket 7. Carol moved from bucket 2 to bucket 6. Resizing is not just adding empty buckets — it redistributes everyone.
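The redistribution is easy to see in Python. This sketch uses a simple deterministic string hash (an illustrative choice, since Python's built-in hash() is randomized per process) and rehashes by copying nodes into a doubled array:

```python
class Node:
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

def string_hash(s):
    """A simple deterministic polynomial hash, for illustration only."""
    h = 0
    for ch in s:
        h = h * 31 + ord(ch)
    return h

def resize(buckets):
    """Return a bucket array of double the capacity with every entry rehashed."""
    new_buckets = [None] * (len(buckets) * 2)
    for head in buckets:
        node = head
        while node is not None:
            # Re-index with the NEW capacity: entries can land in different buckets.
            index = string_hash(node.key) % len(new_buckets)
            # Fresh nodes, so the old chain's next pointers stay intact.
            new_buckets[index] = Node(node.key, node.value, new_buckets[index])
            node = node.next
    return new_buckets
```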

Time Complexity

Operation   Average   Worst
PUT         O(1)      O(n)
GET         O(1)      O(n)
DELETE      O(1)      O(n)
RESIZE      O(n)      O(n)

The average case assumes a decent hash function and a load factor below 0.75. The worst case happens when every single key lands in the same bucket — then you're just scanning one long list.

Resize is slow (it touches every entry), but it happens rarely. Because we double the size each time, the cost of resize spreads out to roughly one extra step per insertion on average.
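That claim can be checked numerically. This sketch counts how many entries get rehashed across all resizes while inserting n items one at a time (the 0.75 threshold and starting capacity of 8 mirror the pseudocode above):

```python
def total_rehash_work(n, capacity=8, load_factor=0.75):
    """Count entries rehashed across all resizes during n insertions."""
    size = work = 0
    for _ in range(n):
        size += 1
        if size / capacity >= load_factor:  # same threshold as put()
            work += size                    # a resize touches every current entry
            capacity *= 2
    return work
```

Because each resize doubles the capacity, the rehash counts form a geometric series, so the total stays proportional to n rather than n squared.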

Examples

Example 1: Step-by-step PUT operations

pseudocode
HashMap with capacity 4:

PUT("alice", 22): hash % 4 = 2 -> bucket 2: [alice:22]
PUT("bob", 25):   hash % 4 = 1 -> bucket 1: [bob:25]
PUT("carol", 28): hash % 4 = 2 -> bucket 2: [carol:28]->[alice:22]

size=3, capacity=4, load factor = 3/4 = 0.75 -> RESIZE triggered!

After resize (capacity 8):
    alice -> bucket 2
    bob   -> bucket 5
    carol -> bucket 6
    (the collision between alice and carol is gone!)

Example 2: GET with chain walking

pseudocode
Bucket 3: ["eve":30] -> ["frank":40] -> ["grace":50] -> null

GET("grace"):
    hash to bucket 3
    check "eve"   - not a match
    check "frank" - not a match
    check "grace" - match! return 50

GET("zara"):
    hash to bucket 3
    check "eve", "frank", "grace" - no match
    return null

Example 3: DELETE from middle of chain

pseudocode
Bucket 2: ["carol":28] -> ["alice":22] -> ["ted":19] -> null

DELETE("alice"):
    node = "carol" (prev = null)    - not a match
    node = "alice" (prev = "carol") - match!
    prev.next = node.next

Bucket 2: ["carol":28] -> ["ted":19] -> null

Common Mistakes

1. Always adding without checking for existing keys. If you call PUT("alice", 22) and then PUT("alice", 30) and you always just prepend, you end up with two entries for "alice". GET finds the first one and your size count is wrong. Always walk the chain first to check.

2. Losing track of the next pointer during resize. If your resize moves the existing nodes into the new buckets instead of creating fresh ones, relinking a node overwrites its next pointer. Save node.next before moving the node, or you will lose your place in the old chain. (The resize pseudocode above sidesteps this by calling put(), which allocates new nodes.)

3. Never resizing. A table that never grows will degrade as entries pile up. At load factor 10, the average chain has 10 nodes — every operation is 10x slower.

4. Forgetting the special case in DELETE where the node is first in the chain. If prev is null, you cannot do prev.next = node.next. You have to update buckets[index] directly.

Best Practices

Resize at 0.75 load factor. This is the standard threshold. It balances memory use and collision frequency well.

Add new nodes at the front of the chain, not the end. It is instant — no need to walk to the tail.

Track size separately. Keep an explicit size variable. Counting chain lengths every time would require scanning the whole table.

Test edge cases: empty map, single entry, deleting from a one-item chain, updating an existing key.