Hash MapsLesson 3

Hash Sets

A hash map without values — just keys. O(1) membership testing, deduplication, and set operations (union, intersection, difference).

What Is It?

A hash set is like a hash map, but simpler. Instead of storing key-value pairs, it only stores keys. The only question it answers is: "Is this thing in my collection?"

Three operations: ADD something, CHECK if something is there, REMOVE something. All fast.

A set automatically ignores duplicates. If you add the same value twice, it only gets stored once. That makes sets the perfect tool for deduplication (removing duplicates from a list) and membership testing (checking if something exists).

Internally a hash set is just a hash map where every value is simply true. All the hashing, chaining, and resizing work exactly the same way.

Analogy

The Nightclub Guest List

Imagine a bouncer at a nightclub with a clipboard. The clipboard has just names — no other information. When someone arrives, the bouncer checks the list. Name on it? They get in. Name not on it? They don't.

Adding a name is instant. Checking a name is instant. Crossing off a name is instant.

Now two nightclubs merge their lists for a joint event:

  • Union: Everyone on either list gets in. If a name appears on both lists, it's counted once.
  • Intersection: Only people on BOTH lists. The VIPs with access everywhere.
  • Difference: People on Club A's list but NOT Club B's.

That's a hash set.

How It Works

Implementation: A HashMap Without Values

A hash set just wraps a hash map. Instead of key-value pairs, it stores key-true pairs:

pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
structure HashSet:
map = new HashMap()
function add(element):
map.put(element, true)
function contains(element):
return map.get(element) is not null
function remove(element):
map.delete(element)
function size():
return map.size

Every operation passes through to the underlying hash map. All the hard work is already done.

ADD
O(n)worst
CONTAINS
O(n)worst
REMOVE
O(n)worst

Why Use a Set Instead of a List?

Suppose you want to track which users have visited your site today. No duplicates allowed:

pseudocode
1
2
3
4
5
6
7
8
Using a list:
visited = []
if user not in visited: // checks every item — slow!
visited.append(user)
Using a set:
visited = new HashSet()
visited.add(user) // instant, duplicates ignored automatically

With a list, checking for duplicates means scanning the whole list every time. With a set, it is one hash computation. For a million users, that difference is enormous.

Deduplication

The most common use of a set is removing duplicates from a list:

pseudocode
1
2
3
4
5
6
7
8
function removeDuplicates(array):
seen = new HashSet()
result = []
for each element in array:
if not seen.contains(element):
seen.add(element)
result.append(element)
return result

This runs through the list once — fast. Without a set, you'd need to compare every element against every previous element, which is much slower.

Set Operations

Sets have four classic operations for combining and comparing collections.

UNION: Everything from both sets

pseudocode
1
2
3
4
5
6
7
8
9
10
11
function union(setA, setB):
result = new HashSet()
for each element in setA:
result.add(element)
for each element in setB:
result.add(element) // duplicates automatically ignored
return result
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
UNION(A, B) = {1, 2, 3, 4, 5, 6}

INTERSECTION: Only what both sets share

pseudocode
1
2
3
4
5
6
7
8
9
10
function intersection(setA, setB):
result = new HashSet()
for each element in setA:
if setB.contains(element):
result.add(element)
return result
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
INTERSECTION(A, B) = {3, 4}

Tip: loop over the smaller set and check against the larger one — fewer iterations.

DIFFERENCE: What A has that B does not

pseudocode
1
2
3
4
5
6
7
8
9
10
11
function difference(setA, setB):
result = new HashSet()
for each element in setA:
if not setB.contains(element):
result.add(element)
return result
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
DIFFERENCE(A, B) = {1, 2}
DIFFERENCE(B, A) = {5, 6} // order matters — A minus B is not the same as B minus A

SYMMETRIC DIFFERENCE: Everything in either set, but not in both

pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
function symmetricDifference(setA, setB):
result = new HashSet()
for each element in setA:
if not setB.contains(element):
result.add(element)
for each element in setB:
if not setA.contains(element):
result.add(element)
return result
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
SYMMETRIC_DIFFERENCE(A, B) = {1, 2, 5, 6}

IS_SUBSET: Is every element of A also in B?

pseudocode
1
2
3
4
5
function isSubset(setA, setB):
for each element in setA:
if not setB.contains(element):
return false
return true

Where Sets Show Up in Real Problems

  • Remove duplicates from a list
  • Check if a word exists in a dictionary
  • Track visited nodes in graph traversal (BFS/DFS)
  • Find common friends between two users (intersection)
  • Two-sum problem: check if the complement of a number exists

Examples

Example 1: Deduplication in action

pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
input = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
Process each element:
add 3 seen: {3}
add 1 seen: {3, 1}
add 4 seen: {3, 1, 4}
skip 1 already in set
add 5 seen: {3, 1, 4, 5}
add 9 seen: {3, 1, 4, 5, 9}
add 2 seen: {3, 1, 4, 5, 9, 2}
add 6 seen: {3, 1, 4, 5, 9, 2, 6}
skip 5, 3, 5 already in set
result = [3, 1, 4, 5, 9, 2, 6] // duplicates removed, original order kept

Example 2: Two-sum using a set

Does any pair of numbers in the array add up to the target?

pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
function hasTwoSum(array, target):
seen = new HashSet()
for each num in array:
complement = target - num
if seen.contains(complement):
return true
seen.add(num)
return false
array = [2, 7, 11, 15], target = 9
num=2: complement=7, seen={} -> no match, add 2
num=7: complement=2, seen={2} -> match! return true

Example 3: Finding elements common to three arrays

pseudocode
1
2
3
4
5
6
7
8
9
10
11
function commonInAll(arr1, arr2, arr3):
set1 = new HashSet(arr1)
set2 = new HashSet(arr2)
set3 = new HashSet(arr3)
return intersection(intersection(set1, set2), set3)
arr1 = [1, 5, 10, 20, 40, 80]
arr2 = [6, 7, 20, 80, 100]
arr3 = [3, 4, 15, 20, 30, 80]
result = {20, 80}

Common Mistakes

1. Using a list when you should use a set. If you are checking membership with if element in list, every check scans the whole list. Switch to a set for instant lookups.

2. Expecting a set to preserve order. A basic hash set does NOT guarantee any particular iteration order. Elements may come out in a different order than they went in. If you need ordered iteration, you need a different structure.

3. Confusing difference and symmetric difference. A minus B gives what A has that B does not. Symmetric difference gives what either has that the other does not. They are different things.

4. Using a set to count things. Adding an element that already exists does nothing — no error, no count update. If you need to count how many times each element appears, use a hash map (element -> count), not a set.

Best Practices

Use a set whenever you need fast membership testing. That is its primary job.

For deduplication, a set is faster than sorting. One pass through the list vs. sorting the whole thing.

When computing intersection, loop over the smaller set. This minimizes the number of contains checks.

If you need both membership testing AND a value per key, just use a hash map. No need to maintain both a set and a map separately.