Hash Sets
A hash map without values — just keys. O(1) membership testing, deduplication, and set operations (union, intersection, difference).
What Is It?
A hash set is like a hash map, but simpler. Instead of storing key-value pairs, it only stores keys. The only question it answers is: "Is this thing in my collection?"
Three operations: ADD something, CHECK if something is there, REMOVE something. All fast.
A set automatically ignores duplicates. If you add the same value twice, it only gets stored once. That makes sets the perfect tool for deduplication (removing duplicates from a list) and membership testing (checking if something exists).
Internally a hash set is just a hash map where every value is simply true. All the hashing, chaining, and resizing work exactly the same way.
Analogy
The Nightclub Guest List
Imagine a bouncer at a nightclub with a clipboard. The clipboard has just names — no other information. When someone arrives, the bouncer checks the list. Name on it? They get in. Name not on it? They don't.
Adding a name is instant. Checking a name is instant. Crossing off a name is instant.
Now two nightclubs merge their lists for a joint event:
- Union: Everyone on either list gets in. If a name appears on both lists, it's counted once.
- Intersection: Only people on BOTH lists. The VIPs with access everywhere.
- Difference: People on Club A's list but NOT Club B's.
That's a hash set.
How It Works
Implementation: A HashMap Without Values
A hash set just wraps a hash map. Instead of key-value pairs, it stores key-true pairs:
structure HashSet:map = new HashMap()function add(element):map.put(element, true)function contains(element):return map.get(element) is not nullfunction remove(element):map.delete(element)function size():return map.size
Every operation passes through to the underlying hash map. All the hard work is already done.
Why Use a Set Instead of a List?
Suppose you want to track which users have visited your site today. No duplicates allowed:
Using a list:visited = []if user not in visited: // checks every item — slow!visited.append(user)Using a set:visited = new HashSet()visited.add(user) // instant, duplicates ignored automatically
With a list, checking for duplicates means scanning the whole list every time. With a set, it is one hash computation. For a million users, that difference is enormous.
Deduplication
The most common use of a set is removing duplicates from a list:
function removeDuplicates(array):seen = new HashSet()result = []for each element in array:if not seen.contains(element):seen.add(element)result.append(element)return result
This runs through the list once — fast. Without a set, you'd need to compare every element against every previous element, which is much slower.
Set Operations
Sets have four classic operations for combining and comparing collections.
UNION: Everything from both sets
function union(setA, setB):result = new HashSet()for each element in setA:result.add(element)for each element in setB:result.add(element) // duplicates automatically ignoredreturn resultA = {1, 2, 3, 4}B = {3, 4, 5, 6}UNION(A, B) = {1, 2, 3, 4, 5, 6}
INTERSECTION: Only what both sets share
function intersection(setA, setB):result = new HashSet()for each element in setA:if setB.contains(element):result.add(element)return resultA = {1, 2, 3, 4}B = {3, 4, 5, 6}INTERSECTION(A, B) = {3, 4}
Tip: loop over the smaller set and check against the larger one — fewer iterations.
DIFFERENCE: What A has that B does not
function difference(setA, setB):result = new HashSet()for each element in setA:if not setB.contains(element):result.add(element)return resultA = {1, 2, 3, 4}B = {3, 4, 5, 6}DIFFERENCE(A, B) = {1, 2}DIFFERENCE(B, A) = {5, 6} // order matters — A minus B is not the same as B minus A
SYMMETRIC DIFFERENCE: Everything in either set, but not in both
function symmetricDifference(setA, setB):result = new HashSet()for each element in setA:if not setB.contains(element):result.add(element)for each element in setB:if not setA.contains(element):result.add(element)return resultA = {1, 2, 3, 4}B = {3, 4, 5, 6}SYMMETRIC_DIFFERENCE(A, B) = {1, 2, 5, 6}
IS_SUBSET: Is every element of A also in B?
function isSubset(setA, setB):for each element in setA:if not setB.contains(element):return falsereturn true
Where Sets Show Up in Real Problems
- Remove duplicates from a list
- Check if a word exists in a dictionary
- Track visited nodes in graph traversal (BFS/DFS)
- Find common friends between two users (intersection)
- Two-sum problem: check if the complement of a number exists
Examples
Example 1: Deduplication in action
input = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]Process each element:add 3 — seen: {3}add 1 — seen: {3, 1}add 4 — seen: {3, 1, 4}skip 1 — already in setadd 5 — seen: {3, 1, 4, 5}add 9 — seen: {3, 1, 4, 5, 9}add 2 — seen: {3, 1, 4, 5, 9, 2}add 6 — seen: {3, 1, 4, 5, 9, 2, 6}skip 5, 3, 5 — already in setresult = [3, 1, 4, 5, 9, 2, 6] // duplicates removed, original order kept
Example 2: Two-sum using a set
Does any pair of numbers in the array add up to the target?
function hasTwoSum(array, target):seen = new HashSet()for each num in array:complement = target - numif seen.contains(complement):return trueseen.add(num)return falsearray = [2, 7, 11, 15], target = 9num=2: complement=7, seen={} -> no match, add 2num=7: complement=2, seen={2} -> match! return true
Example 3: Finding elements common to three arrays
function commonInAll(arr1, arr2, arr3):set1 = new HashSet(arr1)set2 = new HashSet(arr2)set3 = new HashSet(arr3)return intersection(intersection(set1, set2), set3)arr1 = [1, 5, 10, 20, 40, 80]arr2 = [6, 7, 20, 80, 100]arr3 = [3, 4, 15, 20, 30, 80]result = {20, 80}
Common Mistakes
1. Using a list when you should use a set. If you are checking membership with if element in list, every check scans the whole list. Switch to a set for instant lookups.
2. Expecting a set to preserve order. A basic hash set does NOT guarantee any particular iteration order. Elements may come out in a different order than they went in. If you need ordered iteration, you need a different structure.
3. Confusing difference and symmetric difference. A minus B gives what A has that B does not. Symmetric difference gives what either has that the other does not. They are different things.
4. Using a set to count things. Adding an element that already exists does nothing — no error, no count update. If you need to count how many times each element appears, use a hash map (element -> count), not a set.
Best Practices
Use a set whenever you need fast membership testing. That is its primary job.
For deduplication, a set is faster than sorting. One pass through the list vs. sorting the whole thing.
When computing intersection, loop over the smaller set. This minimizes the number of contains checks.
If you need both membership testing AND a value per key, just use a hash map. No need to maintain both a set and a map separately.