HashSet<T> is the .NET collection for storing a group of unique values with fast membership tests. It uses a hash table internally, so checking whether an item is already in the set, adding a new item, or removing one all take constant time on average. The trade-off is that the set doesn't preserve any order and won't accept duplicates: a second Add of the same value is silently rejected.

Why a Set Instead of a List

A List<T> is the right shape when you care about order, you might have duplicates, or you need positional access by index. None of that is free. Asking "is this email already in the list?" forces the runtime to scan every entry until it finds a match or runs off the end. For a cart of five items it doesn't matter. For ten thousand customer emails it does.

HashSet<T> flips the priority. It throws away ordering and indexing in exchange for O(1) average-case Contains, Add, and Remove. The first time you build a deduplication step on a long list and your O(n^2) loop turns into O(n), the cost picture becomes obvious.

Two distinct emails went in, the duplicate ada@shop.com was silently ignored, and Contains answered both questions in constant time. There is no index, no seenEmails[0], no "the first email I added." The set knows what's inside and nothing more.

The mental model is a bag with a bouncer at the door. The bouncer checks every value against everything already inside, and refuses anything that's already there. After everything is in, the bag will happily answer "is X in here?" but it can't tell you what order people arrived.

Cost: HashSet<T> trades a small per-item memory overhead (the hash table buckets and entry records) for O(1) average lookup. If you only need to iterate and you don't need fast Contains, a List<T> is cheaper. Use HashSet<T> when membership tests or deduplication are the point.

HashSet<T> lives in System.Collections.Generic, the same namespace as List<T> and Dictionary<TKey, TValue>. Under .NET 6 and later, that namespace is part of the implicit global usings for new console projects, so you usually don't need to write using System.Collections.Generic; yourself. The code samples in this lesson include the directive explicitly so the snippets work even outside a default template.

Constructing a HashSet

There are four constructors you'll reach for most of the time. The default constructor creates an empty set with the standard equality rules for T. Passing an integer pre-sizes the hash table so it doesn't have to grow as you add. Passing an IEqualityComparer<T> swaps in custom equality (covered later in this lesson). Passing any IEnumerable<T> builds the set from an existing sequence, deduplicating as it goes.

The fromList set started from four strings but contains three, because the second "SKU-1" was a duplicate the constructor dropped. Count always returns the number of unique values currently stored, not the number of values that were ever offered.

The capacity constructor is a hint, not a hard limit. The set can grow beyond it. Pre-sizing matters when you know roughly how many items you're going to add, because growing a hash table requires allocating a larger bucket array and re-hashing every existing entry into it. Doing that once during construction is cheaper than doing it a few times as the set fills up.

Cost: Each resize re-hashes every entry. If you're about to add 100,000 items and you don't pre-size, the table will resize several times as it grows. Passing capacity: 100000 to the constructor avoids that work.

There's also an IEnumerable<T> plus comparer overload, which is the workhorse when you want to deduplicate a sequence under custom equality. Building a case-insensitive set from a list of emails is a common shape:

Four input strings collapsed to two unique customers because the comparer treats "Ada@Shop.com" and "ada@shop.com" as the same value. The set keeps the first form it saw for each equivalence class, which is why the output shows Ada@Shop.com rather than ada@shop.com. The custom comparer changes the rules for the whole life of the set: every subsequent Add, Contains, and Remove uses it too.

Collection expressions (C# 12) also work, though there's a subtlety. The target type matters because the compiler picks the constructor based on the variable's declared type:

The collection expression ["new", "sale", "limited", "new"] produces a HashSet<string> because the declared type on the left says so. The duplicate "new" is dropped at construction time. If you'd written var tags = ["new", "sale", ...], the compiler wouldn't know which collection type you meant and would refuse to infer one.

Add Returns a Bool

Add is the standard way to put a value into the set. What surprises people is the return value: it's a bool. The set tells you whether the value was actually inserted. true means "this was new and I added it." false means "I already had this, nothing changed."

The third call returned false because "SKU-101" was already in the set. The set didn't throw, it didn't replace anything, it just shrugged and reported back. The Count stays at 2.

That return value is more useful than it looks. It lets you write deduplication logic without an explicit Contains check, which is both shorter and faster (one hash lookup instead of two):

The if (!seenOrderIds.Add(orderId)) line does both the check and the insertion in one shot. If the order is new, Add returns true, the negation is false, and processing continues. If the order is a repeat, Add returns false, the negation is true, and the early return fires. This is the canonical "process each thing once" pattern in C#, and it's a strict improvement over the two-step if (!set.Contains(x)) { set.Add(x); ... }.

Cost: A combined Add+Contains is one hash lookup, not two. Use Add's return value when you'd otherwise check before inserting. The runtime is the same, the code is shorter, and there's no window between the check and the insert.

The set's equality rules decide what counts as "already there." For value types and strings, the default rules are usually what you want. For your own classes, the default is reference equality, which means two different objects with the same content count as distinct. The "Custom Equality" section later in this lesson shows how to fix that.

Contains, Remove, Count, Clear

These four members cover the rest of the basic set API. Contains answers a yes/no membership question. Remove deletes one specific value and returns whether it was there. Count reports the current size. Clear empties the set.

Contains returns true only for values that are present according to the set's equality rules. Remove returns true if it actually removed something and false if the value wasn't there to begin with. Count is a property, not a method, and it's already maintained as items go in and come out, so reading it is O(1). Clear empties the table but keeps the internal bucket array, so the set is ready to accept new items without re-allocating right away.

The collection initializer in new HashSet<string> { "Coffee Maker", "Headphones", "Notebook" } is shorthand for an empty set plus three Add calls. It uses the same deduplication rules as everything else, so writing { "x", "x" } in an initializer ends up with one "x" in the set, not two.

Cost: Contains is O(1) on average. The worst case is O(n) when every value hashes to the same bucket, but that almost never happens with standard equality on strings, integers, or value types. If you write a custom GetHashCode that returns a constant, you'll see the worst case quickly.

Removing while iterating throws. HashSet<T>, like the other generic collections, doesn't allow mutation during a foreach. If you need to remove a batch of items, either collect them first and remove afterward, or use RemoveWhere which is built for that case:

RemoveWhere walks the set once, applies the predicate to each item, and removes the matches. It returns the number of items removed. This is the way to do bulk deletion. Writing a foreach that calls Remove inside the loop body fails at runtime with InvalidOperationException saying the collection was modified.

The substring sku["SKU-".Length..] is a range expression: it takes everything from index 4 to the end of the string. Combined with int.Parse, it pulls the numeric part out of "SKU-101" so the predicate can check whether it's even. Ranges came in with C# 8 and they're handy here because the prefix length is fixed.

Iteration Order is Unspecified

A foreach over a HashSet<T> visits every element exactly once, but the order it visits them in is an implementation detail. It is not insertion order. It is not sorted. It can change between .NET versions, and it can change between two sets that received the same inserts in a different sequence.

Possible output:

This particular run happens to print insertion order on .NET 8 because the hash table buckets fall out that way for these specific strings. That's a coincidence, not a guarantee. Inserting the same four products on a different machine, or on a different .NET version, or after a few additions and removals that perturb the bucket layout, can produce a different sequence. Production code must never depend on it.

If you need a specific order, use a different collection or sort the set's contents at the point you need them:

OrderBy returns an IEnumerable<string> sorted in ascending order, so the foreach walks the sorted projection rather than the underlying set. The set itself is unchanged. If you find yourself doing this on every read, SortedSet<T> is probably the right tool, because it keeps the elements ordered all the time at the cost of O(log n) operations instead of O(1).

Cost: OrderBy on a HashSet<T> allocates a temporary sorted projection on each call. If you sort the same set repeatedly, cache the result, or switch to a collection that keeps order.

The same rule applies to ToList(), ToArray(), and any other materialization. They preserve whatever order the source enumerator happened to produce, which is the bucket-walk order for a HashSet<T>. The result is deterministic for that particular set instance at that particular moment, but it isn't a meaningful order to rely on.

Set Algebra: Mutating Methods

The point of a set type isn't just to deduplicate. It's to do set algebra. HashSet<T> has four methods for the standard binary set operations, and they all mutate the set in place.

Method	Equivalent	What it does
`UnionWith(other)`	A ∪ B	Adds every element of `other` not already in `this`.
`IntersectWith(other)`	A ∩ B	Keeps only elements also in `other`.
`ExceptWith(other)`	A − B	Removes every element of `other` from `this`.
`SymmetricExceptWith(other)`	A ▵ B	Keeps elements in exactly one of the two, not both.

Each takes an IEnumerable<T>, so the argument doesn't have to be another HashSet<T>. Passing a list, array, or any sequence works fine.

Output (order within each line may vary):

Notice the pattern: each operation starts by copying cartA into a fresh set, then mutates the copy. That's because UnionWith and friends modify the receiver. If you ran cartA.UnionWith(cartB) directly, the original cartA would gain SKU-104 and SKU-105, which is rarely what you want when you're computing a derived view. Copy first, mutate second.

Here's the same relationship drawn as a Venn diagram, with the four results highlighted by which region each one selects:

The diagram emphasizes a useful frame: each set operation is just a choice of which Venn regions to keep. Union keeps all three. Intersect keeps the center. Except keeps the left lobe. Symmetric except keeps both lobes but drops the center. Memorizing the picture is faster than memorizing four method names, and the picture survives even when you switch languages.

In a real e-commerce flow, these operations come up constantly. "Which SKUs are in both the customer's wishlist and the current sale?" is wishlist.IntersectWith(saleSkus). "What does my cart look like if I merge in the recommended items?" is cart.UnionWith(recommended). "Which items in the cart are out of stock right now?" is cart.IntersectWith(outOfStockSkus). Once you see set algebra as the toolbox, you stop reaching for nested foreach loops for these questions.

Output (order may vary):

Two items from the wishlist intersect with the current sale. The wishlist has four items and the sale has four, and the answer is the overlap. The implementation is two lines (copy + intersect), and the cost is O(n) in the smaller set: for each item in the smaller set, the runtime does one O(1) lookup in the larger set.

Cost: All four mutating set operations take an IEnumerable<T>. When the argument is a HashSet<T> with the same comparer, the runtime optimizes the operation to walk the smaller side and do O(1) lookups in the larger. When the argument is an arbitrary sequence, it falls back to iterating the sequence and doing the lookups on each item. Both are O(n) in total, but the path matters when the sequence is something expensive to iterate (a database query, a streamed file).

The mutating methods can also be used to filter incrementally. If you've got a running set of "products the customer might want" and a new constraint comes in, calling IntersectWith with the constraint set narrows the working set in place without any intermediate allocations.

Set Algebra: Read-Only Methods

The other six set methods don't mutate anything. They answer yes/no questions about how two sets relate.

Method	True when
`IsSubsetOf(other)`	Every element of `this` is in `other`.
`IsSupersetOf(other)`	Every element of `other` is in `this`.
`IsProperSubsetOf(other)`	`this` is a subset of `other` AND they're not equal.
`IsProperSupersetOf(other)`	`this` is a superset of `other` AND they're not equal.
`Overlaps(other)`	At least one element is in both.
`SetEquals(other)`	Both sets contain exactly the same elements.

The "proper" variants add the strict requirement that the two sides aren't equal. A set is a subset of itself, but it isn't a proper subset of itself. This distinction matches the standard mathematical one and shows up in tests like "does the customer have strictly more items in their cart than the discount requires?"

The cart contains everything the discount requires, plus an extra item. IsSupersetOf is true because every element of the smaller set is in the cart. IsProperSupersetOf is also true because the cart isn't equal to the discount set (it has the extra cookie). SetEquals is false because the two sets aren't identical, and Overlaps is true because at least one element appears in both.

SetEquals deserves a closer look. It compares contents, not references. Two sets with the same elements in any internal order are equal. Two sets that look identical in their constructor calls are still distinct objects, but SetEquals will report them equal.

ReferenceEquals is false because cartA and cartB are two different objects on the heap. SetEquals is true because both objects hold the same three string values. The order they were declared in doesn't matter, and the underlying bucket order doesn't matter either. SetEquals is the right tool when "do these two sets describe the same group?" is the question.

Cost: All six read-only operations are at worst O(n + m) where n and m are the sizes of the two sets. Most of them short-circuit. Overlaps can return as soon as it finds one common element, and IsSubsetOf can return false as soon as it finds an element of this that's not in other. Passing the smaller set as this tends to be faster.

These methods are also useful as guards. Before applying a "buy three of category X, get one free" discount, you want to know whether the cart actually contains three items from category X. That's a superset test, not a count loop. Before merging two recommendation feeds, you want to know whether they have any overlap at all. That's Overlaps. The methods read like the question you're asking, which is half the reason set types feel pleasant to use once you internalize them.

Custom Equality with IEqualityComparer<T>

The set's default equality rules come from T's Equals and GetHashCode methods. For strings, that's case-sensitive ordinal equality. For value types and records, that's structural equality on all the members. For your own classes, that's reference equality by default, which means two different Customer objects with the same email count as distinct unless the class overrides equality.

When the default isn't what you want, pass an IEqualityComparer<T> to the constructor. The set will use it for every operation: Add, Contains, Remove, all the set algebra methods, everything. The contract for IEqualityComparer<T> itself is covered in detail in the _IComparer & IEqualityComparer_ lesson. The short version: it's an interface with two methods, Equals(x, y) for comparison and GetHashCode(x) for bucket placement, and both have to agree (if Equals says two values are equal, GetHashCode must produce the same hash for them).

Case-insensitive string comparison is the most common case, and the BCL ships a ready-made comparer:

The set was constructed with StringComparer.OrdinalIgnoreCase, so all three casings of Ada's email map to the same hash and compare equal. The second Add returns false because the comparer treats "ADA@shop.com" as already-present, and Contains returns true for any casing of the same address. The set keeps the first form it saw.

StringComparer has several built-in members worth knowing about:

Comparer	Behavior
`StringComparer.Ordinal`	Case-sensitive, byte-by-byte. The default.
`StringComparer.OrdinalIgnoreCase`	Case-insensitive, byte-by-byte. Fast.
`StringComparer.CurrentCulture`	Uses the current locale's collation rules.
`StringComparer.CurrentCultureIgnoreCase`	Locale-aware and case-insensitive.
`StringComparer.InvariantCulture`	Locale-independent culture-aware comparison.
`StringComparer.InvariantCultureIgnoreCase`	Same but case-insensitive.

For SKUs, email addresses, and any other "user-supplied identifier" string, OrdinalIgnoreCase is almost always the right call. The culture-aware comparers are for human-language text where things like Turkish dotted-vs-dotless I matter.

For your own classes, you write the comparer. Here's a comparer that treats two Customer objects as equal when their emails match case-insensitively:

The two Customer instances are different objects with different Name values, but the comparer only looks at Email. The second Add returns false because, according to the comparer, the customer is already in the set. The set keeps the first instance it saw, including the original name.

The two methods on the comparer have to agree. If Equals(a, b) is true, then GetHashCode(a) must equal GetHashCode(b). The contract is what lets the hash table find a candidate in O(1): the runtime first jumps to the bucket determined by GetHashCode, then walks that bucket calling Equals to find an actual match. If two equal objects produce different hashes, the runtime will look in the wrong bucket and report the value as absent. The _IComparer & IEqualityComparer_ lesson walks through this contract and the common ways to get it wrong.

Cost: GetHashCode runs on every Add, Contains, and Remove. If your comparer's GetHashCode is expensive (allocates, calls into a database, walks a deep object graph), the whole set gets slow. Keep it cheap. For strings, the built-in StringComparer types use highly tuned implementations.

For records, you usually don't need a comparer at all. C# generates structural Equals and GetHashCode based on the record's positional parameters, and the default rules already treat two records with the same field values as equal:

The record Sku(string Code) gives Sku value-like equality. Two Sku instances with the same Code are considered equal, and GetHashCode is implemented in lockstep. The set treats them as the same value without any explicit comparer. This is one of the reasons records are a popular choice for identifier-like types in modern C#.

Inside the Hash Table

The set's performance comes from its internal layout. Knowing roughly how it works helps with two things: predicting cost when you write something unusual, and understanding why operations occasionally slow down (resize) or behave oddly (custom comparers).

A HashSet<T> stores its entries in an array of "buckets," each of which is the head of a short chain of entries. When you call Add(value), the runtime does roughly this:

Compute hash = comparer.GetHashCode(value).
Find a bucket index by taking hash % bucketCount.
Walk the chain at that bucket. For each existing entry, call comparer.Equals(entry, value). If any returns true, the value is already present, return false.
If no match was found, append a new entry to the chain and return true.

Contains and Remove follow the same first three steps and stop when they find a match. Because well-distributed hashes spread entries evenly across buckets, the average chain length stays close to one, and each operation costs O(1).

The flow makes it clear where each part of the comparer contract earns its keep. GetHashCode decides which bucket to look in, so a bad hash distribution sends too many entries to the same bucket and slows everything down. Equals decides whether a chain entry is actually the value you're after, so a wrong Equals will either fail to find existing entries (false negatives, leading to duplicates) or wrongly identify unrelated entries as matches (false positives, leading to lost data).

When the set fills up, the bucket array resizes. The runtime allocates a larger array (typically the next prime above twice the current size), recomputes the bucket index for every existing entry, and copies them into the new array. This is the expensive step that capacity pre-sizing avoids. After the resize, operations resume at O(1) average.

The takeaway is that HashSet<T> is a hash table, with all the strengths and trade-offs that implies: blazing fast for the common case, dependent on a reasonable hash distribution, and order-free.

Common Patterns

A few patterns come up often enough to call out by name.

Deduplicating a sequence. Pass an IEnumerable<T> to the constructor and take the count, or iterate the resulting set. This is the cleanest one-liner for "how many unique values are in this list?"

The constructor walks the array once, adds each value to the hash table, and silently drops duplicates. Count reports the unique total. For a long log, this is dramatically cheaper than a nested loop or a sort-then-scan, and the case-insensitive comparer handles the typical "Ada@Shop.com vs ada@shop.com" problem at the same time.

Fast membership testing in a hot path. Build the set once, query it many times. The classic example is "is this SKU on the discount list?" running inside a checkout loop.

The membership test inside the loop is constant time. If discountedSkus were a List<string>, the same check would scan from the beginning every iteration, turning the whole loop from O(n) into O(n * m) where m is the size of the discount list. With small numbers nobody notices. With a thousand cart items and ten thousand discount SKUs, the difference is roughly 10,000,000 comparisons versus 1,000.

Finding common elements without nested loops. When the question is "what's in both?", the answer is IntersectWith, not a doubly-nested foreach.

Output (order may vary):

The intersection takes O(n) in the smaller set, with each lookup in the larger set being O(1). The nested-loop alternative is O(n * m). This is the single best argument for keeping membership-tested collections as sets in the first place.

Restricting a working set incrementally. When filters pile up ("must be in stock," "must be in the customer's preferred category," "must be on sale"), apply each as an IntersectWith against the running set. The mutations stack cleanly and there's no allocation overhead beyond the initial copy.

Three filters reduced six SKUs to one. The code reads in the same order the constraints would be described in plain language. Each step mutates working in place, so the only allocation is the initial copy. No temporary lists, no LINQ chains, no per-iteration garbage.

Performance Notes

HashSet<T> performance lives or dies by the quality of GetHashCode. A well-distributed hash spreads entries across buckets so each chain stays short and lookups stay O(1). A bad hash (everything returning the same value, or values that collide for predictable inputs) collapses every operation toward O(n).

For built-in types the runtime handles this well. int.GetHashCode is the integer itself. string.GetHashCode is randomized per process to defend against hash-flooding attacks (which means the same string can hash differently between two runs of the same program, which is a subtlety to remember if you're tempted to persist hash codes). Records autogenerate a hash based on their fields. Custom classes with a manual Equals override need a matching GetHashCode, which the compiler will warn about if you only do one.

Operation	Average	Worst case
`Add(item)`	O(1)	O(n)
`Contains(item)`	O(1)	O(n)
`Remove(item)`	O(1)	O(n)
`Count` (property)	O(1)	O(1)
`Clear()`	O(n)	O(n)
`foreach`	O(n)	O(n)
`UnionWith(other)`	O(m)	O(n + m)
`IntersectWith(other)`	O(min)	O(n + m)
`ExceptWith(other)`	O(m)	O(n + m)
`SetEquals(other)`	O(n)	O(n + m)

The worst case is what happens when every hash collides. In real code with the standard equality rules, you don't hit it. You can engineer it by writing a deliberately bad GetHashCode, which is a useful exercise for understanding the cost model. For correctness, only Equals matters. For speed, GetHashCode matters at least as much.

Cost: Re-hashing during resize is the one cost that surprises people. If you build a set of a million items without pre-sizing, the bucket array will resize many times along the way, and each resize is O(current size). Pre-size with the constructor's capacity argument when you have a rough idea of the final count.

Memory usage is also worth a brief mention. Each entry in a HashSet<T> carries a small fixed overhead (the hash code, the next-pointer for chaining, and the value itself), so a set is somewhat heavier per element than a packed array. For sets of value types this overhead is modest; for sets of reference types it's negligible compared to the heap objects the references point to.

For very small sets (a handful of items), a List<T> with a linear Contains can be faster in practice, because the constant factors on the hash table are higher than a tight loop over four entries. The crossover point is small, usually under a dozen elements, but it exists. If you're building a hot path with a known-small set, benchmark before assuming HashSet<T> is automatically the right answer.

Summary

HashSet<T> stores unique values in a hash table. Add, Contains, and Remove are O(1) on average.
Add returns true when it inserted a new value and false when the value was already present. Use the return value to combine "check and insert" into a single hash lookup.
Iteration order is unspecified. Don't rely on insertion order, sorted order, or any other order. Use SortedSet<T> or sort at the point of use when order matters.
The four mutating set operations (UnionWith, IntersectWith, ExceptWith, SymmetricExceptWith) modify the receiver in place. Copy the source first when you want a derived view without losing the original.
The six read-only set operations (IsSubsetOf, IsSupersetOf, IsProperSubsetOf, IsProperSupersetOf, Overlaps, SetEquals) answer relationship questions and most of them short-circuit.
Custom equality goes in via an IEqualityComparer<T> on the constructor. The comparer's Equals and GetHashCode must agree, or the hash table's invariants break. The full contract is the subject of the _IComparer & IEqualityComparer_ lesson.
Pre-size the set with the constructor's capacity argument when you know roughly how many items you're going to add. It avoids the cost of repeated resizes as the table fills.
RemoveWhere is the way to delete a batch of items. Mutating the set inside a foreach throws InvalidOperationException.

HashSet&lt;T&gt;