Last Updated: May 22, 2026
The generic collections we've seen so far (List<T>, Dictionary<TKey, TValue>, Queue<T>, Stack<T>) are designed for single-threaded code. Touching them from more than one thread at a time can corrupt their internal state or throw InvalidOperationException partway through an operation. The System.Collections.Concurrent namespace provides drop-in replacements that are safe to read and write from multiple threads, with built-in atomic operations for the read-modify-write patterns that locks would otherwise be required for. This lesson covers the four core concurrent types, the blocking wrapper, the performance trade-offs, and the common pitfalls.
The generic collections in System.Collections.Generic (List<T>, Dictionary<TKey, TValue>, Queue<T>, Stack<T>) are not thread-safe. The internal data structures (arrays, hash buckets, linked nodes) assume that only one thread is mutating them at a time. If two threads add to a List<T> at the same instant, the writes can interleave in ways that produce a corrupt array, a lost element, or an index out of range.
The following program demonstrates the problem with a Dictionary<TKey, TValue> tracking stock counts across an e-commerce warehouse:
The exact wording can vary across runs. Sometimes you get the message above. Sometimes the loop appears to finish but the final count is wrong. Sometimes the dictionary's internal state ends up corrupt enough that a later read throws. The behavior is undefined, which is what "not thread-safe" actually means: the runtime makes no promises about what happens.
The fix that pre-dates the concurrent collections is to wrap every access in a lock:
This works, but every thread that wants to touch the dictionary has to wait for whichever thread currently holds the lock. With 1000 small operations and ten cores fighting for the same gate, you've serialized a workload that the parallel loop was supposed to speed up.
The concurrent collections solve this differently. Instead of a single lock around the whole structure, they use finer-grained locking (or lock-free atomic operations) so that independent operations on different parts of the collection can run in parallel. The same program with ConcurrentDictionary<TKey, TValue> looks like this:
No external lock, no corruption, no lost updates. The dictionary handles the synchronization internally, and operations on different keys can proceed in parallel.
ConcurrentDictionary is faster than lock + Dictionary under contention because it shards its internal state across many small locks (one per partition). It is slower than a plain Dictionary when only one thread is touching the data, because every operation still has to do the atomic-update bookkeeping. Don't use it in single-threaded code.
ConcurrentDictionary<TKey, TValue> is the most-used type in this namespace, and it's the one that earns the most space in a lesson because its API differs the most from its single-threaded counterpart. The basic shape is the same: keys, values, an indexer, a Count, and so on. The difference is that every method that mutates state has been designed so that "read the current value, decide what to do, write a new value" is atomic, even when several threads run the same code at the same time.
The four core methods are TryAdd, TryUpdate, TryRemove, and the two combining methods AddOrUpdate and GetOrAdd. Each one packages a multi-step operation into a single call the runtime guarantees is atomic with respect to other concurrent operations on the same key.
TryAdd(key, value) inserts the value only if the key doesn't already exist. It returns true on success and false if the key was already present. Either way, no exception is thrown for a duplicate.
The second call sees the key already exists, returns false, and leaves the stored value untouched. Compare this with Add on a plain Dictionary<TKey, TValue>, which would throw ArgumentException on the duplicate. TryAdd makes "already there" a result, not a failure mode, which is what you want when several threads might try to register the same item.
TryUpdate(key, newValue, comparisonValue) updates a key's value only if the current value matches the comparisonValue you pass in. This is a compare-and-swap operation: the update happens only when the dictionary's stored value is exactly what you expected it to be when you decided to change it. If another thread has changed the value since you last read it, the update is rejected and you can retry with a fresh read.
The first call sees that the stored value is 10 (which matches comparisonValue) and writes 8. The second call expects 10 but finds 8, so it refuses to overwrite, leaving the value at 8. In a multi-threaded loop, this gives you a building block for "update if it hasn't changed since I last read it" without holding a lock.
TryRemove(key, out value) removes the key if it exists and outputs the removed value. It returns false (with value set to the default) when the key isn't there. Calling Remove on a plain Dictionary<TKey, TValue> returns a boolean too, but TryRemove is safe to call concurrently with other operations on the same dictionary.
The out parameter is useful when you want to do something with the removed value (log it, return it to the caller, recycle it). The discard pattern (out _) skips the value when you only care whether the key was present.
AddOrUpdate is where things get more interesting. It combines insert and update into a single atomic operation. You give it the key, a value (or value factory) to use if the key is absent, and an update function to apply if the key is present. It returns the value the dictionary ends up holding.
The first call sees no key and stores 1. The next two find the key and run the update function. Run this from a hundred parallel threads and the final count is exactly one hundred, with no lost increments. The dictionary serializes updates to a single key internally, so two threads can't both read 5, both add 1, and both write 6 (the classic lost-update bug).
This is the recommended way to maintain per-key counters under contention. A real e-commerce example: counting how many times each product gets added to a cart across all sessions.
Every one of the 10,000 increments lands somewhere, and the per-product totals add up to the original count.
GetOrAdd(key, factory) returns the current value for a key, or runs the factory to produce a new one and stores it if the key isn't there yet. It's the canonical "build this once, cache it, return the cached copy thereafter" pattern.
The factory runs once. The second call finds the key already in the dictionary and returns the cached value without running its factory at all. That's the happy single-threaded case. Under concurrent access, the story is more subtle, which the next section covers.
The factory passed to GetOrAdd and AddOrUpdate is not protected by an internal lock. If two threads call GetOrAdd with the same key at the same time and neither finds the key, both threads run their factory. The dictionary then uses an atomic compare-and-swap to install exactly one of the produced values, and the other is discarded. The factory's side effects are not undone.
Output (typical):
The exact count varies per run depending on thread scheduling, but it's almost never 1. Multiple threads race past the "is the key here yet?" check, each runs the factory, and only the first to finish actually populates the dictionary. The rest of the produced values are thrown away.
This is fine if the factory is cheap and side-effect-free (computing a string, parsing a small object). It is a real problem if the factory opens a database connection, loads a large file, or charges a payment processor. The work might be done several times even though only one result wins.
The standard fix is to store a Lazy<T> in the dictionary instead of T directly. Lazy<T> is thread-safe by default (or you can configure its threading mode), so even if several threads insert different Lazy<T> wrappers, only the wrapper that wins the race ever has its factory invoked.
Now the factory runs exactly once across all threads, because the Lazy<string> instance that "won" the dictionary insertion is the only one whose .Value ever runs the real factory. The other Lazy<string> instances that lost the race are garbage-collected with their factories never invoked.
GetOrAdd with a cheap factory is fine without Lazy<T>. Wrap with Lazy<T> only when the factory is expensive enough that running it 5-50 times instead of once would matter, or when the factory has side effects you don't want repeated.
ConcurrentQueue<T> is a thread-safe first-in, first-out queue. It's the concurrent counterpart of Queue<T>. The API is a little different because operations that "look at the front" have to deal with the possibility that another thread emptied the queue between your check and your read.
The three core methods are Enqueue, TryDequeue, and TryPeek.
Enqueue adds to the back. TryDequeue removes from the front and returns it through an out parameter, returning true on success and false (with the default value) when the queue is empty. TryPeek is the same shape but doesn't remove the element. Both Try* methods exist because in concurrent code "is there anything in the queue?" and "give me that thing" cannot be two separate operations: another thread might drain it in between. The single-call atomic version is what makes them safe.
The implementation is lock-free. Internally it uses linked segments and atomic compare-and-swap operations on the head and tail pointers, so threads enqueueing and dequeueing don't block each other in the way they would with an external lock. Use this type when you want a producer-consumer queue and the consumers don't need to wait for items (they can busy-poll or use a different signaling mechanism).
The following e-commerce order pipeline has three producer threads enqueueing orders and two consumer threads processing them:
Each producer enqueues ten orders. Each consumer pulls orders off the front and counts what it processed. The total comes out right because every enqueued item is dequeued exactly once. The done flag plus the IsEmpty check is how the consumers know they can stop: they continue draining the queue even after producers signal completion, then exit when there's nothing left.
This polling pattern works, but the consumers spin on the queue when it's empty. A cleaner version uses BlockingCollection<T>.
The diagram shows the producer-consumer shape: three producers push orders onto the back of the shared queue, two consumers pull them off the front. The queue itself handles all the thread-safety, so no producer or consumer needs an external lock.
ConcurrentQueue<T>.Count is O(n) when the queue is being modified by other threads, because the implementation has to walk segments to get an accurate count. Don't call Count in a tight loop. Use IsEmpty (which is O(1)) when you only need to know whether there's anything to process.
ConcurrentStack<T> is the thread-safe last-in, first-out counterpart of Stack<T>. The shape mirrors ConcurrentQueue<T>: Push, TryPop, and TryPeek, plus bulk versions PushRange and TryPopRange.
Push adds to the top. TryPop removes the top and returns it. TryPeek looks at the top without removing it. As with the concurrent queue, all three are safe to call from many threads at once. The implementation uses a single linked list with atomic compare-and-swap on the head pointer.
The bulk versions are useful when you have a batch of items to push or pop together. PushRange is atomic across the whole batch: the items appear on the stack as a contiguous block, not interleaved with pushes from other threads. TryPopRange removes up to N items in one call.
PushRange pushes 1, 2, 3, 4, 5 so that 5 ends up on top. TryPopRange fills the buffer with the top three items, in pop order. The remaining items (2 and 1) stay on the stack.
ConcurrentStack is less common than the queue. Most pipeline workflows are FIFO (process the oldest order first, ship the oldest review first), and FIFO is what the queue gives you. Use the stack when you specifically want LIFO behavior: an undo history shared across threads, a work-stealing scheduler where a thread prefers its most recent task, or a recently-viewed list where the latest visit shows on top.
ConcurrentStack<T> allocates a new node per Push, the same way LinkedList<T> does. If you're pushing millions of items in a hot loop, the allocation can dominate. ConcurrentQueue<T> uses segmented arrays internally and allocates less per element.
ConcurrentBag<T> is an unordered, thread-safe collection that allows duplicates. The API is small: Add to put something in, TryTake to pull something out, TryPeek to look at one without removing.
Output (one possible order):
There's no order guarantee. The bag can return items in insertion order, reverse order, or anything in between, and the order can change between runs.
What makes ConcurrentBag<T> special is its internal structure. Each thread that touches the bag gets its own local list. When a thread calls Add, the item goes into that thread's local list. When the same thread calls TryTake, it pulls from its own local list first, only "stealing" from other threads' lists when its own is empty.
This design pays off when each producer also tends to be the consumer of its own items. Examples: an object pool where threads check out objects and return them, or a parallel algorithm where each worker accumulates partial results and consumes them locally. In those cases, most operations stay on the thread's own list, so they're essentially uncontended.
The flip side is that ConcurrentBag<T> is poor for "many producers, separate many consumers" workloads. The consumers have to walk other threads' lists to find items, and that's more expensive than a ConcurrentQueue<T> would be.
The Parallel.For overload here keeps a per-thread local accumulator (the 0L initializer), updates it for each iteration the thread handles, and finally Adds the local total into the bag once per thread when that thread finishes. Each Add is into the same thread that produced the value, which is exactly the case ConcurrentBag is optimized for.
The sum 0 + 1 + 2 + ... + 999 is 499500, and the bag plus the final aggregation produces that result regardless of how the work was split across threads.
ConcurrentBag<T>.Count walks every thread-local list to compute the total. It's not free. If you only need "is it empty?" use IsEmpty.
The four types overlap enough that it helps to put them side by side. Pick based on whether you need ordering, whether the "consumer is usually the producer", and how often you'll be reading by key.
| Type | Ordering | Lookup by key? | Best when |
|---|---|---|---|
ConcurrentDictionary<TKey, TValue> | None | Yes, O(1) average | Per-key state, counters, caches with GetOrAdd |
ConcurrentQueue<T> | FIFO | No | Producer-consumer pipelines, ordered work to process |
ConcurrentStack<T> | LIFO | No | Undo histories, "newest first" consumers, work-stealing |
ConcurrentBag<T> | None | No | Same thread both adds and takes, parallel reductions |
A few rules of thumb fall out of the table. If you need to look something up by key, ConcurrentDictionary is the only one of the four that supports it. If order matters and you need FIFO, that's the queue. If you specifically want LIFO, that's the stack. The bag is the odd one out: pick it when same-thread add-take is your dominant pattern, otherwise use the queue.
Another way to think about the four is by what they replace. ConcurrentDictionary replaces lock + Dictionary. ConcurrentQueue replaces lock + Queue. ConcurrentStack replaces lock + Stack. ConcurrentBag is the one without a single-threaded counterpart, because the per-thread-list optimization only makes sense when there are multiple threads to optimize for.
All four concurrent types implement IEnumerable<T>. Enumerating them takes a snapshot at the moment iteration starts, so the foreach is safe even if other threads keep mutating the collection. The snapshot is consistent but not synchronized: an item added during iteration might or might not appear in the foreach. The cost is a bit of extra memory and CPU compared to enumerating a plain List<T>.
BlockingCollection<T> is not a separate concurrent collection. It's a wrapper that takes any IProducerConsumerCollection<T> (the queue, the stack, the bag, or a custom type) and adds two things: a bounded capacity, and blocking behavior on add and take.
The earlier producer-consumer example used a polling loop with TryDequeue and a manual "done" flag. BlockingCollection<T> replaces both pieces with built-in methods.
The producer calls Add, which blocks when the wrapped collection has hit boundedCapacity. With a capacity of 5, the producer can never run more than five items ahead of the consumers, which keeps memory usage bounded and provides natural back-pressure when consumers are slow. The consumers iterate GetConsumingEnumerable(), which is an IEnumerable<T> that pulls items as they arrive and blocks when the collection is empty but not yet completed. The producer signals "no more items" with CompleteAdding(). After that call, once the queue drains, GetConsumingEnumerable finishes its loop and the consumer tasks complete naturally.
The default backing collection is ConcurrentQueue<T>, giving FIFO ordering. You can pick a different backing structure by passing one to the constructor:
Use the bounded-capacity version for producer-consumer code. It provides back-pressure: if consumers fall behind, producers slow down naturally instead of letting the queue grow unboundedly. The same setup with a raw ConcurrentQueue would either need a separate semaphore or risk running out of memory.
Blocking is implemented with a SemaphoreSlim internally, so threads that hit the capacity limit park rather than spin. The wake-up after a take is fast (microseconds) but is more expensive than Try* calls on the underlying concurrent collection. For ultra-low-latency code where producers and consumers run on dedicated cores, polling with TryAdd/TryTake and a manual signal can be faster. For most application code the blocking version is the conventional default.
The concurrent collections aren't free. Their atomic operations involve memory barriers, retry loops, or partitioned locks, all of which are more expensive than the plain reads and writes a single-threaded collection performs. The reason to use them is that they're cheaper than the alternative (an external lock) when several threads contend for the same data.
Three rough rules:
Dictionary<TKey, TValue> are 2-5x faster than the same operations on ConcurrentDictionary<TKey, TValue> when there's no contention.ConcurrentDictionary partitions its internal state so that operations on different keys can proceed independently. A single lock on a Dictionary serializes everything, even unrelated keys.A trivial benchmark sketches the difference for incremental counters:
Output (varies by machine):
The exact numbers depend on the hardware and the .NET version, but the shape is consistent: under enough contention, the concurrent type wins. The single lock forces all 1,000,000 operations to take turns, while ConcurrentDictionary lets the operations partition across its internal buckets.
That said, "use the concurrent collection because it's faster" is wrong in the single-threaded case, and it's not always right under low contention either. Match the tool to the access pattern. The access pattern that makes concurrent collections worth it is "many threads, hot data, key-level independence".
ConcurrentDictionary<TKey, TValue> has a configurable concurrencyLevel constructor parameter that controls how finely the internal state is partitioned. The default is good for general use. Increase it only if you have a profile showing lock contention inside the dictionary itself, which is rare.
There are cases where concurrent collections are the wrong choice. Three big ones:
Single-threaded code. If only one thread ever touches the data, the concurrent types add overhead with no benefit. The atomic operations still happen, for an audience of one. Use Dictionary, List, Queue, and the rest of the generic collections.
Read-only after construction. If you build a collection once at startup and only read from it afterwards, threads can read freely without any synchronization at all. Even better, use the immutable collections from System.Collections.Immutable. They're optimized for the "build once, read many" case and have stronger guarantees than the concurrent ones (you can't accidentally mutate them).
Multi-step transactions across multiple collections. Concurrent collections give you atomicity within a single operation on a single collection. They do not give you atomicity across two collections, or across two operations on the same collection. If your business rule is "decrement stock by 1 AND add the item to the cart, both or neither," you still need a lock (or a transaction) around the pair. The concurrent versions of each collection don't compose into a multi-collection transaction.
The single-threaded run looks fine, but two threads racing on this code can each see stock = 1, each call TryUpdate("sku-001", 0, 1), and exactly one wins the compare-and-swap. The other thread's TryUpdate returns false. So far so good. The problem is that nothing protects the gap between the dictionary update and the queue add. Another thread reading the dictionary could see stock["sku-001"] = 0 before the cart enqueue has happened, and incorrectly conclude there's no item to add to a different cart. This is a multi-collection transaction, and concurrent collections don't provide that on their own. You need a lock around the pair.
The wrong type for the wrong access pattern. Picking ConcurrentBag<T> when you need FIFO order, or ConcurrentStack<T> when you need keyed lookup, is worse than using the appropriate type with a lock. The benchmarks in the previous section assume the type is suited to the workload. Bag-backed code that has to enumerate the whole bag to find a specific item performs terribly compared to a dictionary, even if both are concurrent.
ConcurrentBag<T>.TryTake is amortized cheap when each thread takes from its own list, but if a thread tries to take and its own list is empty, the bag scans other threads' lists. With many threads and uneven add/take patterns, that scan can dominate. If your access pattern looks like "many threads add, one thread drains", use ConcurrentQueue<T> instead.
Enumerating a concurrent collection takes a snapshot of its state at the moment iteration starts. Mutations that happen during the iteration may or may not appear in the foreach loop, depending on the type. The foreach won't throw, but you can't assume you're seeing a consistent point-in-time view across the whole loop.
Output (typical, varies):
The count seen during iteration is somewhere between the starting size (1000) and the ending size (500), depending on how the timing worked out. The foreach didn't throw, which is the key safety guarantee. The exact number it visited is not guaranteed.
This is usually fine for monitoring, logging, or computing approximate statistics. It's not fine if you need a transactional snapshot. For that, you'd need an external lock or one of the immutable collections.
The same caveat applies to Count. On ConcurrentDictionary, Count is O(n) and requires taking the internal locks momentarily to compute an accurate value. On ConcurrentQueue, Count is more expensive than IsEmpty because it has to walk segments. Use IsEmpty when "anything at all?" is the question, and reserve Count for cases where you need the number.