Last Updated: February 1, 2026
Think of java.util.concurrent (j.u.c) as a professional toolkit for concurrent programming. Before this package was introduced in Java 5, developers had to build their own thread pools, concurrent data structures, and synchronization utilities.
Doug Lea and his team designed j.u.c to provide battle-tested, highly-optimized implementations of common concurrency patterns.
The j.u.c package is organized around several key concepts:
Design Philosophy:
ConcurrentHashMap is the most important concurrent collection. It's not just a synchronized HashMap; it's a completely different design optimized for concurrent access.
Pre-Java 8: Segment-based Locking
The map was divided into segments (default 16), each with its own lock. Threads accessing different segments never contend.
Java 8+: Node-based Locking with CAS
Java 8 redesigned ConcurrentHashMap with finer-grained locking:
Key improvements:
Common Pitfall: Check-then-Act
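The classic mistake is reading a value and then writing based on it as two separate steps; each call is thread-safe on its own, but another thread can interleave between them. Below is a minimal sketch (the counter map and key are illustrative) contrasting the race with the atomic alternatives ConcurrentHashMap provides:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CheckThenActDemo {
    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    // BROKEN: the get and the put are each thread-safe, but another thread
    // can interleave between them, so increments get lost.
    public void incrementUnsafe(String key) {
        Integer current = counts.get(key);                   // check
        counts.put(key, current == null ? 1 : current + 1);  // act
    }

    // CORRECT: merge() performs the read-modify-write as one atomic step.
    public void incrementSafe(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    // CORRECT: putIfAbsent() avoids the "insert only if missing" race.
    public void initIfMissing(String key) {
        counts.putIfAbsent(key, 0);
    }
}
```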
CopyOnWriteArrayList is a thread-safe List that creates a new copy of the underlying array on every mutation. Reads never block.
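A minimal sketch of the typical read-mostly use case, a listener registry (the listener type and class name here are illustrative):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class EventBus {
    // Writes (register/unregister) copy the array; reads iterate a stable snapshot.
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    public void register(Runnable listener)   { listeners.add(listener); }
    public void unregister(Runnable listener) { listeners.remove(listener); }

    // Iteration never throws ConcurrentModificationException: even if another
    // thread registers a listener mid-loop, this loop sees the old snapshot.
    public void publish() {
        for (Runnable listener : listeners) {
            listener.run();
        }
    }
}
```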
When to use:
When NOT to use:
ConcurrentLinkedQueue is a lock-free, non-blocking FIFO queue based on the Michael-Scott algorithm.
The queue uses CAS for both enqueue and dequeue operations. Multiple threads can add and remove concurrently without blocking each other.
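A small sketch of typical usage; note the null check on poll(), since the queue never blocks (the busy loop here is exactly the limitation discussed next):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class NonBlockingQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();

        // offer() never blocks; multiple producers can call it concurrently.
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                queue.offer("task-" + i);
            }
        });

        // poll() returns null when empty, meaning "nothing right now",
        // not "nothing will ever arrive" -- so the consumer must spin.
        Thread consumer = new Thread(() -> {
            int consumed = 0;
            while (consumed < 5) {
                String task = queue.poll();
                if (task != null) {
                    consumed++;
                    System.out.println("Got " + task);
                }
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```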
ConcurrentLinkedQueue is fast and non-blocking, but it has a limitation: poll() returns null when the queue is empty. What if you want consumers to wait for new items instead of busy-polling? This is where blocking queues come in.
Blocking queues add the ability to wait: producers block when the queue is full, consumers block when it is empty. They are the foundation of producer-consumer patterns and provide natural flow control between threads.
| Operation | Throws Exception | Returns Special Value | Blocks | Times Out |
|---|---|---|---|---|
| Insert | add(e) | offer(e) → false | put(e) | offer(e, time, unit) |
| Remove | remove() | poll() → null | take() | poll(time, unit) |
| Examine | element() | peek() → null | N/A | N/A |
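As a sketch of the producer-consumer pattern these methods enable (queue size and item counts are arbitrary), put() and take() do the waiting for you:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    queue.put(i); // blocks when the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    Integer item = queue.take(); // blocks when the queue is empty
                    System.out.println("Consumed " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}
```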
ArrayBlockingQueue is a bounded queue backed by an array. Its capacity is fixed at creation.
Characteristics:
For fair ordering of blocked producers and consumers, use the two-argument constructor: `new ArrayBlockingQueue<>(100, true)` processes waiting threads in FIFO order.

LinkedBlockingQueue is an optionally bounded queue backed by linked nodes.
Characteristics:
PriorityBlockingQueue is an unbounded priority queue that orders elements by natural order or a supplied Comparator.
Characteristics:
SynchronousQueue is a queue with zero capacity. Each put must wait for a take, and vice versa. It's a direct handoff.
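A brief sketch of the direct-handoff behavior (the sleep is only there to make the blocking visible):

```java
import java.util.concurrent.SynchronousQueue;

public class HandoffDemo {
    public static void main(String[] args) {
        SynchronousQueue<String> handoff = new SynchronousQueue<>();

        new Thread(() -> {
            try {
                System.out.println("Producer offering...");
                handoff.put("payload");   // blocks until a consumer takes it
                System.out.println("Handoff complete");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        new Thread(() -> {
            try {
                Thread.sleep(1000);       // the producer waits during this pause
                String item = handoff.take();
                System.out.println("Consumer received " + item);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }
}
```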
Use cases:
We have covered concurrent collections and blocking queues, but there is still a missing piece: who actually runs the code that uses these collections?
Creating and managing threads manually is tedious and error-prone. The Executors framework provides standardized thread pools that handle thread lifecycle, resource management, and task scheduling.
Think of it this way: blocking queues handle the "what" (the tasks), while executors handle the "who" (the threads that process them).
ThreadPoolExecutor is the core implementation that all standard executors use:
Task Submission Flow: a submitted task is handled by a new thread if fewer than corePoolSize threads are running; otherwise it is offered to the workQueue; if the queue is full and fewer than maximumPoolSize threads exist, a new thread is created; if the pool is already at its maximum, the rejectedExecutionHandler decides what happens.
Core Parameters:
| Parameter | Purpose | Tuning Guidance |
|---|---|---|
| corePoolSize | Threads always kept alive | CPU cores for CPU-bound, more for I/O-bound |
| maximumPoolSize | Maximum threads allowed | Consider memory and context-switch overhead |
| keepAliveTime | Idle time before non-core threads die | 60 seconds is a common default |
| workQueue | Buffer for pending tasks | Bounded prevents memory exhaustion |
| threadFactory | Creates new threads | Name threads for debugging |
| rejectedExecutionHandler | What to do when pool is saturated | See handlers below |
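As an illustrative (not prescriptive) configuration, here is the full constructor with each parameter from the table above; the pool sizes, queue capacity, and thread names are assumptions to tune for your workload, and CallerRunsPolicy is one of the rejection handlers described next:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolFactory {
    public static ThreadPoolExecutor newWorkerPool() {
        ThreadFactory namedFactory = new ThreadFactory() {
            private final AtomicInteger counter = new AtomicInteger();
            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "worker-" + counter.incrementAndGet());
            }
        };

        return new ThreadPoolExecutor(
                4,                                  // corePoolSize
                8,                                  // maximumPoolSize
                60, TimeUnit.SECONDS,               // keepAliveTime for non-core threads
                new ArrayBlockingQueue<>(100),      // bounded workQueue
                namedFactory,                       // threadFactory (named for debugging)
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when saturated
    }
}
```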
Rejection Handlers: AbortPolicy (the default) throws RejectedExecutionException, CallerRunsPolicy runs the task on the submitting thread (providing natural back-pressure), DiscardPolicy silently drops the task, and DiscardOldestPolicy drops the oldest queued task and retries the submission.
Why avoid unbounded queues: with an unbounded workQueue the pool never grows beyond corePoolSize, and under sustained overload tasks accumulate without limit until memory is exhausted.
ThreadPoolExecutor works great for independent tasks, but what about tasks that spawn sub-tasks?
Consider sorting a large array: you could split it in half, sort each half in parallel, then merge. Each half could split further. Standard thread pools are not optimized for this pattern because idle threads sit around while others are overloaded with sub-tasks.
Fork/Join is designed specifically for divide-and-conquer algorithms. It introduces a clever optimization called work-stealing that keeps all threads busy even when task sizes vary.
The key innovation is work stealing. Each thread has its own deque (double-ended queue). Threads steal from others when idle.
Why LIFO for own work, FIFO for stealing? Popping your own newest tasks keeps recently touched data in cache, while thieves take the oldest tasks from the opposite end of the deque; those tend to be the largest unsplit chunks, so each steal is worth more and contention on the deque stays low.
RecursiveAction for void operations:
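A sketch of a RecursiveAction that increments every element of an array in parallel; the threshold value is an assumption that real code would tune:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class IncrementTask extends RecursiveAction {
    private static final int THRESHOLD = 10_000; // below this, do the work directly
    private final long[] array;
    private final int start, end;

    public IncrementTask(long[] array, int start, int end) {
        this.array = array;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= THRESHOLD) {
            for (int i = start; i < end; i++) {
                array[i]++;
            }
        } else {
            int mid = (start + end) >>> 1;
            // invokeAll forks the subtasks and joins them, keeping this
            // thread busy running one of the halves itself.
            invokeAll(new IncrementTask(array, start, mid),
                      new IncrementTask(array, mid, end));
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        new ForkJoinPool().invoke(new IncrementTask(data, 0, data.length));
    }
}
```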
Java 8 introduced a shared pool for all Fork/Join operations:
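A brief sketch of the common pool; by default it is sized to one less than the number of available processors, and parallel streams submit their work to it:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // The shared pool used by parallel streams and (by default) CompletableFuture.
        ForkJoinPool common = ForkJoinPool.commonPool();
        System.out.println("Common pool parallelism: " + common.getParallelism());

        // This parallel stream runs on the common pool's worker threads.
        long sum = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        System.out.println("Sum = " + sum);
    }
}
```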
Now that we have covered collections, queues, executors, and Fork/Join, there is one more critical piece of the concurrency puzzle: how do you coordinate threads at specific points in their execution? This is where synchronizers come in.
Synchronizers provide higher-level coordination primitives that go beyond simple locking. Instead of manually managing wait/notify or condition variables, you get purpose-built tools for common coordination patterns. Each synchronizer solves a specific problem, and picking the right one can make your concurrent code dramatically simpler.
Why this matters: Imagine you are starting a web application that depends on three services: database connection pool, cache warm-up, and configuration loading. You cannot serve requests until all three are ready. How do you wait for all of them without busy-waiting or complex flag-checking? This is exactly what CountDownLatch solves.
A CountDownLatch is a one-shot barrier that allows one or more threads to wait until a set of operations in other threads completes. You initialize it with a count, threads call countDown() to decrement, and waiting threads unblock when the count reaches zero.
The key insight is that CountDownLatch separates "who waits" from "who signals". Any number of threads can wait on the latch, and any threads (not necessarily the same ones) can count down.
The following diagram shows how CountDownLatch coordinates multiple workers with a main thread waiting for completion.
Here is a practical example showing how to wait for multiple services to initialize before accepting requests.
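A sketch of that startup sequence; the three service names and the sleep standing in for initialization work are placeholders:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ServerStartup {
    public static void main(String[] args) throws InterruptedException {
        List<String> services = List.of("connection-pool", "cache-warmup", "config-loader");
        CountDownLatch ready = new CountDownLatch(services.size());
        ExecutorService pool = Executors.newFixedThreadPool(services.size());

        for (String service : services) {
            pool.submit(() -> {
                try {
                    System.out.println("Initializing " + service);
                    Thread.sleep(500); // stand-in for real initialization work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    ready.countDown(); // always count down, even if initialization fails
                }
            });
        }

        ready.await(); // the main thread blocks here until the count reaches zero
        System.out.println("All services ready, accepting requests");
        pool.shutdown();
    }
}
```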
Notice that countDown() is in a finally block. This is important because if a service throws an exception, you still want to decrement the count. Otherwise, waiting threads would block forever.
Starting gun pattern: You can also use CountDownLatch as a starting gun to release multiple threads simultaneously.
This pattern is useful for benchmarking or testing concurrent behavior because all threads start their work at exactly the same moment.
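A sketch of the starting-gun variant; one latch releases the runners, a second reports when they are all done (the runner count is arbitrary):

```java
import java.util.concurrent.CountDownLatch;

public class StartingGunDemo {
    public static void main(String[] args) throws InterruptedException {
        int runners = 5;
        CountDownLatch startSignal = new CountDownLatch(1);
        CountDownLatch doneSignal = new CountDownLatch(runners);

        for (int i = 0; i < runners; i++) {
            int id = i;
            new Thread(() -> {
                try {
                    startSignal.await();          // every runner parks here
                    System.out.println("Runner " + id + " running");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    doneSignal.countDown();
                }
            }).start();
        }

        System.out.println("On your marks...");
        startSignal.countDown();                  // fire the gun: all runners released at once
        doneSignal.await();                       // wait for every runner to finish
        System.out.println("Race over");
    }
}
```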
Timeout support: You can also wait with a timeout to avoid blocking forever if something goes wrong.
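A minimal sketch, reusing the ready latch from the startup example above; the 30-second limit is an arbitrary choice:

```java
// await(timeout) returns false if the count has not reached zero in time.
if (!ready.await(30, java.util.concurrent.TimeUnit.SECONDS)) {
    throw new IllegalStateException("Services did not start within 30 seconds");
}
```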
Key characteristics:
While CountDownLatch is great for one-time coordination, what if you need threads to synchronize repeatedly across multiple phases? This is where CyclicBarrier shines.
Why this matters: Consider a parallel simulation where you divide a grid among multiple threads. Each thread computes values for its portion, but before moving to the next iteration, all threads must finish the current one because neighboring cells need updated values. You need a synchronization point that threads can hit repeatedly, and that is exactly what CyclicBarrier provides.
A CyclicBarrier waits for a fixed number of parties to arrive, then releases all of them and resets automatically for the next round. Unlike CountDownLatch where any thread can count down, with CyclicBarrier the waiting threads themselves are the parties.
The following diagram shows how three threads synchronize at a barrier before proceeding to the next phase.
Here is a practical example simulating a parallel grid computation where threads must synchronize between iterations.
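A sketch of the phased computation; the grid work is simulated with sleeps and the worker and iteration counts are arbitrary:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class GridSimulation {
    public static void main(String[] args) {
        int workers = 3;
        int iterations = 4;

        // The barrier action runs once per trip, after all workers arrive
        // and before any of them are released.
        CyclicBarrier barrier = new CyclicBarrier(workers,
                () -> System.out.println("--- all workers done, advancing to next iteration ---"));

        for (int w = 0; w < workers; w++) {
            int id = w;
            new Thread(() -> {
                try {
                    for (int iter = 0; iter < iterations; iter++) {
                        System.out.println("Worker " + id + " computing iteration " + iter);
                        Thread.sleep(100 + id * 50);  // simulated uneven work
                        barrier.await();              // wait for the other workers
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}
```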
Notice that the barrier action (the lambda passed to the constructor) runs after all threads arrive but before any are released. This is a convenient place to perform work that must happen between phases, like swapping buffers or aggregating results.
BrokenBarrierException: What happens if one thread times out or gets interrupted while others are waiting? The barrier becomes "broken", and all waiting threads receive a BrokenBarrierException. This prevents threads from waiting forever when something goes wrong.
Manual reset: You can also reset the barrier manually, which breaks it for any currently waiting threads.
CountDownLatch vs CyclicBarrier:
| Feature | CountDownLatch | CyclicBarrier |
|---|---|---|
| Reusable | No, one-shot | Yes, automatic reset |
| Waiting threads | Any number can wait | Exactly N parties |
| Who counts | Any thread can countDown() | Only waiting threads (await) |
| Barrier action | No | Yes, runs between phases |
| Reset | Cannot reset | reset() or automatic on trip |
| Broken state | N/A | BrokenBarrierException |
| Use case | Wait for events | Coordinate peer threads |
When to use which:
Both CountDownLatch and CyclicBarrier coordinate timing, but they do not limit concurrency. What if you need to restrict how many threads can access a resource simultaneously? This brings us to Semaphore.
Why this matters: Imagine you have an API that allows at most 10 concurrent requests to a downstream service. More than that and the service becomes overloaded. Or consider a database connection pool with 20 connections. You need a way to limit concurrent access to exactly N, blocking additional threads until a slot becomes available. Semaphore is designed precisely for this.
A Semaphore maintains a set of permits. Threads call acquire() to get a permit (blocking if none available) and release() to return one. Unlike CyclicBarrier where all threads must arrive, with Semaphore threads come and go independently. The limit is on concurrent holders, not on total threads.
The following diagram illustrates how a semaphore with 3 permits controls access to a limited resource.
Here is a practical example showing how to limit concurrent access to an external API.
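A sketch assuming a downstream service that tolerates at most 10 concurrent calls; doHttpCall() is a stand-in for the real remote request:

```java
import java.util.concurrent.Semaphore;

public class ApiRateLimiter {
    private final Semaphore permits = new Semaphore(10); // at most 10 concurrent calls

    public String callDownstream(String request) throws InterruptedException {
        permits.acquire();              // blocks while 10 calls are already in flight
        try {
            return doHttpCall(request); // the capacity-limited operation
        } finally {
            permits.release();          // always give the permit back
        }
    }

    // Stand-in for the real remote call.
    private String doHttpCall(String request) throws InterruptedException {
        Thread.sleep(200);
        return "response for " + request;
    }
}
```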
Acquiring multiple permits: Sometimes you need to acquire more than one permit for heavy operations.
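A short sketch; the permit count models the relative weight of the operation, and the sleep stands in for the heavyweight work:

```java
import java.util.concurrent.Semaphore;

public class WeightedPermits {
    private final Semaphore capacity = new Semaphore(10);

    public void runHeavyJob() throws InterruptedException {
        capacity.acquire(3);          // atomically take 3 of the 10 slots
        try {
            Thread.sleep(500);        // stand-in for a heavyweight operation
        } finally {
            capacity.release(3);      // release exactly what was acquired
        }
    }
}
```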
Fair mode: By default, semaphores do not guarantee FIFO ordering. A thread that requests a permit might get one before threads that were waiting longer. If fairness matters, use the fair mode.
Fair mode has lower throughput because it requires more bookkeeping, but it prevents starvation where some threads wait indefinitely.
Binary semaphore vs mutex: A semaphore with one permit acts like a mutex, but there is a subtle difference. With a mutex, only the thread that locked it can unlock it. With a semaphore, any thread can release a permit.
This flexibility can be useful (e.g., producer releases, consumer acquires) but can also lead to bugs if you accidentally release without acquiring.
Use cases:
CyclicBarrier works well when you know exactly how many threads will participate. But what if threads need to join or leave dynamically? What if you need to track which phase you are on? What if you want to terminate after a certain phase? This is where Phaser comes in.
Why this matters: Consider a multi-stage document processing pipeline where:
CyclicBarrier cannot handle this because the party count is fixed. Phaser provides the flexibility to handle all these scenarios.
Understanding Phaser concepts:
A Phaser is like a CyclicBarrier with superpowers:
The following diagram shows the lifecycle of a Phaser with dynamic party registration.
Here is a practical example showing a multi-stage task processor where workers can join and leave dynamically.
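A sketch of such a processor; the number of stages and workers, and the rule for which worker leaves early, are arbitrary choices for illustration:

```java
import java.util.concurrent.Phaser;

public class StagedProcessor {
    public static void main(String[] args) {
        // "1" registers the main thread as a party; onAdvance terminates the
        // phaser after phase 2, or when no parties remain.
        Phaser phaser = new Phaser(1) {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                System.out.println("=== phase " + phase + " complete ===");
                return phase >= 2 || registeredParties == 0;
            }
        };

        for (int i = 0; i < 3; i++) {
            int id = i;
            phaser.register();          // a worker joins dynamically
            new Thread(() -> {
                while (!phaser.isTerminated()) {
                    System.out.println("Worker " + id + " working in phase " + phaser.getPhase());
                    if (id == 2 && phaser.getPhase() == 1) {
                        phaser.arriveAndDeregister();    // this worker leaves after phase 1
                        return;
                    }
                    phaser.arriveAndAwaitAdvance();      // wait for the other parties
                }
            }).start();
        }

        // The main thread drives the phases alongside the workers.
        while (!phaser.isTerminated()) {
            phaser.arriveAndAwaitAdvance();
        }
        System.out.println("Pipeline terminated");
    }
}
```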
This example demonstrates several Phaser features: dynamic registration with register(), early departure with arriveAndDeregister(), phase tracking with getPhase(), and termination control via onAdvance().
Key Phaser methods explained:
| Method | Description |
|---|---|
| register() | Add a new party (thread joins) |
| arrive() | Arrive at barrier but do not wait (for hand-off patterns) |
| arriveAndAwaitAdvance() | Arrive and wait for others (like CyclicBarrier.await()) |
| arriveAndDeregister() | Arrive, do not wait, and leave the phaser |
| awaitAdvance(phase) | Wait for phase to advance (for observers) |
| getPhase() | Current phase number (negative if terminated) |
| isTerminated() | Check if phaser is terminated |
| onAdvance(phase, parties) | Override to control termination |
arrive() vs arriveAndAwaitAdvance(): The arrive() method signals arrival but does not wait. This is useful when a thread has finished its contribution but does not need to wait for others before proceeding to other work.
Tiered phasers for scalability: When you have many parties (hundreds or thousands), a single phaser becomes a bottleneck. Phaser supports tiering where child phasers synchronize with a parent.
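A brief sketch of tiering; the thread count is tiny only to keep the example short, and which child each thread uses is an arbitrary split:

```java
import java.util.concurrent.Phaser;

public class TieredPhasers {
    public static void main(String[] args) {
        Phaser root = new Phaser();                 // no parties of its own
        Phaser left = new Phaser(root);             // child phasers share the root
        Phaser right = new Phaser(root);

        // Half of the threads register with each child, splitting contention;
        // a phase only advances when both subtrees have fully arrived.
        for (int i = 0; i < 4; i++) {
            Phaser local = (i % 2 == 0) ? left : right;
            local.register();
            int id = i;
            new Thread(() -> {
                System.out.println("Thread " + id + " arrived");
                local.arriveAndAwaitAdvance();      // waits for all 4, across both children
                System.out.println("Thread " + id + " advanced");
            }).start();
        }
    }
}
```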
Phaser vs CyclicBarrier comparison:
| Feature | CyclicBarrier | Phaser |
|---|---|---|
| Party count | Fixed at creation | Dynamic (register/deregister) |
| Phase tracking | No (manual counting) | Yes (getPhase()) |
| Early termination | No built-in support | Override onAdvance() |
| Non-waiting arrival | No | Yes (arrive()) |
| Tiering | No | Yes (parent-child phasers) |
| Complexity | Simpler | More complex |
| Use case | Fixed peer threads | Dynamic workflows |
When to use Phaser:
The synchronizers we have covered so far coordinate multiple threads. But sometimes you have exactly two threads that need to swap data with each other. This is where Exchanger provides an elegant solution.
Why this matters: Consider a producer-consumer scenario with a twist: instead of a queue between them, you want them to swap buffers directly. The producer fills a buffer, the consumer processes a buffer, and then they exchange. This avoids the overhead of queue operations and provides natural flow control because both must rendezvous to swap.
An Exchanger is a rendezvous point where exactly two threads meet and exchange objects. When one thread calls exchange(), it blocks until another thread calls exchange(), then they swap their objects and both return.
The following diagram illustrates how two threads exchange buffers at a rendezvous point.
Here is a practical example showing a double-buffering pattern for efficient data processing.
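A sketch of the double-buffering idea; the buffer size and item counts are arbitrary:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

public class DoubleBufferingDemo {
    public static void main(String[] args) {
        Exchanger<List<Integer>> exchanger = new Exchanger<>();

        // Producer: fills its buffer, then swaps it for the consumer's buffer.
        new Thread(() -> {
            List<Integer> buffer = new ArrayList<>();
            try {
                for (int i = 0; i < 30; i++) {
                    buffer.add(i);
                    if (buffer.size() == 10) {
                        buffer = exchanger.exchange(buffer); // hand over the full buffer
                        buffer.clear();                      // reuse the buffer we got back
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        // Consumer: hands over its buffer and receives a full one to process.
        new Thread(() -> {
            List<Integer> buffer = new ArrayList<>();
            try {
                for (int round = 0; round < 3; round++) {
                    buffer = exchanger.exchange(buffer);     // blocks until the producer arrives
                    System.out.println("Processing " + buffer.size() + " items");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }
}
```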
Notice how both threads call exchange() with their current buffer. The first to call blocks until the second arrives. Then they swap and both continue. This creates a natural synchronization rhythm.
Timeout support: You can specify a timeout to avoid waiting forever if the other thread fails to arrive.
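A minimal sketch, reusing the exchanger and buffer variables from the example above; the 5-second limit is arbitrary:

```java
try {
    // Wait at most 5 seconds for the partner thread to arrive.
    buffer = exchanger.exchange(buffer, 5, java.util.concurrent.TimeUnit.SECONDS);
} catch (java.util.concurrent.TimeoutException e) {
    // The partner never showed up: log, retry, or shut the pipeline down.
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
```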
Use cases:
Limitations:
When to use Exchanger vs BlockingQueue:
Now that we have covered individual concurrent utilities, let us look at how they work together. Real-world concurrent systems often combine multiple utilities to achieve their goals. Understanding these patterns helps you design better concurrent applications.
This pattern combines rate limiting with task execution. The semaphore limits concurrent external calls while the executor manages thread resources.
This combination lets you have many threads (for CPU work) while limiting how many call an external API simultaneously.
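A sketch of the combination under assumed limits (50 worker threads, 10 concurrent downstream calls); fetchRemote() and process() are placeholders:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class ThrottledFetcher {
    private final ExecutorService pool = Executors.newFixedThreadPool(50); // local work
    private final Semaphore apiPermits = new Semaphore(10);                // downstream limit

    public void submit(String request) {
        pool.submit(() -> {
            try {
                apiPermits.acquire();              // only 10 threads past this point at once
                String response;
                try {
                    response = fetchRemote(request);
                } finally {
                    apiPermits.release();          // free the slot before local processing
                }
                process(response);                 // local post-processing is not limited
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    private String fetchRemote(String request) throws InterruptedException {
        Thread.sleep(100);                         // stand-in for the remote call
        return "response for " + request;
    }

    private void process(String response) {
        System.out.println(response);
    }
}
```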
This pattern ensures expensive computations happen only once, even under concurrent requests.
The first thread to request a key creates a FutureTask and runs it. Concurrent requests for the same key get the same Future and wait for the result.
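A sketch of that memoizer; the Function used to compute values is supplied by the caller and the class name is illustrative:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

public class Memoizer<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;

    public Memoizer(Function<K, V> compute) {
        this.compute = compute;
    }

    public V get(K key) throws InterruptedException, ExecutionException {
        Future<V> future = cache.get(key);
        if (future == null) {
            Callable<V> task = () -> compute.apply(key);
            FutureTask<V> candidate = new FutureTask<>(task);
            // Only one thread wins the putIfAbsent race and runs the task;
            // everyone else gets the winner's Future and just waits on it.
            future = cache.putIfAbsent(key, candidate);
            if (future == null) {
                future = candidate;
                candidate.run();        // compute in the calling thread, exactly once
            }
        }
        return future.get();            // concurrent callers block here for the result
    }
}
```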
This pattern coordinates multiple workers that process data in phases, aggregating results between phases.
Workers compute independently, store results in the concurrent map, then synchronize at the barrier where results are aggregated.
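A sketch under the assumption of three workers and two phases; the partial results are simulated values:

```java
import java.util.Map;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;

public class PhasedAggregation {
    public static void main(String[] args) {
        int workers = 3;
        Map<String, Long> results = new ConcurrentHashMap<>();

        // The barrier action aggregates what the workers stored for this phase.
        CyclicBarrier barrier = new CyclicBarrier(workers, () -> {
            long total = results.values().stream().mapToLong(Long::longValue).sum();
            System.out.println("Phase total: " + total);
            results.clear();                              // reset for the next phase
        });

        for (int w = 0; w < workers; w++) {
            int id = w;
            new Thread(() -> {
                try {
                    for (int phase = 0; phase < 2; phase++) {
                        long partial = (id + 1) * 100L * (phase + 1); // simulated computation
                        results.put("worker-" + id, partial);         // publish the partial result
                        barrier.await();                              // aggregate, then continue
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}
```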
Warning: Queues + Barriers can cause deadlock. When threads waiting at a barrier also wait on a blocking queue, you can create a deadlock where no thread can proceed.
The problem: If Thread 1 processes a task and waits at the barrier, but Thread 2 and Thread 3 are blocked on queue.take() because there are no tasks, the barrier never trips. Thread 1 waits forever.
Poll with timeout instead of blocking take:
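A sketch of this fix; the 100-millisecond timeout is a tuning choice, and the idea is that a worker reaches the barrier even when no task arrives:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;

public class SafeWorker implements Runnable {
    private final BlockingQueue<Runnable> tasks;
    private final CyclicBarrier barrier;

    public SafeWorker(BlockingQueue<Runnable> tasks, CyclicBarrier barrier) {
        this.tasks = tasks;
        this.barrier = barrier;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // poll() with a timeout instead of take(): even with an empty queue
                // the worker falls through to the barrier, so the barrier can trip.
                Runnable task = tasks.poll(100, TimeUnit.MILLISECONDS);
                if (task != null) {
                    task.run();
                }
                barrier.await();
            }
        } catch (InterruptedException | BrokenBarrierException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```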
Separate work fetching from barrier synchronization:
Use Phaser with dynamic registration instead:
Other dangerous patterns:
| Scenario | Recommended Combination |
|---|---|
| Rate-limited API calls from thread pool | ThreadPoolExecutor + Semaphore |
| Cache with guaranteed single computation | ConcurrentHashMap + FutureTask |
| Phased batch processing | CyclicBarrier + ConcurrentHashMap |
| Service initialization | CountDownLatch + ExecutorService |
| Producer-consumer with flow control | BlockingQueue + ThreadPoolExecutor |
| Dynamic parallel pipeline | Phaser + ConcurrentLinkedQueue |
| Double buffering | Exchanger (standalone) |