Last Updated: February 5, 2026
Imagine you're managing a parking lot with 50 spaces. Cars can enter freely as long as spaces are available.
As the lot fills up, the sign at the entrance updates: “45 spaces available”, “12 spaces available”, and eventually “LOT FULL.” Once it is full, new cars cannot enter. They wait at the gate until a car leaves and a space opens up.
This is exactly how a semaphore works.
A semaphore maintains a counter of available permits. Each thread must acquire a permit before it can proceed. When the thread finishes, it releases the permit back to the semaphore. If the counter reaches zero, any new thread that asks for a permit must wait until another thread releases one.
Loading simulation...
This is the key difference from a mutex:
This makes semaphores the right tool when you need to limit concurrent access to a pool of resources, like database connections, API rate limits, or bounded buffers.
In this chapter, we'll explore how semaphores work internally, understand the difference between binary and counting semaphores, and see practical implementations.
A semaphore is a synchronization primitive built around a simple idea: maintain an integer count representing available "permits," and provide atomic operations to acquire and release those permits.
The best way to understand this is through a concrete analogy.
Picture a nightclub with a strict capacity limit of 100 people. A bouncer stands at the door with a clicker counter. Every time someone enters, the bouncer clicks "in" (decrementing the count). Every time someone leaves, the bouncer clicks "out" (incrementing the count).
When the count hits zero, new guests must wait in line outside until someone leaves and frees up a spot.
The diagram below shows this in action. Three threads have successfully acquired permits from a semaphore initialized with 3 permits. Thread 4 arrives and wants to acquire, but no permits remain, so it must wait until one of the other threads releases.
When Thread 1, 2, or 3 finishes its work and calls release(), the permit returns to the pool. At that point, Thread 4 wakes up, acquires the newly available permit, and proceeds.
Formally, a semaphore supports two atomic operations:
The letters P and V come from Dutch: "proberen" (to try/test) and "verhogen" (to increment). These names come from Edsger Dijkstra's original 1965 paper that introduced semaphores as a synchronization primitive. You'll see all three naming conventions (acquire/release, wait/signal, P/V) used in different languages and textbooks, but they all mean the same thing.
The key insight is that both operations are atomic. Between checking the counter and modifying it, no other thread can intervene. This atomicity is what makes semaphores safe for coordination.
Semaphores come in two flavors, distinguished by how many permits they manage.
A binary semaphore has only two states: available (1) or taken (0). When a thread acquires the single permit, the semaphore transitions to the "taken" state. When that thread (or any thread) releases, it transitions back to "available."
The state diagram below illustrates these two states and the transitions between them:
Starting in the "Available" state, a thread calling acquire() moves the semaphore to "Taken." While taken, any other thread calling acquire() will block. When any thread calls release(), the semaphore returns to "Available" and a waiting thread (if any) can proceed.
Binary semaphores can be used for mutual exclusion (only one thread at a time), but as we'll see later, they differ from mutexes in important ways.
A counting semaphore allows up to N threads to hold permits simultaneously. As threads acquire permits, the count decreases. When the count reaches zero, additional threads must wait.
The state diagram below shows a counting semaphore initialized with 3 permits:
Following the diagram from left to right: the semaphore starts with 3 permits (green). Each acquire() decrements the count. After three acquires, we reach zero (red). A fourth acquire() enters the "Waiting" state, blocked until a release() occurs. When a thread releases, the count increments. If threads are waiting, one is woken to acquire the newly available permit.
The choice between binary and counting semaphores depends on what you're trying to accomplish:
| Type | Permits | Best For |
|---|---|---|
| Binary | 1 | Signaling between threads, simple synchronization gates |
| Counting | N | Connection pools, rate limiting, bounded buffers, limiting concurrent operations |
Binary semaphores shine when you need one thread to signal another that something has happened, like "data is ready" or "initialization complete." Counting semaphores shine when you have a pool of interchangeable resources and want to limit how many threads can use them simultaneously.
Semaphores are worth learning for three practical reasons. They solve real engineering problems, they unlock common coordination patterns, and they show up frequently in technical interviews.
At a high level, a semaphore is a tool for controlling how many things can happen at the same time. That makes it useful in two broad categories: resource limiting and producer-consumer coordination.
The most common use of counting semaphores is limiting access to a pool of limited resources. Consider these scenarios:
Database connections: Your application can hold at most 20 database connections. If 50 threads each try to open a connection simultaneously, you'll exhaust the pool and crash. A semaphore with 20 permits ensures only 20 threads can hold connections at once. The other 30 wait their turn.
Thread pool sizing: When processing tasks in parallel, you often want to limit concurrency to match your CPU cores. A semaphore with 8 permits ensures at most 8 tasks run simultaneously, preventing CPU thrashing from excessive context switching.
API rate limiting: External APIs often limit you to N requests per second. A semaphore helps enforce this limit locally before requests even leave your system, preventing expensive API errors.
File handles: Operating systems limit how many files a process can open simultaneously. A semaphore protects against hitting this limit when processing many files concurrently.
Without semaphores in these scenarios, you risk resource exhaustion, cascading failures, and degraded performance under load.
Semaphores excel at coordinating producers and consumers with a bounded buffer between them. This is one of the classic problems in concurrent programming:
Two semaphores solve this elegantly: one tracks empty slots (so producers know when they can add), and one tracks filled slots (so consumers know when they can take). We'll implement this pattern in detail later in the chapter.
Now that you understand what semaphores are for, let's look at how they actually work under the hood.
When a thread calls acquire() or release(), the semaphore performs a specific sequence of steps. Understanding this sequence helps you reason about semaphore behavior and avoid mistakes.
Notice that release() never blocks. It always completes immediately. Only acquire() can block, and only when no permits are available.
Beyond the basic acquire and release, most semaphore implementations provide additional operations for flexibility:
| Operation | What It Does | Blocks? |
|---|---|---|
acquire() / wait() / P() | Acquire one permit, blocking if none available | Yes |
release() / signal() / V() | Release one permit, waking a waiter if any | No |
tryAcquire() | Try to acquire immediately; return true/false | No |
tryAcquire(timeout) | Try to acquire, waiting up to timeout | Up to timeout |
availablePermits() | Get current count (may be stale immediately) | No |
The tryAcquire() variants are useful when you want to fall back to alternative behavior if permits aren't available, rather than blocking indefinitely.
Let's trace through a concrete scenario to see how acquire and release interact. We have a semaphore with 2 permits and three threads that want to use it. Follow along with the sequence diagram below:
Walking through this step by step:
acquire(), the count drops to 1, and Thread 1 proceeds immediately.acquire(), the count drops to 0, and Thread 2 proceeds immediately.acquire(), but the count is 0. Thread 3 cannot proceed; it gets added to the wait queue and goes to sleep.release(). The count would increment to 1, but there's a waiter (Thread 3). The semaphore wakes Thread 3, which immediately acquires the permit, bringing the count back to 0. Thread 3 now proceeds with its work.release(). No waiters exist, so the count simply increments to 1.release(). The count increments to 2, returning to the initial state.This sequence demonstrates the key behaviors: immediate acquisition when permits are available, blocking when they're not, and automatic waking of waiters when permits are released.
Let's see how to use semaphores in real code. We'll cover basic usage patterns, then build a complete connection pool example.
Java provides java.util.concurrent.Semaphore with a clean, well-designed API:
The key pattern here is acquire() followed by a try-finally block that guarantees release(). This pattern prevents permit leaks even when exceptions occur.
A database connection pool is the classic semaphore use case. You have a fixed number of expensive connections, and many threads want to use them. The semaphore ensures no more threads use connections than exist in the pool.
The design works like this: we pre-create N connections and store them in a queue. A semaphore with N permits guards access. To get a connection, a thread acquires a permit (blocking if all connections are in use), then takes a connection from the queue. To return a connection, it puts the connection back in the queue and releases the permit.
Notice how connections 0, 1, and 2 are reused. The semaphore ensures only 3 tasks run concurrently, while the other 7 wait their turn.
This comparison comes up frequently in interviews. While a binary semaphore (permits=1) and a mutex can both limit access to one thread at a time, they have fundamental differences that matter in practice.
A simple mental model:
| Aspect | Mutex | Semaphore |
|---|---|---|
| Ownership | Yes - only the thread that locked can unlock | No - any thread can signal |
| Count | Always 1 (binary) | Can be any N (counting) |
| Primary purpose | Protecting critical sections | Resource counting and signaling |
| Priority inheritance | Usually supported | Usually not supported |
| Release by different thread | Error or undefined behavior | Perfectly valid |
The ownership difference has significant practical implications.
With a mutex, the thread that locks it must be the one to unlock it. This enables:
With a semaphore, any thread can call release(). This enables:
Here's code illustrating both patterns:
In the signaling pattern, notice that Thread A releases without ever acquiring, and Thread B acquires without ever releasing. This would be an error with a mutex, but it's the correct pattern for semaphore-based signaling.
Semaphores are heavily optimized, but they still have a cost, especially under contention. The key idea is similar to mutexes:
| Operation | Typical Latency | What Happens |
|---|---|---|
| Uncontended acquire/release | 20-100 nanoseconds | Fast atomic operations on the counter |
| Contended acquire (no wait) | 100-500 nanoseconds | Atomic operation fails, retry loop |
| Contended acquire (must wait) | 1-10 microseconds | Thread sleeps, later wakes up (context switch) |
| Timeout operations | Additional overhead | Timer management adds latency |
Semaphores are heavier than raw atomics because they may maintain a wait queue and manage wakeups. But they are often simpler and cheaper than building the same behavior with ad-hoc condition variables.
| Scenario | Best Choice | Why |
|---|---|---|
| Mutual exclusion | Mutex/Lock | Ownership, priority inheritance, lower overhead |
| Limit to exactly N | Counting Semaphore | Built for this exact purpose |
| Atomic counter (no blocking) | AtomicInteger | Much lower overhead when blocking isn't needed |
| One-time "wait for ready" | CountDownLatch | Simpler semantics for one-shot events |
| Cyclical barrier | CyclicBarrier | Reusable barrier with simpler API |
| Precise rate limiting | Token bucket | Better burst handling, smoother rate control |
A useful rule: if your intent is “only one thread can enter,” reach for a mutex. If your intent is “only N threads can enter” or “wake someone when ready,” reach for a semaphore.
Too few permits causes unnecessary blocking and underutilization. Too many defeats the purpose of limiting. For connection pools, match the count to the actual pool size. For CPU-bound work, match the count to CPU cores.
Fair semaphores guarantee FIFO ordering but have lower throughput. Non-fair semaphores allow "barging" (newly arriving threads can acquire before waiting threads) which improves throughput but can cause starvation.
Use fair semaphores when wait times are critical (user-facing operations) or starvation is a concern (high contention with many threads). Use non-fair semaphores (the default) when throughput matters more than individual latency.
When you have an alternative to blocking, use non-blocking acquisition:
This is useful for graceful degradation: try the fast path, fall back to a slower alternative if resources are unavailable.
What is a semaphore?