Last Updated: February 5, 2026
A livelock happens when threads are not blocked, but the system still makes no progress. Instead of waiting, the threads keep reacting to each other in a way that prevents any of them from completing work. CPU usage can be high, logs can be noisy, and yet nothing useful finishes.
Two people meet in a narrow hallway. Both step aside to let the other pass. They step in the same direction. They step aside again, same direction. And again.
Both are actively trying to be polite, both are moving, but neither makes any progress down the hallway.
In many ways, livelock is more insidious than deadlock. With deadlock, you notice immediately because everything stops. With livelock, the system looks busy. Threads are executing, network requests are flying, logs are filling up with retry messages. It might take hours before someone realizes that despite all this activity, no actual work is completing.
Livelock occurs when threads are not blocked but continuously change state in response to each other without making progress. Unlike deadlock where threads are stuck waiting, in livelock threads are actively running but trapped in an unproductive loop.
The key distinction:
| Aspect | Deadlock | Livelock |
|---|---|---|
| Thread state | BLOCKED (waiting) | RUNNING (executing) |
| CPU usage | Near zero | High (often 100%) |
| Activity | None | Lots of retries, state changes |
| Detection | Thread dumps show blocked threads | Thread dumps show running threads |
| Root cause | Circular wait for locks | Overly reactive retry logic |
Livelock typically emerges from code designed to be helpful. Retry logic, conflict resolution, and polite backoff can all cause livelock when they interact badly.
The classic pattern is mutual politeness: both participants back off under the same conditions, using the same strategy.
The problem: both threads are too accommodating. Neither asserts priority or breaks the symmetry.
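Here is a minimal sketch of that symmetry in Java (class and method names are mine; this is one illustrative shape, not the only one): each worker needs two locks, and each politely releases everything whenever it cannot get the second lock.

```java
import java.util.concurrent.locks.ReentrantLock;

class PoliteWorkers {
    private static final ReentrantLock lockA = new ReentrantLock();
    private static final ReentrantLock lockB = new ReentrantLock();

    static void work(ReentrantLock first, ReentrantLock second) {
        while (true) {
            first.lock();
            try {
                if (second.tryLock()) {          // got both locks: do the work and stop
                    try {
                        System.out.println(Thread.currentThread().getName() + " finished");
                        return;
                    } finally {
                        second.unlock();
                    }
                }
            } finally {
                first.unlock();                  // be polite: give the first lock back too
            }
            // Both threads reach this point, back off the same way, and
            // immediately try again -- under a symmetric schedule, forever.
        }
    }

    public static void main(String[] args) {
        new Thread(() -> work(lockA, lockB), "worker-1").start();
        new Thread(() -> work(lockB, lockA), "worker-2").start();
    }
}
```

Because both workers run identical logic, an unlucky but perfectly symmetric schedule can have them yield to each other indefinitely: busy the whole time, finishing nothing.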
A related cause is fixed-interval retries: when multiple components retry at the same fixed interval, they stay synchronized.
Without randomization, the retry storm repeats forever at the same interval.
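For example, a retry loop like the following (sendRequest is a hypothetical helper that returns true on success) keeps every client on exactly the same schedule:

```java
// Every client that fails waits exactly the same fixed interval, so clients
// that collided once keep colliding on every subsequent attempt.
void sendWithFixedRetry() throws InterruptedException {
    while (!sendRequest()) {       // assumed helper: fails while the server is overloaded
        Thread.sleep(1000);        // fixed 1-second delay: all clients retry in lockstep
    }
}
```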
Network protocols like Ethernet's CSMA/CD and distributed systems often use backoff after collisions. Without randomness, collisions keep happening.
A third trigger is the thundering herd: when many threads wait for a resource and all wake up simultaneously when it becomes available, they all compete, most fail, and all go back to waiting. The cycle repeats.
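A rough sketch of that pattern in Java (class and field names are mine), using notifyAll to wake every waiter at once:

```java
class SharedResource {
    private boolean available = false;

    synchronized void acquire() throws InterruptedException {
        while (!available) {
            wait();                 // many threads park here
        }
        available = false;          // only one winner per release
    }

    synchronized void release() {
        available = true;
        notifyAll();                // wakes *all* waiters; most lose and go back to waiting
    }
}
```

Waking a single waiter with notify(), or handing the resource off through a queue, avoids stampeding the whole herd on every release.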
Livelock is harder to detect than deadlock because the system looks active. Here are the symptoms:
High CPU with no throughput: the CPU is pegged at 100%, but requests per second sit at or near zero. Work is being done, but it's all overhead.
Retry logs exploding: Log files fill with retry messages. You see patterns like "Retry attempt 1... Retry attempt 2... Retry attempt 3..." repeating infinitely.
Metrics show activity but no completion: Queue depth grows, in-flight requests increase, but completed requests flatline.
Patterns in timing: if you plot retry attempts over time, you might see synchronization, with bursts of retries at exactly the same timestamps.
The key to preventing livelock is breaking synchronization between competing threads. Backoff strategies determine how long to wait before retrying.
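A first attempt usually looks something like linear backoff: wait a little longer after each failure. Here is a sketch (tryOperation is an assumed helper that returns true on success):

```java
// Linear backoff: the wait grows by a fixed 100ms step each attempt, but the
// schedule is fully deterministic -- identical for every thread.
void retryWithLinearBackoff() throws InterruptedException {
    int attempt = 0;
    while (!tryOperation()) {
        attempt++;
        Thread.sleep(100L * attempt);   // 100ms, 200ms, 300ms, ...
    }
}
```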
Problem with linear backoff: if two threads start at the same time, they stay synchronized. Thread A waits 100ms, Thread B waits 100ms. Both retry at the same time. Both wait 200ms. Same problem.
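The next step up is exponential backoff, doubling the wait after each failure. A sketch under the same assumptions:

```java
// Exponential backoff: waits double each attempt (100ms, 200ms, 400ms, ...)
// up to a cap, but the schedule is still deterministic.
void retryWithExponentialBackoff() throws InterruptedException {
    long wait = 100;
    while (!tryOperation()) {
        Thread.sleep(wait);
        wait = Math.min(wait * 2, 10_000);   // cap the delay at 10 seconds
    }
}
```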
Problem with exponential backoff: slightly better because the waits grow faster, but still synchronized if the threads start together. Also, the waits can grow very large very quickly.
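The standard remedy is exponential backoff with jitter: keep the exponential ceiling, but pick the actual wait at random beneath it. A sketch (tryOperation is still an assumed helper):

```java
import java.util.concurrent.ThreadLocalRandom;

// Exponential backoff with full jitter: the actual wait is a random value
// between 0 and the current ceiling, so threads that fail together almost
// never retry together.
void retryWithJitter() throws InterruptedException {
    long ceiling = 100;
    while (!tryOperation()) {
        long wait = ThreadLocalRandom.current().nextLong(ceiling + 1);  // 0..ceiling ms
        Thread.sleep(wait);
        ceiling = Math.min(ceiling * 2, 10_000);   // grow the ceiling, capped at 10s
    }
}
```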
Why jitter works: the randomness breaks synchronization. Even if two threads start at exactly the same time and fail at exactly the same time, their random waits will differ. One will retry first and will likely succeed before the other interferes.
There are several ways to add jitter (a code sketch of all three follows below):
Full Jitter: `wait = random(0, base * 2^attempt)`
Most aggressive randomization. Wait time is completely random within the range.
Equal Jitter: `wait = base * 2^attempt / 2 + random(0, base * 2^attempt / 2)`
Half deterministic, half random. Provides a minimum wait while still randomizing.
Decorrelated Jitter: `wait = min(cap, random(base, previous_wait * 3))`
Each wait depends on the previous wait, creating more variation over time.
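A compact sketch of all three strategies (names and units are mine; base and cap are in milliseconds, attempt starts at 0):

```java
import java.util.concurrent.ThreadLocalRandom;

final class Jitter {
    // Full jitter: completely random wait below the exponential ceiling.
    static long fullJitter(long base, int attempt) {
        long ceiling = base << attempt;                        // base * 2^attempt
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    // Equal jitter: half the ceiling is guaranteed, the other half is random.
    static long equalJitter(long base, int attempt) {
        long half = (base << attempt) / 2;
        return half + ThreadLocalRandom.current().nextLong(half + 1);
    }

    // Decorrelated jitter: each wait is drawn relative to the previous one.
    static long decorrelatedJitter(long base, long previousWait, long cap) {
        long next = ThreadLocalRandom.current().nextLong(base, previousWait * 3 + 1);
        return Math.min(cap, next);
    }
}
```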
The simplest fix: add random jitter to all retry delays.
Give different threads different roles or priorities. Not everyone should back off equally.
Threads with higher priority back off less, making them more likely to acquire locks first. This breaks the symmetry that causes livelock.
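One way to sketch that in Java (class, field, and timing choices are mine): scale the backoff window with the thread's priority, so higher-priority threads retry sooner.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;

class PrioritizedWorker {
    private final int priority;            // 1 = highest priority
    private final ReentrantLock lock;

    PrioritizedWorker(int priority, ReentrantLock lock) {
        this.priority = priority;
        this.lock = lock;
    }

    void doWork() throws InterruptedException {
        while (!lock.tryLock()) {
            // Backoff window scales with priority: priority 1 waits up to 10ms,
            // priority 5 waits up to 50ms. Jitter keeps threads out of lockstep.
            long maxWait = priority * 10L;
            Thread.sleep(ThreadLocalRandom.current().nextLong(1, maxWait + 1));
        }
        try {
            // ... do the protected work ...
        } finally {
            lock.unlock();
        }
    }
}
```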
Don't retry forever. Set a maximum number of retries and fail gracefully.
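A sketch of a bounded retry loop (tryOperation is an assumed helper; the limit and delays are arbitrary):

```java
import java.util.concurrent.ThreadLocalRandom;

// Give up after MAX_RETRIES failures instead of retrying forever, and surface
// the failure to the caller so it can degrade gracefully.
static final int MAX_RETRIES = 5;

void runWithRetryLimit() throws InterruptedException {
    for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
        if (tryOperation()) {
            return;                                                   // success
        }
        Thread.sleep(ThreadLocalRandom.current().nextLong(50, 500));  // jittered wait
    }
    throw new IllegalStateException("operation failed after " + MAX_RETRIES + " attempts");
}
```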
If operations keep failing, stop trying for a while. Let the system recover.
When the circuit is Open, all operations fail immediately without trying. This gives the system time to recover and prevents retry storms from making things worse.
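A minimal circuit breaker sketch (names and thresholds are mine; production implementations typically add a proper half-open trial state and metrics):

```java
import java.time.Duration;
import java.time.Instant;

// After FAILURE_THRESHOLD consecutive failures the breaker opens; while open,
// callers fail fast instead of hammering the struggling resource. Once the
// cool-down elapses, the next call is allowed through as a trial.
class SimpleCircuitBreaker {
    private static final int FAILURE_THRESHOLD = 5;
    private static final Duration COOL_DOWN = Duration.ofSeconds(30);

    private int consecutiveFailures = 0;
    private Instant openedAt = null;       // null means the circuit is closed

    synchronized boolean allowRequest() {
        if (openedAt == null) {
            return true;                                              // closed: proceed
        }
        if (Duration.between(openedAt, Instant.now()).compareTo(COOL_DOWN) >= 0) {
            openedAt = null;                                          // cool-down over: allow a trial
            consecutiveFailures = 0;
            return true;
        }
        return false;                                                 // open: fail fast
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = null;
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= FAILURE_THRESHOLD) {
            openedAt = Instant.now();                                 // trip the breaker
        }
    }
}
```

Callers check allowRequest() before each attempt and report the outcome with recordSuccess() or recordFailure(), so retry pressure drops to zero while the circuit is open.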