Last Updated: June 8, 2026
A cascading failure happens when one service’s problem spreads through the dependency graph.
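A slow downstream service makes its callers wait; those callers in turn slow down their own callers, and eventually unrelated user-facing services start failing too. Retries add load to the service that is already struggling, blocked threads tie up capacity in every caller, and shared resource pools let one slow dependency starve requests that never touch it.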
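To get a feel for how retries can amplify a failure, here is a minimal sketch of the worst case, assuming every layer in a call chain makes the same fixed number of attempts and every attempt fails. The function name `worst_case_attempts` and the figure of 3 attempts per layer are illustrative assumptions, not values taken from any particular system.

```python
def worst_case_attempts(attempts_per_layer: int, layers: int) -> int:
    """Upper bound on calls that reach the failing service for ONE user request.

    Assumes (hypothetically) that each of `layers` calling layers makes up to
    `attempts_per_layer` attempts (the first try plus retries) and that every
    attempt fails, so each layer multiplies the calls made by the one below it.
    """
    if attempts_per_layer < 1:
        raise ValueError("attempts_per_layer must be at least 1")
    if layers < 0:
        raise ValueError("layers must be non-negative")
    return attempts_per_layer ** layers


# Illustrative numbers only: 3 attempts per layer, 1 to 4 calling layers.
for layers in range(1, 5):
    print(f"{layers} layer(s): up to {worst_case_attempts(3, layers)} calls")
```

Under these assumptions the load on the failing service grows exponentially with the depth of the call chain, which is why an unbounded retry policy at each layer can turn a brief slowdown into sustained overload.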
This chapter covers how cascades start, how slowness propagates, why retries amplify failures, and how timeouts, circuit breakers, bulkheads, load shedding, and graceful degradation keep one failure from becoming a full outage.