A distributed system often fails through shared resources before it fails through code.
One slow dependency can fill every request thread. One noisy tenant can consume the whole connection pool. One optional feature can build a queue so large that required work waits behind it.
The Bulkhead Pattern prevents one overloaded path from consuming all capacity. It partitions scarce resources into isolated pools so failure stays inside a defined boundary.
The goal is to decide how much capacity one dependency, feature, tenant, or workload can consume before it affects the rest of the system.
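One way to picture such a limit is a small concurrency cap per compartment. The sketch below is a minimal illustration, not a production implementation: the names `Bulkhead` and `BulkheadFullError` are hypothetical, and it assumes a thread-based service where rejecting excess calls is preferable to queueing them.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a compartment has no free slots (hypothetical name)."""


class Bulkhead:
    """Caps how many calls may run concurrently inside one compartment."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn, *args, **kwargs):
        # Fail fast instead of queueing: a caller that cannot get a slot
        # is rejected immediately, so waiting work never piles up here.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name} bulkhead is full")
        try:
            return fn(*args, **kwargs)
        finally:
            # Always return the slot, even if fn raised.
            self._slots.release()
```

Each dependency, feature, or tenant would get its own `Bulkhead` instance, so exhausting one cap leaves the others untouched. The cap itself is the design decision: it states how much of the process one workload may hold at once.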
Consider an e-commerce backend with checkout, cart, and recommendations handled by the same service. Recommendations call a third-party API. During an incident, that API becomes slow but does not fail immediately.
If all request handlers share one executor and one outbound connection pool, recommendation calls can occupy the same resources needed by checkout.
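The failure path runs as follows: the third-party API slows down, recommendation calls hold threads and connections while they wait, the shared executor and connection pool fill up, and checkout requests queue behind work that is not finishing.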
The slow recommendation API is only part of the incident. The service allowed an optional path to spend resources required by a high-priority path.
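The isolated alternative can be sketched with one executor per path. This is an illustrative simulation under stated assumptions: the pool sizes, the function names `fetch_recommendations` and `place_order`, and the `api_recovered` event standing in for the slow third-party API are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sizes; real limits would come from load testing.
checkout_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="checkout")
recs_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="recs")

api_recovered = threading.Event()  # stands in for the slow third-party API


def fetch_recommendations(user_id: int) -> list:
    # Simulates a dependency that hangs instead of failing fast.
    api_recovered.wait(timeout=5)
    return []


def place_order(order_id: int) -> str:
    return f"order {order_id} confirmed"


# Ten recommendation requests arrive while the API is slow. Only two run;
# the rest queue inside recs_pool and never touch checkout threads.
rec_futures = [recs_pool.submit(fetch_recommendations, u) for u in range(10)]

# Checkout has its own workers, so this completes despite the incident.
checkout_result = checkout_pool.submit(place_order, 1001).result(timeout=1)

api_recovered.set()  # let the simulated incident end
for future in rec_futures:
    future.result(timeout=5)
recs_pool.shutdown()
checkout_pool.shutdown()
```

With a single shared pool of two workers, the checkout call would have waited behind the blocked recommendation calls. Note that `ThreadPoolExecutor` keeps an unbounded internal queue, so separate pools isolate threads but not backlog; a real service would likely also bound or reject queued recommendation work.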