Last Updated: January 6, 2026
In the previous chapter, we learned how to scale systems to handle growing load. But scaling solves only half the problem. What good is a system that can handle millions of requests if it crashes when a single server fails?
This is where availability comes in.
Availability measures how often your system is operational and accessible to users. A highly available system continues functioning even when individual components fail.
One important distinction before we begin: availability is not the same as reliability. A system can be highly available (always up) but unreliable (sometimes gives wrong answers). We will explore reliability in the next chapter, but keep this distinction in mind.
Availability is typically expressed as a percentage of uptime over a given period. The formula is straightforward:
Availability = Uptime / (Uptime + Downtime)
For example, if a system was up for 364 days and down for 1 day in a year:
Availability = 364 / 365 = 99.73%
Framed as "just one day," that downtime sounds acceptable, yet it is enough to drop you below "three nines" (99.9%) availability.
Availability is often described in terms of "nines." Each additional nine dramatically reduces the allowed downtime: 99.9% ("three nines") permits roughly 8.8 hours of downtime per year, while 99.999% ("five nines") permits only about 5 minutes.
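The arithmetic behind those budgets is easy to check. Here is a quick Python sketch (illustrative only) that converts each availability target into its annual downtime allowance:

```python
# Downtime budget per year for each level of "nines".
# A minimal sketch; the figures assume a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60

for label, availability in [("two nines", 0.99),
                            ("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    downtime_minutes = MINUTES_PER_YEAR * (1 - availability)
    print(f"{label} ({availability:.3%}): "
          f"{downtime_minutes:,.1f} minutes of downtime per year")
```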
How you combine components dramatically affects overall availability.
When components are in series, meaning all must work for the system to function, availability multiplies:
Overall = 99.9% × 99.9% × 99.9% = 99.7%
Each component in the chain reduces overall availability. You started with three components, each at "three nines," but the combined system is below three nines. Add more components in series, and availability keeps dropping.
When components are in parallel, meaning any can handle the request, availability improves dramatically:
The combined system is only down when both servers fail at the same time:
Failure probability = 0.1% × 0.1% = 0.0001%
Availability = 100% - 0.0001% = 99.9999%
Two servers with 99.9% availability each give you six nines when running in parallel, at least on paper. This is the power of redundancy.
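Both rules are easy to check in a few lines. The sketch below encodes them directly, with the usual caveat that the math assumes components fail independently:

```python
# Combining component availabilities: series vs. parallel.
# Assumes independent failures, as in the formulas above.

def series(*availabilities):
    """All components must be up: multiply availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def parallel(*availabilities):
    """Any component can serve: 1 minus the product of failure probabilities."""
    failure = 1.0
    for a in availabilities:
        failure *= (1 - a)
    return 1 - failure

print(f"Three 99.9% components in series: {series(0.999, 0.999, 0.999):.4%}")
print(f"Two 99.9% components in parallel: {parallel(0.999, 0.999):.4%}")
```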
To design for availability, you must understand how things fail. Failures do not ask permission, and they rarely happen at convenient times. Knowing the common failure modes helps you prepare for them.
Everything physical eventually breaks. The question is when, not if.
Hardware vendors quantify this with MTBF (Mean Time Between Failures), the average operating time a component runs before it fails.
At scale, hardware failures are not exceptional events. They are routine. A data center with 10,000 servers will see hundreds of hardware failures per year. If your architecture cannot handle a server dying at any moment, you do not have a highly available system.
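As a back-of-the-envelope check, the sketch below estimates annual failures for a fleet of that size. The 3% annual failure rate per server is an assumed ballpark figure, not a measured one:

```python
# Back-of-the-envelope estimate of hardware failures in a large fleet.
# The 3% annual failure rate per server is an assumed ballpark, not a measurement.
servers = 10_000
annual_failure_rate = 0.03  # assumed: roughly 3% of servers fail each year

expected_failures = servers * annual_failure_rate
print(f"Expected hardware failures per year: ~{expected_failures:.0f}")
print(f"That is roughly one failure every {365 / expected_failures:.1f} days")
```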
Hardware breaks randomly. Software breaks creatively.
Networks fail in ways that are subtle, intermittent, and painful to debug.
Here’s the uncomfortable reality: many studies attribute 70–80% of outages to human error. Not hardware, not software, but people.
Common examples include pushing a bad configuration change, running a command against the wrong environment, deleting production data by accident, and botching a migration or failover procedure.
This is why automation, testing, and guardrails matter so much. Humans make mistakes. Good systems make those mistakes hard to make and easy to recover from.
If there is one concept that underpins all of availability, it is redundancy. The logic is simple: if you have only one of something, when it fails, you have zero. If you have two, when one fails, you still have one.
Redundancy means having backup components that can take over when primary components fail.
In an active-passive configuration, one component handles all the work while another waits idle as a backup. When the active component fails, the passive one takes over.
Active-passive mode is commonly used where you want a single source of truth and controlled writes: databases, stateful services, and systems requiring a single leader.
The standby can be configured in different states of readiness:
Cold standby is cheapest but slowest. The backup server is not running, so failover requires booting the machine, starting services, and potentially restoring data. This might take 5-15 minutes, which is too slow for most production systems but acceptable for disaster recovery.
Warm standby keeps the backup running and configured, but not actively processing requests. It might be receiving replicated data but is not in the load balancer pool. Failover involves adding it to the pool and possibly promoting it, which takes seconds to a few minutes.
Hot standby is the most expensive but fastest. The backup is fully synchronized and ready to serve immediately. For databases, this often means synchronous replication where every write is confirmed on both primary and standby before acknowledging the client.
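To make the failover mechanics concrete, here is a minimal sketch of an active-passive health monitor. The is_healthy and promote_standby functions are hypothetical placeholders for whatever health check and promotion procedure your stack actually provides:

```python
import time

# Minimal active-passive failover loop (illustrative sketch).
# is_healthy() and promote_standby() are hypothetical placeholders for
# your real health check and promotion procedure.

CHECK_INTERVAL_SECONDS = 5
FAILURES_BEFORE_FAILOVER = 3  # avoid failing over on a single blip

def is_healthy(node: str) -> bool:
    """Placeholder: ping the node, run a health-check query, etc."""
    raise NotImplementedError

def promote_standby(standby: str) -> None:
    """Placeholder: promote the standby to primary and repoint clients."""
    raise NotImplementedError

def monitor(primary: str, standby: str) -> None:
    consecutive_failures = 0
    while True:
        if is_healthy(primary):
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= FAILURES_BEFORE_FAILOVER:
                promote_standby(standby)
                return  # the standby is now the primary
        time.sleep(CHECK_INTERVAL_SECONDS)
```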
In an active-active configuration, all components handle traffic simultaneously. There is no distinction between primary and backup because every node is doing real work.
When one node fails, the load balancer simply stops sending traffic to it. There is no failover process because the other nodes were already handling traffic. The remaining nodes absorb the additional load.
The key requirement for active-active is that requests can be handled by any node. This works naturally for stateless services where each request is independent. For stateful services, you need either shared storage (like a database or Redis) or sticky sessions (which reduces availability benefits).
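One common approach is to push session state out of the individual nodes and into shared storage so that any node can serve any request. The sketch below assumes a Redis instance at a hypothetical hostname and uses the standard redis Python client; the key names and TTL are illustrative:

```python
import json

import redis  # third-party client; assumes a reachable Redis instance

# Keeping session state in shared storage (Redis here) means any node in an
# active-active pool can serve any request. Hostname, keys, and TTL are
# illustrative assumptions, not a prescribed setup.
SESSION_TTL_SECONDS = 30 * 60

store = redis.Redis(host="redis.internal", port=6379)

def save_session(session_id: str, data: dict) -> None:
    store.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps(data))

def load_session(session_id: str) -> dict | None:
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```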
Redundancy within a single data center protects against hardware failures, but what if the entire data center goes offline? Power outages, network cuts, natural disasters, or even a backhoe cutting a fiber line can take down an entire facility.
Geographic redundancy distributes your system across multiple physical locations, so that no single facility failure can take the whole service down.
Cloud providers offer different levels of geographic redundancy:
Availability Zones are the sweet spot for most applications. They provide meaningful isolation (separate power, cooling, and network) while keeping latency low enough for synchronous replication. Most cloud-native applications deploy across at least two AZs.
Multi-region deployment is necessary for global applications or those requiring disaster recovery from regional events. The challenge is data replication, since synchronous replication across regions adds significant latency. Most multi-region systems use asynchronous replication and accept some data loss in a disaster (typically seconds to minutes of transactions).
A chain is only as strong as its weakest link. If you have redundant app servers but a single database, the database is your single point of failure. True high availability requires redundancy at every layer of your stack.
Notice that redundancy gets harder as you move down the stack. Adding more web servers is trivial. Adding database replicas with automatic failover requires careful engineering.
Redundancy is not free. Every backup server, every replica, every additional availability zone costs money. The question is whether that cost is justified by the reduction in downtime risk.
Patterns are reusable solutions to common problems. The following patterns appear repeatedly in highly available systems.
The most common and fundamental pattern for stateless services. A load balancer distributes traffic across multiple servers, automatically routing around failures.
The load balancer itself is a single point of failure. For true high availability, you need redundant load balancers as well.
Cloud providers handle this automatically. AWS ALB, Google Cloud Load Balancer, and Azure Load Balancer are all managed services with built-in redundancy. On-premises, you might use keepalived with a virtual IP that floats between two HAProxy instances.
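Conceptually, the pattern reduces to: only send traffic to backends that pass their health checks. The toy round-robin balancer below illustrates the idea; check_health is a hypothetical placeholder, and a real balancer would run health checks out of band rather than on every request:

```python
import itertools

# Toy round-robin load balancer that routes around unhealthy backends.
# check_health() is a hypothetical placeholder for a real HTTP/TCP health check;
# production balancers run these checks periodically, not per request.

def check_health(backend: str) -> bool:
    raise NotImplementedError

class RoundRobinBalancer:
    def __init__(self, backends: list[str]):
        self.backends = backends
        self._cycle = itertools.cycle(backends)

    def pick_backend(self) -> str:
        # Try each backend at most once before giving up.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if check_health(candidate):
                return candidate
        raise RuntimeError("no healthy backends available")
```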
Databases are stateful and cannot simply be load-balanced like web servers. Database high availability requires replication and careful failover management.
Synchronous replication guarantees zero data loss but adds latency. Every write must wait for the replica to confirm. If your replica is in a different region, this adds significant latency (50-100ms per write).
Asynchronous replication adds no latency to the write path but can lose data. If the primary fails, any writes not yet replicated are lost. The "replication lag" is typically seconds but can grow during high load.
Most production systems use synchronous replication for the failover target and asynchronous replication for read replicas and analytics.
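The trade-off shows up directly in the write path. The sketch below contrasts the two acknowledgment strategies; apply_locally and send_to_replica are hypothetical placeholders for the storage engine and replication transport:

```python
import queue
import threading

# Contrast of synchronous vs. asynchronous replication on the write path.
# apply_locally() and send_to_replica() are hypothetical placeholders.

def apply_locally(record: dict) -> None: ...
def send_to_replica(record: dict) -> None: ...

def write_synchronous(record: dict) -> None:
    apply_locally(record)
    send_to_replica(record)  # blocks until the replica confirms
    # Only now is the client acknowledged: zero data loss, higher latency.

replication_queue: "queue.Queue[dict]" = queue.Queue()

def write_asynchronous(record: dict) -> None:
    apply_locally(record)
    replication_queue.put(record)  # replica catches up in the background
    # Acknowledge immediately: low latency, but unreplicated writes can be lost.

def replication_worker() -> None:
    while True:
        send_to_replica(replication_queue.get())

threading.Thread(target=replication_worker, daemon=True).start()
```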
When downstream services cannot handle peak load, use a queue to buffer requests and process them at a sustainable rate.
This pattern is essential for handling bursty traffic. A flash sale might generate 100x normal traffic for a few minutes. Without a queue, the database would be overwhelmed. With a queue, orders accumulate and are processed at a sustainable rate.
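A minimal version of the pattern looks like the sketch below. It uses an in-process queue and a hypothetical process_order function to keep the example short; a real system would use a durable broker such as Kafka, SQS, or RabbitMQ:

```python
import queue
import threading
import time

# Queue-based load leveling: accept bursts quickly, process at a steady rate.
# process_order() is a hypothetical placeholder, and the rate is an assumption.

orders: "queue.Queue[dict]" = queue.Queue()
MAX_ORDERS_PER_SECOND = 50  # assumed sustainable rate for the database

def handle_request(order: dict) -> None:
    """Called on the request path: enqueue and return immediately."""
    orders.put(order)

def process_order(order: dict) -> None:
    ...  # placeholder: write to the database, charge the card, etc.

def worker() -> None:
    while True:
        process_order(orders.get())
        time.sleep(1 / MAX_ORDERS_PER_SECOND)  # throttle to a sustainable rate

threading.Thread(target=worker, daemon=True).start()
```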
When a dependency fails, continuing to call it wastes resources and can cause cascading failures. The circuit breaker pattern prevents this by failing fast.
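Here is a minimal circuit breaker sketch: after a threshold of consecutive failures it opens and fails fast, then allows a trial call once a cooldown has passed. The thresholds and timings are illustrative:

```python
import time

# Minimal circuit breaker: fail fast while a dependency is down, then probe
# again after a cooldown. Thresholds and timings are illustrative.

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failure_count = 0
        self.opened_at: float | None = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: allow a trial call

        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.time()  # trip the breaker
            raise
        else:
            self.failure_count = 0  # success resets the breaker
            return result
```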
A highly available system stays up even when components fail. But availability alone is not enough. A system that is always up but sometimes gives wrong answers is not trustworthy.
This brings us to our next topic: reliability. How do you ensure your system delivers correct results consistently?