Alerting usually fails in two ways.
Sometimes it stays quiet while something is broken, and you only find out when customers start complaining. Other times, it sends so many noisy alerts that the on-call engineer stops paying attention.
Good alerting sits between those extremes. It turns important signals into notifications only when the system actually needs human attention.
The hard part is deciding what deserves an alert. Too few alerts, and real problems slip through. Too many, and people tune them out.
This chapter covers how to design useful alerts, send them to the right people, reduce noise, and build a healthier on-call process.