Last Updated: June 8, 2026
Traces and logs help you investigate known problems. Metrics help you discover problems and monitor system health continuously.
A metric is a number tracked over time, such as request rate, error rate, p99 latency, or memory usage. The challenge is not collecting metrics but choosing the few that reflect real system health and do not trigger noisy alerts.
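To make "a number tracked over time" concrete, here is a minimal sketch that derives three of the metrics named above from raw request records in one time window. The `Request` record, its field names, the `summarize` helper, and the choice to count only 5xx statuses as errors are all assumptions made for illustration. The percentile uses the simple nearest-rank method, which may differ from how a real metrics backend estimates it.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record shape; real systems will carry more fields.
    latency_ms: float
    status: int


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(requests, window_seconds):
    """Reduce one window of requests to a few numbers worth tracking."""
    total = len(requests)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "request_rate": total / window_seconds,  # requests per second
        "error_rate": errors / total if total else 0.0,
        "p99_latency_ms": (
            percentile([r.latency_ms for r in requests], 99) if total else None
        ),
    }


if __name__ == "__main__":
    # 100 synthetic requests with latencies 1..100 ms; two of them fail.
    sample = [
        Request(latency_ms=float(i), status=500 if i in (50, 100) else 200)
        for i in range(1, 101)
    ]
    print(summarize(sample, window_seconds=10))
```

Calling `summarize` once per window and storing each result with its timestamp is what turns these values into a time series. In practice a metrics library typically does this aggregation for you.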
This chapter covers the four golden signals, the RED and USE methods, health checks, SLOs, Prometheus, Grafana, and symptom-based alerting.