Last Updated: May 25, 2026
Metrics are numerical measurements collected over time that show how a system is performing overall.
In production systems, metrics reveal response time trends, error rates, request volume, saturation, and capacity signals across thousands or millions of requests. They power dashboards, drive alerts, and support capacity planning.
In this chapter, you will learn how metric types, golden signals, instrumentation, naming, cardinality, and Prometheus-style infrastructure fit together.
To understand why metrics are indispensable, consider an e-commerce platform during a flash sale.
Logs show individual events: order 123 completed in 250ms, order 124 completed in 280ms, order 125 completed in 310ms, and so on across tens of thousands of orders.
Metrics summarize the shape of the workload: request rate at 2,500/sec, p99 latency at 450ms, error rate at 0.3%, and CPU at 78%.
To understand the flash sale's impact from logs alone, you would need to aggregate 50,000 log entries. Metrics give you that visibility at a glance: request rate tripled, latency increased by 40%, the error rate is still acceptable, and CPU is climbing.
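To make the contrast concrete, here is a minimal sketch of the aggregation you would otherwise run over raw log events to get metric-style numbers. The `OrderEvent` type, the `summarize` and `percentile` helpers, and the sample failed order are hypothetical and exist only for illustration. The percentile uses the simple nearest-rank method, which is not necessarily how a given metrics backend computes it.

```python
import math
from dataclasses import dataclass


@dataclass
class OrderEvent:
    """One log-style event (hypothetical shape, for illustration only)."""
    timestamp: float    # seconds since the start of the window
    duration_ms: float  # how long the order took to complete
    ok: bool            # False if the request failed


def percentile(values, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("percentile() requires at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(events, window_seconds):
    """Collapse individual events into a handful of aggregate numbers."""
    durations = [e.duration_ms for e in events]
    errors = sum(1 for e in events if not e.ok)
    return {
        "request_rate_per_sec": len(events) / window_seconds,
        "p99_latency_ms": percentile(durations, 99),
        "error_rate": errors / len(events),
    }


# Three successful orders from the example above, plus one made-up failure.
events = [
    OrderEvent(0.1, 250, True),
    OrderEvent(0.4, 280, True),
    OrderEvent(0.9, 310, True),
    OrderEvent(1.5, 900, False),
]

summary = summarize(events, window_seconds=2.0)
print(summary)
```

A metrics system keeps these aggregates up to date as requests arrive, so nobody has to rescan tens of thousands of events to answer "how much?" or "how many?".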
| Aspect | Metrics | Logs |
|---|---|---|
| Data type | Numeric time series | Text events |
| Question answered | How much? How many? | What happened? |
| Storage efficiency | Very efficient (numbers) | Less efficient (text) |
| Query style | Aggregate, graph | Search, filter |
| Retention | Months to years | Days to weeks |
| Alerting | Primary use case | Secondary use case |
| Debugging | Find the problem | Understand the problem |
Both are essential. Metrics alert you that something is wrong. Logs and traces help you understand why. Think of metrics as the vital signs monitor in a hospital: it tells doctors instantly when something needs attention, but they still need tests and exams to diagnose the cause.