Last Updated: January 7, 2026
Every developer can write logs. Add a print statement, output some text, and move on. But there is a difference between writing logs and writing useful logs, logs that actually help you debug problems at 3 AM when the system is on fire.
Poor logging is surprisingly common. Logs that say "Error occurred" without context. Logs buried in millions of lines of debug noise. Logs that leak passwords or credit card numbers. Logs formatted differently across services, making correlation impossible.
Good logging is a skill. It requires thinking about what information you will need when something goes wrong, not when you are writing the code, but six months later when you are debugging a production incident you have never seen before.
In this chapter, you will learn how to use log levels well, write log entries with the right context, adopt structured logging, keep sensitive data out of your logs, manage logging performance, and handle rotation and retention.
These practices apply whether you are using Log4j, Logback, Python's logging module, or any other logging framework. The principles are universal.
Log levels categorize messages by importance. Using them well is the foundation of useful logging, because levels decide what gets stored, what gets alerted on, and what gets ignored.
The most common mistake is overusing ERROR and underusing WARN.
Bad Example: Logging expected conditions as ERROR
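As an illustration, consider a lookup miss in a hypothetical user service (the service and function names are made up for this sketch). A missing user is an expected condition, not a failure that needs paging:

```python
import logging

logger = logging.getLogger("user-service")  # illustrative service name

def find_user(user_id, users):
    """Look up a user; a miss is an expected condition, not an ERROR."""
    user = users.get(user_id)
    if user is None:
        # Bad:    logger.error("Error occurred")  -- expected and not actionable
        # Better: WARN (or INFO) with enough context to investigate later
        logger.warning("user_not_found user_id=%s", user_id)
    return user
```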
Ask yourself: “Should this wake someone up at 2 AM?”
ERROR logs should be actionable. If you cannot do anything about it, it probably should not be an ERROR.
A log message is only useful if it contains enough information to understand what happened. The goal is simple: someone should be able to read one log line and immediately know what it means, what it affects, and what to do next.
Every log entry should answer these questions: What happened? Which user, order, or request was affected? What were the relevant values? What, if anything, should the reader do about it?
Before writing a log statement, ask: “If I saw only this log line, would I understand what happened?”
| Context Type | Examples | When to Include |
|---|---|---|
| Identifiers | user_id, order_id, request_id | Always |
| Values | amount, count, size | When relevant to the event |
| State | status, current step, retry count | For state changes |
| Error details | error code, exception type, message | All errors |
| Timing | duration, timeout value | Performance-related events |
| Source | URL, IP address, service name | External interactions |
Too little context makes logs useless; too much context creates noise that buries the fields you actually need. Aim for the middle: include what you need to debug, nothing more.
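To make the contrast concrete, here is a hypothetical payment failure logged three ways (all field names and values are illustrative):

```python
import logging

logger = logging.getLogger("payment-service")
logging.basicConfig(level=logging.INFO)

too_little = "Payment failed"
too_much = (
    "Payment failed user_id=789 order_id=456 amount=99.99 currency=USD "
    "session=abc123 thread=worker-4 free_memory=512MB user_agent=Mozilla/5.0"
)
just_right = "payment_failed order_id=456 user_id=789 amount=99.99 error_code=card_declined"

logger.error(too_little)   # impossible to act on
logger.error(too_much)     # the signal drowns in incidental detail
logger.error(just_right)   # identifiers, the value involved, and the cause
```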
Structured logging means writing logs in a machine-parseable format, typically JSON, rather than plain text.
With unstructured logs, you end up treating logs like text files: parsing them requires regex, different log formats require different regex patterns, and slight format changes break your parsers.

With structured logs, logs behave like data.
Now you can query:
- `event=order_placed AND amount>100` - Large orders
- `user_id=789` - All activity for this user
- `service=order-service AND level=ERROR` - Order service errors

Use consistent field names across all services:
| Field | Type | Description | Required |
|---|---|---|---|
| timestamp | ISO 8601 | When the event occurred | Yes |
| level | string | Log level (INFO, ERROR, etc.) | Yes |
| service | string | Name of the service | Yes |
| event | string | What happened (snake_case) | Yes |
| message | string | Human-readable description | Optional |
| trace_id | string | Distributed trace ID | When available |
| request_id | string | Request correlation ID | When available |
| user_id | string | User identifier | When relevant |
| error_code | string | Error classification | For errors |
| duration_ms | number | Operation duration | For timed operations |
Most languages have good structured logging support. The main idea is the same: log an event name and attach key-value context.
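In Python, for instance, one minimal way to get there is a custom `logging.Formatter` that emits each record as JSON using the shared field names (the service name and context fields here are illustrative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object using the shared field names."""

    CONTEXT_FIELDS = ("user_id", "order_id", "amount", "duration_ms")

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": "order-service",  # illustrative service name
            "event": record.getMessage(),
        }
        # Key-value context passed via `extra=` becomes record attributes
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)

logger = logging.getLogger("order-service")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order_placed", extra={"user_id": "789", "order_id": "456", "amount": 99.99})
```

Dedicated libraries (structlog, python-json-logger, and their equivalents in other languages) do the same thing with less ceremony.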
Once your logs are structured, you can filter, group, and correlate across services without fighting formatting.
One of the biggest logging risks is accidentally exposing sensitive information. Logs get shipped to central systems, copied into tickets, and shared across teams. If a secret lands in logs, assume it will leak.
Never log these directly:

- Passwords and password hashes
- API keys, secrets, and bearer tokens
- Credit card numbers
- Session tokens and authorization headers
Log carefully (often mask or avoid):

- Email addresses and usernames
- IP addresses
- Full request or response bodies
When you need to reference sensitive data for debugging, log the minimum needed for correlation.
Do not rely on developers remembering to redact every time. Add sanitization to the logging pipeline so it happens by default.
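One way to make this automatic in Python is a `logging.Filter` attached to the logger, so every record is scrubbed before any handler sees it. This is a minimal sketch; the pattern list and names are illustrative:

```python
import logging
import re

# Illustrative patterns; extend the list for your own secret formats
SECRET_PATTERNS = [
    re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"),    # card numbers
    re.compile(r"(api[_-]?key|secret)[=:]\s*\w+", re.IGNORECASE),  # API keys
    re.compile(r"password=\w+", re.IGNORECASE),                    # passwords
    re.compile(r"Bearer\s+\w+"),                                   # bearer tokens
]

class SanitizeFilter(logging.Filter):
    """Mask known secret patterns in every record before handlers see it."""

    def filter(self, record):
        message = record.getMessage()
        for pattern in SECRET_PATTERNS:
            message = pattern.sub("[REDACTED]", message)
        record.msg, record.args = message, None
        return True  # keep the (now sanitized) record

logger = logging.getLogger("payment-service")
logger.addFilter(SanitizeFilter())
```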
Typical controls:
- Field-name redaction for keys like `password`, `token`, `authorization`, `api_key`

Patterns to detect and mask:

- `\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b` - credit card numbers
- `(api[_-]?key|secret)[=:]\s*\w+` - API keys and secrets
- `password=\w+` - inline passwords
- `Bearer\s+\w+` - bearer tokens

Logging is not free. In high-throughput systems, even “small” logging overhead can become a real performance and cost problem.
A typical log line can trigger multiple steps: formatting the message, serializing context (often to JSON), and writing to disk or sending over the network.
The expensive part is almost always I/O, especially if it happens on the request thread.
Do not compute expensive values if the log will not be written.
There are three common approaches: always computing the expensive data (bad), guarding the computation behind a level check (good), and letting the framework defer evaluation until the record will actually be emitted (better, where supported).
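The three variants look like this in Python (a sketch; `expensive_dump` stands in for whatever costly serialization you only want at DEBUG):

```python
import logging

logger = logging.getLogger("order-service")

calls = {"count": 0}

def expensive_dump(state):
    """Stand-in for a costly serialization you only want at DEBUG."""
    calls["count"] += 1
    return repr(sorted(state.items()))

state = {"step": 3, "retries": 1}
logger.setLevel(logging.INFO)  # DEBUG is disabled

# Bad: expensive_dump() runs even though the message is discarded
logger.debug("state: " + expensive_dump(state))

# Good: guard the call so the work is skipped when DEBUG is off
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("state: %s", expensive_dump(state))

# Better: pass the object itself; %-style formatting (and str()) is
# deferred until the framework knows the record will be emitted
logger.debug("state: %s", state)
```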
Synchronous logging can block request processing while waiting for disk or network.
With synchronous logging, the write happens on the request thread, so every request pays the full I/O cost. With asynchronous logging, log messages go into an in-memory queue and a background thread performs the writes, so I/O never blocks request processing.
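Python's standard library ships this pattern as `QueueHandler` plus `QueueListener` (a minimal sketch; the queue size and handler choice are illustrative):

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue(maxsize=10000)

# The request thread only enqueues the record -- a fast in-memory operation
logger = logging.getLogger("order-service")
logger.addHandler(logging.handlers.QueueHandler(log_queue))
logger.setLevel(logging.INFO)

# A background thread drains the queue and performs the slow I/O
sink = logging.StreamHandler()  # stand-in for a file or network handler
listener = logging.handlers.QueueListener(log_queue, sink)
listener.start()

logger.info("order_placed order_id=456")
listener.stop()  # flushes queued records on shutdown
```

One design note: a bounded queue means you must decide what happens under pressure, either drop records or block briefly; dropping is usually the safer default for non-audit logs.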
| Environment | Recommended Level | Reason |
|---|---|---|
| Development | DEBUG or TRACE | Full visibility for debugging |
| Staging | DEBUG | Test with production-like logging |
| Production | INFO | Balance visibility and performance |
| Production (incident) | DEBUG | Temporarily enable for debugging |
A common practice is “INFO by default, DEBUG on demand,” with a time limit and scope (specific service, endpoint, or user) so you do not drown in noise.
For events that happen constantly (cache hits, heartbeats), log only a sample.
Two common approaches are probabilistic sampling (for example, keep a random 1% of events) and deterministic sampling (for example, keep every 1000th event).
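Both fit in a few lines (a sketch; the function and class names are illustrative):

```python
import random

def sample_probabilistic(rate=0.01):
    """Keep roughly `rate` of events (1% by default)."""
    return random.random() < rate

class DeterministicSampler:
    """Keep every Nth event (every 1000th by default)."""

    def __init__(self, every=1000):
        self.every = every
        self.count = 0

    def should_log(self):
        self.count += 1
        return self.count % self.every == 0

# Usage: guard the high-frequency log statement
sampler = DeterministicSampler(every=1000)
# if sampler.should_log():
#     logger.debug("cache_hit key=%s", key)
```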
If you want accurate counts, do not rely on logs for that. Use metrics (counters, histograms) and keep logs for context.
These numbers vary by language, hardware, and logging pipeline, but they help build intuition:
| Scenario | Impact |
|---|---|
| Sync logging to disk | 1-10ms per log (blocking) |
| Async logging to disk | <0.1ms per log (non-blocking) |
| Logging over network | 5-50ms per log if sync |
| JSON serialization | 0.01-0.1ms per log |
| String formatting | 0.001-0.01ms per log |
At 10,000 requests/sec, even 0.1ms of extra overhead per request adds up fast. That is roughly 1 second of CPU time per second, just for logging work.
The goal is not “log less.” It is “log smarter”: correct levels, structured context, async I/O, and sampling where needed.
Avoid these patterns that make logs less useful: messages with no context (“Error occurred”), debug noise that drowns out real events, secrets leaking into log lines, and formats that differ from service to service.
Logs consume disk space. If you do not manage them, they will eventually fill the disk and take your service down in the most avoidable way.
Rotation means closing the current log file and starting a new one on a schedule.
The most common strategies are:
| Strategy | Configuration | Use Case |
|---|---|---|
| Size-based | Rotate when file reaches 100MB | Consistent file sizes |
| Time-based | Rotate daily at midnight | Predictable log files |
| Combined | Rotate daily OR at 100MB | Best of both |
After rotation, it is common to compress the rotated file and delete backups that fall outside the retention period.
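In Python's standard library, for example, both strategies map directly onto stock handlers (filenames here are illustrative; `delay=True` just postpones opening the file until the first write):

```python
import logging
import logging.handlers

logger = logging.getLogger("order-service")

# Size-based: rotate at ~100MB, keep the 10 most recent backups
size_handler = logging.handlers.RotatingFileHandler(
    "app-size.log", maxBytes=100 * 1024 * 1024, backupCount=10, delay=True
)

# Time-based: rotate at midnight, keep 30 daily files
time_handler = logging.handlers.TimedRotatingFileHandler(
    "app-daily.log", when="midnight", backupCount=30, delay=True
)

logger.addHandler(size_handler)
```

With `backupCount` set, the handler deletes the oldest file on each rotation, which covers the retention side automatically for local files.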
Retention depends on what the logs are used for and whether compliance applies.
| Log Type | Retention | Reason |
|---|---|---|
| Application logs | 7-30 days | Debugging recent issues |
| Access logs | 30-90 days | Traffic analysis, security |
| Audit logs | 1-7 years | Compliance requirements |
| Security logs | 1-7 years | Incident investigation |
| Debug logs | 1-7 days | Short-term debugging |
Tip: keep long-term logs in cheaper storage (object storage) and keep hot logs in the logging system for fast search.
Effective logging requires intentional design: decide what you will need during an incident before the incident happens.

Key practices:

- Use log levels deliberately; ERROR should be actionable
- Include enough context (identifiers, values, state) to debug from a single line
- Prefer structured, machine-parseable formats with consistent field names
- Sanitize sensitive data in the logging pipeline, not by developer discipline
- Log asynchronously, defer expensive work, and sample high-volume events
- Manage rotation and retention so logs never take the disk down