Last Updated: January 12, 2026
Every second, servers emit metrics. Every minute, IoT sensors send readings. Every millisecond, stock prices update. Every day, applications generate billions of log entries.
All of this data shares a common characteristic: it is timestamped, and time is the primary dimension for querying it.
This is time-series data, and it has unique properties that make general-purpose databases inefficient for storing and querying it.
Time-series databases are built specifically for these patterns. They use columnar storage for efficient aggregations, automatic data compression, and time-based partitioning that makes retention policies trivial to implement.
The result is databases that can ingest millions of data points per second while providing fast analytical queries.
Time-series data consists of observations recorded at specific points in time. Each observation typically includes a timestamp, a metric name, a set of tags (key-value metadata describing the source), and one or more measured values. Taken together, this data has several characteristic properties:
| Property | Description |
|---|---|
| Immutable | Once written, data points are rarely modified |
| Sequential | Data arrives in time order (mostly) |
| High volume | Thousands to millions of points per second |
| Time-centric queries | Filters almost always include time range |
| Aggregation-heavy | Averages, sums, percentiles more common than point lookups |
| Compressible | Sequential data compresses well |
| Time-decaying value | Recent data is accessed more frequently |
Most time-series databases use a model with these concepts:
Series: A unique combination of metric name and tag set. For example, cpu.usage{host=server01, region=us-east} is one series.
Point: A timestamp-value pair within a series.
Field: The actual measured values (can have multiple fields per point).
This model allows efficient querying by time range within a series and aggregation across series that share tags.
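As a rough illustration of this model, here is a minimal in-memory sketch (the names `series_key`, `write`, and `storage` are made up for illustration, not any database's API): a series is identified by its metric name plus a canonical form of its tag set, and points are appended under that key.

```python
from collections import defaultdict

# A series is identified by metric name + sorted tag set.
def series_key(metric: str, tags: dict[str, str]) -> str:
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}{{{tag_part}}}"

# Each series holds a list of points: (timestamp, fields) pairs.
storage: dict[str, list[tuple[int, dict[str, float]]]] = defaultdict(list)

def write(metric: str, tags: dict[str, str], timestamp: int, **fields: float) -> None:
    storage[series_key(metric, tags)].append((timestamp, fields))

write("cpu.usage", {"host": "server01", "region": "us-east"}, 1700000000, value=42.5)
write("cpu.usage", {"host": "server01", "region": "us-east"}, 1700000010, value=43.1)

# One series, two points:
print(storage["cpu.usage{host=server01,region=us-east}"])
```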
Time-series databases use specialized storage engines optimized for sequential, time-ordered data.
Unlike row-oriented databases that store all columns of a row together, columnar storage stores each column (timestamps, values, tags) separately.
Why columnar is better for time-series: aggregations read only the columns they need, usually the timestamp and a single value column; values of the same type stored contiguously compress far better; and long column scans lend themselves to vectorized execution.
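A toy comparison of the two layouts (a sketch, not a real storage engine): with the columnar layout, averaging a metric touches one contiguous array instead of every field of every row.

```python
# Row-oriented: each row carries all fields together.
rows = [
    {"ts": 1700000000, "host": "server01", "value": 42.5},
    {"ts": 1700000010, "host": "server01", "value": 43.1},
    {"ts": 1700000020, "host": "server01", "value": 41.9},
]

# Columnar: each column is stored as its own contiguous array.
columns = {
    "ts":    [1700000000, 1700000010, 1700000020],
    "host":  ["server01", "server01", "server01"],
    "value": [42.5, 43.1, 41.9],
}

# Row layout: the average must walk every row and pick out one field.
row_avg = sum(r["value"] for r in rows) / len(rows)

# Columnar layout: the average reads only the "value" array; the
# "ts" and "host" columns are never touched.
col_avg = sum(columns["value"]) / len(columns["value"])

assert row_avg == col_avg
```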
Data is partitioned by time, typically into chunks covering fixed intervals such as an hour or a day.
Benefits of time-based partitioning: queries with a time-range filter only touch the chunks that overlap the range, expiring old data is as cheap as dropping whole chunks, and the most recent chunk stays small enough to serve hot data from memory.
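A minimal sketch of time-based chunking, assuming hypothetical one-hour chunks: each point lands in the chunk for its hour, a time-range query only scans overlapping chunks, and retention simply deletes whole chunks.

```python
from collections import defaultdict

CHUNK_SECONDS = 3600  # hypothetical one-hour chunks

chunks: dict[int, list[tuple[int, float]]] = defaultdict(list)

def chunk_id(timestamp: int) -> int:
    return timestamp // CHUNK_SECONDS

def write(timestamp: int, value: float) -> None:
    chunks[chunk_id(timestamp)].append((timestamp, value))

def query(start: int, end: int) -> list[tuple[int, float]]:
    # Partition pruning: only chunks overlapping [start, end) are scanned.
    out = []
    for cid in range(chunk_id(start), chunk_id(end - 1) + 1):
        out.extend(p for p in chunks.get(cid, []) if start <= p[0] < end)
    return out

def expire(older_than: int) -> None:
    # Retention: dropping old data is just deleting whole chunks.
    for cid in [c for c in chunks if (c + 1) * CHUNK_SECONDS <= older_than]:
        del chunks[cid]
```

Real engines layer compression, indexes, and per-chunk metadata on top, but the pruning and expiration logic follows the same shape.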
Time-series data compresses exceptionally well:
| Technique | Used For | Compression Ratio |
|---|---|---|
| Delta encoding | Timestamps (sequential) | 10-50x |
| XOR encoding | Floating-point values | 10-20x |
| Run-length encoding | Repeated values | Variable |
| Dictionary encoding | Tags with low cardinality | 10-100x |
| Block compression | General (LZ4, Zstd) | 2-10x |
Delta encoding example: the timestamps 1700000000, 1700000010, 1700000020, 1700000030 are stored as a base value followed by the deltas +10, +10, +10; with delta-of-delta encoding the repeated deltas collapse to zeros that need almost no bits.
XOR encoding for floats: Consecutive floating-point values often share many bits. XOR encoding stores only the differing bits.
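A sketch of both ideas in Python (illustrative only, not a production encoder): regular timestamps reduce to near-zero delta-of-delta values, and consecutive floats XOR to a 64-bit word that is mostly zero bits, which a real encoder would then store compactly.

```python
import struct

def delta_of_delta(timestamps: list[int]) -> list[int]:
    # 1700000000, 1700000010, 1700000020 -> deltas 10, 10 -> delta-of-deltas 0
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

def xor_floats(a: float, b: float) -> int:
    # Reinterpret each float64 as raw bits and XOR them; similar values
    # share sign, exponent, and leading mantissa bits, so most bits are 0.
    (bits_a,) = struct.unpack(">Q", struct.pack(">d", a))
    (bits_b,) = struct.unpack(">Q", struct.pack(">d", b))
    return bits_a ^ bits_b

print(delta_of_delta([1700000000, 1700000010, 1700000020, 1700000030]))  # [0, 0]
print(f"{xor_floats(42.5, 42.75):064b}")  # long runs of zero bits
```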
Time-series databases optimize for high-volume writes: incoming points are typically appended to a write-ahead log for durability and buffered in memory, then periodically flushed to disk as immutable, compressed, time-ordered blocks.
This approach provides very high write throughput because every write is a sequential append rather than a random update, points are compressed and flushed in large batches, and data already on disk is never rewritten in place.
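A simplified sketch of such a write path, assuming the log-structured, append-only design described above (the class and method names are invented for illustration):

```python
import json

class WriteBuffer:
    """Toy write path: append to a WAL, buffer in memory, flush in batches."""

    def __init__(self, wal_path: str, flush_threshold: int = 10_000):
        self.wal = open(wal_path, "a")          # sequential append-only log
        self.buffer: list[tuple[int, str, float]] = []
        self.flush_threshold = flush_threshold
        self.flushed_blocks: list[list[tuple[int, str, float]]] = []

    def write(self, timestamp: int, series: str, value: float) -> None:
        # Durability: one sequential append, no random I/O or index update.
        self.wal.write(json.dumps([timestamp, series, value]) + "\n")
        self.buffer.append((timestamp, series, value))
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Sort by time and keep an immutable block; a real engine would
        # compress it and record per-block min/max timestamps here.
        self.flushed_blocks.append(sorted(self.buffer))
        self.buffer.clear()

buf = WriteBuffer("tsdb.wal", flush_threshold=2)
buf.write(1700000000, "cpu.usage{host=server01}", 42.5)
buf.write(1700000010, "cpu.usage{host=server01}", 43.1)  # triggers a flush
```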
Time-series queries have distinctive patterns that databases optimize for.
Almost every query includes a time-range predicate, for example "average CPU usage per host over the last hour."
Time-series databases optimize these by pruning partitions that fall outside the requested range, tracking min/max timestamps per block so irrelevant blocks are skipped, and storing data in time order so the remaining blocks are read sequentially.
Aggregating values over time windows is the most common operation:
| Function | Use Case |
|---|---|
| avg() | Average CPU usage |
| max(), min() | Peak memory, lowest temperature |
| sum() | Total requests, bytes transferred |
| count() | Number of events |
| percentile() | P99 latency |
| rate() | Requests per second (from counter) |
| derivative() | Rate of change |
Aggregate old data into lower-resolution summaries: for example, raw per-second points might be rolled up into one-minute averages after a week and hourly averages after a month, preserving long-term trends at a fraction of the storage cost.
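A sketch of the bucketing behind both windowed aggregation and rollups, using hypothetical 5-minute windows: points are grouped by window start and reduced to a few summary statistics.

```python
from collections import defaultdict
from statistics import mean

def downsample(points: list[tuple[int, float]], window: int = 300):
    """Roll raw (timestamp, value) points up into fixed windows."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window].append(value)   # align to window start
    return {
        start: {"avg": mean(vals), "max": max(vals), "count": len(vals)}
        for start, vals in sorted(buckets.items())
    }

raw = [(1700000000 + i, 40.0 + i % 7) for i in range(0, 900, 10)]  # 15 min of data
print(downsample(raw))  # three 5-minute summaries instead of 90 raw points
```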
Filter by metadata before aggregating: for example, restrict the query to series tagged region=us-east, then average CPU usage across all matching hosts.
Time-series databases index tags for efficient filtering, typically using inverted indexes or bitmap indexes.
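A minimal sketch of an inverted tag index (illustrative, not any engine's actual structure): each tag=value pair maps to the set of series IDs that carry it, so a tag filter becomes a set intersection.

```python
from collections import defaultdict

# tag "key=value" -> set of series IDs that carry that tag
index: dict[str, set[int]] = defaultdict(set)

def add_series(series_id: int, tags: dict[str, str]) -> None:
    for key, value in tags.items():
        index[f"{key}={value}"].add(series_id)

def find_series(**filters: str) -> set[int]:
    # Intersect the posting sets; only matching series are then scanned.
    sets = [index[f"{k}={v}"] for k, v in filters.items()]
    return set.intersection(*sets) if sets else set()

add_series(1, {"host": "server01", "region": "us-east"})
add_series(2, {"host": "server02", "region": "us-east"})
add_series(3, {"host": "server03", "region": "eu-west"})

print(find_series(region="us-east"))                   # {1, 2}
print(find_series(region="us-east", host="server02"))  # {2}
```

Bitmap indexes apply the same idea but represent each posting set as a compressed bitmap.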
Managing data lifecycle is crucial for time-series databases.
Define how long to keep data at each resolution: for example, raw data for 7 days, one-minute rollups for 90 days, and hourly rollups for two years.
Why tiered retention: recent, high-resolution data drives debugging and alerting, while older data is mostly queried for long-term trends, so keeping full resolution forever adds cost without adding much value.
Automatically maintain rollup tables as data arrives, so queries over older ranges read pre-computed summaries instead of scanning raw points.
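A sketch of how such a policy might be expressed and applied per chunk based on age (the tiers and numbers below are hypothetical examples, not defaults of any product):

```python
import time

# Hypothetical tiered policy: what to do with a chunk, by age in days.
RETENTION_POLICY = [
    (7,    "keep_raw"),          # < 7 days: keep full resolution
    (90,   "rollup_to_1m"),      # 7-90 days: one-minute summaries
    (730,  "rollup_to_1h"),      # 90 days - 2 years: hourly summaries
    (None, "delete"),            # older: drop entirely
]

def action_for(chunk_end_ts: int, now: float | None = None) -> str:
    age_days = ((now or time.time()) - chunk_end_ts) / 86400
    for max_age, action in RETENTION_POLICY:
        if max_age is None or age_days < max_age:
            return action
    return "delete"

# A background job would run this against every chunk's end timestamp:
print(action_for(int(time.time()) - 3 * 86400))    # keep_raw
print(action_for(int(time.time()) - 200 * 86400))  # rollup_to_1h
```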
Move old data to cheaper storage:
| Tier | Storage | Use Case |
|---|---|---|
| Hot | SSD, Memory | Recent data, frequent queries |
| Warm | HDD | Older data, occasional queries |
| Cold | Object storage (S3) | Archive, rare access |
| Frozen | Tape, Glacier | Long-term compliance |
InfluxDB: a purpose-built time-series database with its own query languages (InfluxQL and Flux).
Prometheus: an open-source monitoring system with a built-in time-series database, queried with PromQL.
TimescaleDB: a PostgreSQL extension for time-series data, offering full SQL including relational joins.
QuestDB: a high-performance time-series database focused on speed, queried with SQL.
| Feature | InfluxDB | Prometheus | TimescaleDB | QuestDB |
|---|---|---|---|---|
| Query language | InfluxQL/Flux | PromQL | SQL | SQL |
| Write performance | High | Moderate | High | Very high |
| Query performance | High | High | High | Very high |
| Relational joins | No | No | Yes | Limited |
| Managed options | InfluxDB Cloud | Various | Timescale Cloud | QuestDB Cloud |
| Clustering | Enterprise | Thanos/Cortex | Enterprise | Planned |
| Best for | General TSDB | Monitoring | SQL compatibility | Speed |
Track server and application metrics for monitoring and alerting.
Typical metrics: CPU utilization, memory usage, disk I/O, request rate, error rate, and latency percentiles.
Collect and analyze sensor readings from devices in the field.
Typical queries: average readings per device over a recent window, comparisons against historical baselines, and detection of sensors that have stopped reporting.
Track stock prices, trades, and market data.
Requirements: very high ingestion rates, high-precision timestamps, low-latency queries, and long retention for compliance.
Track user behavior and business metrics.
Typical queries: daily active users, conversion rates over time, and comparisons of the current period against previous ones.
Cardinality is the number of unique series (unique combinations of metric + tags). High cardinality is the most common performance problem:
| Cardinality Level | Example | Impact |
|---|---|---|
| Low | 100 servers, 50 metrics = 5,000 series | No problem |
| Medium | 10,000 containers, 100 metrics = 1M series | Manageable |
| High | 1M users, 50 metrics = 50M series | Problematic |
| Extreme | Using user ID as tag | Avoid! |
Why high cardinality hurts: every series carries its own index entries and in-memory state, so memory and index size grow with the series count, ingestion slows, and queries that fan out across many series become expensive.
Mitigation strategies: keep unbounded values such as user IDs and request IDs out of tags (store them as fields or in logs instead), limit the number of tag keys per metric, and monitor series growth so cardinality explosions are caught early.
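A quick sketch of why cardinality multiplies (illustrative numbers only): the worst-case series count is roughly the product of each tag's distinct values, so a single unbounded tag dominates everything else.

```python
from math import prod

def estimated_series(metrics: int, tag_cardinalities: dict[str, int]) -> int:
    # Worst case: every combination of tag values occurs for every metric.
    return metrics * prod(tag_cardinalities.values())

# 50 metrics tagged by host and region: easily manageable.
print(estimated_series(50, {"host": 100, "region": 5}))          # 25,000 series

# Add a user_id tag with a million distinct values: cardinality explodes.
print(estimated_series(50, {"host": 100, "region": 5, "user_id": 1_000_000}))
# 25,000,000,000 potential series
```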
Factors that affect write performance:

| Factor | Impact | Tuning |
|---|---|---|
| Batch size | Larger = fewer I/O ops | 1,000-10,000 points per batch |
| Buffer size | Larger = more memory, fewer flushes | Tune based on RAM |
| Compression | Slower writes, smaller storage | Usually worth it |
| Replication | Durability vs speed | Async if possible |
Factors that affect query performance:

| Factor | Impact | Tuning |
|---|---|---|
| Time range | Wider = more data | Use shortest range needed |
| Tag filters | Selective = faster | Filter early, reduce series |
| Aggregation | Computed on read | Use pre-computed rollups for common queries |
| Concurrency | Resource contention | Limit concurrent queries |
Time-series databases are the right choice when data is timestamped and append-only, write volumes are high, queries are dominated by time-range filters and aggregations, and retention or downsampling policies matter.
Time-series databases may not fit when records are frequently updated or deleted, when queries filter on arbitrary attributes rather than time, when relational joins and transactions are central, or when volumes are small enough that a general-purpose database handles them comfortably.
Time-series databases are optimized for timestamped data with predictable access patterns:
| Aspect | Time-Series Approach |
|---|---|
| Data model | Metrics with tags and timestamps |
| Storage | Columnar, time-partitioned, heavily compressed |
| Writes | Append-only, high throughput |
| Queries | Time-range filters, aggregations |
| Retention | Time-based expiration, downsampling |
The next chapter explores full-text search engines, which optimize for a different kind of search: keyword-based search with relevance ranking, faceted filtering, and linguistic analysis.