Time-Series Databases

Last Updated: January 12, 2026

Ashish Pratap Singh

Every second, servers emit metrics. Every minute, IoT sensors send readings. Every millisecond, stock prices update. Every day, applications generate billions of log entries.

All of this data shares a common characteristic: it is timestamped, and time is the primary dimension for querying it.

This is time-series data, and it has unique properties that make general-purpose databases inefficient:

  • Data arrives in time order and is rarely updated after insertion
  • Queries almost always filter by time range
  • Aggregations (averages, percentiles, sums) are more common than individual record lookups
  • Recent data is accessed far more frequently than old data
  • Data often has a natural retention period after which it can be deleted

Time-series databases are built specifically for these patterns. They use columnar storage for efficient aggregations, automatic data compression, and time-based partitioning that makes retention policies trivial to implement.

The result is databases that can ingest millions of data points per second while providing fast analytical queries.

Understanding Time-Series Data

Time-series data consists of observations recorded at specific points in time. Each observation typically includes:

  • Timestamp: When the observation occurred
  • Measurement: The value being recorded (temperature, CPU usage, price)
  • Tags/Labels: Metadata identifying the source (server name, sensor ID, stock symbol)

Characteristics of Time-Series Data

| Property | Description |
|---|---|
| Immutable | Once written, data points are rarely modified |
| Sequential | Data arrives in time order (mostly) |
| High volume | Thousands to millions of points per second |
| Time-centric queries | Filters almost always include a time range |
| Aggregation-heavy | Averages, sums, percentiles more common than point lookups |
| Compressible | Sequential data compresses well |
| Time-decaying value | Recent data is accessed more frequently |

Time-Series Data Model

Most time-series databases use a model with these concepts:

Series: A unique combination of metric name and tag set. For example, cpu.usage{host=server01, region=us-east} is one series.

Point: A timestamp-value pair within a series.

Field: The actual measured values (can have multiple fields per point).

This model allows efficient querying by time range within a series and aggregation across series that share tags.
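
To make the model concrete, here is a minimal Python sketch (not any particular database's API; the metric name, tags, and timestamps are invented) showing how a series key and its points might be represented:

```python
from dataclasses import dataclass

def series_key(metric: str, tags: dict) -> str:
    """A series is identified by its metric name plus a sorted tag set."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}{{{tag_str}}}"

@dataclass
class Point:
    timestamp: int   # when the observation occurred (Unix seconds here)
    fields: dict     # one or more measured values for this point

# Two points belonging to the same series
key = series_key("cpu.usage", {"host": "server01", "region": "us-east"})
points = [
    Point(1736700000, {"value": 42.5}),
    Point(1736700010, {"value": 43.1}),
]
print(key)   # cpu.usage{host=server01,region=us-east}
```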

Storage Architecture

Time-series databases use specialized storage engines optimized for sequential, time-ordered data.

Columnar Storage

Unlike row-oriented databases that store all columns of a row together, columnar storage stores each column separately:
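
As a toy Python illustration (not any engine's actual on-disk format; the column names and values are made up), the same three points can be laid out row-wise or column-wise, and an aggregation over the columnar layout touches only the one array it needs:

```python
# Row-oriented: each record keeps all of its columns together
rows = [
    {"ts": 1736700000, "host": "server01", "cpu": 42.5, "mem": 61.0},
    {"ts": 1736700010, "host": "server01", "cpu": 43.1, "mem": 61.2},
    {"ts": 1736700020, "host": "server01", "cpu": 41.8, "mem": 60.9},
]

# Column-oriented: each column is stored as its own contiguous array
columns = {
    "ts":   [1736700000, 1736700010, 1736700020],
    "host": ["server01", "server01", "server01"],
    "cpu":  [42.5, 43.1, 41.8],
    "mem":  [61.0, 61.2, 60.9],
}

# AVG(cpu) only has to scan the "cpu" column, not host or mem
avg_cpu = sum(columns["cpu"]) / len(columns["cpu"])
print(round(avg_cpu, 2))   # 42.47
```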

Why columnar is better for time-series:

  • Aggregations: Computing AVG(cpu) only reads the cpu column, not host or memory
  • Compression: Values in a column are similar type, enabling better compression
  • Vectorized operations: Modern CPUs can process column chunks efficiently
  • Selective reading: Only load columns needed for the query

Time-Based Partitioning

Data is partitioned by time, typically into chunks covering fixed time intervals:
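
A minimal sketch of the idea, assuming hourly chunks keyed by their start timestamp (real engines choose their own chunk sizes and file layout):

```python
from collections import defaultdict

PARTITION_SECONDS = 3600   # one chunk per hour (chunk size varies by engine)

def partition_start(ts: int) -> int:
    """Map a timestamp to the start of the partition it belongs to."""
    return ts - ts % PARTITION_SECONDS

partitions = defaultdict(list)
for ts, value in [(1736700000, 42.5), (1736701800, 43.1), (1736704500, 41.8)]:
    partitions[partition_start(ts)].append((ts, value))

# A "last hour" query opens only the newest chunk(s); retention is just
# dropping whole chunks whose start time falls before the cutoff.
cutoff = 1736701200
old_chunks = [start for start in partitions if start < cutoff]
print(len(partitions), "chunks,", len(old_chunks), "eligible for deletion")
```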

Benefits of time-based partitioning:

  • Query efficiency: A query for "last hour" only reads recent partitions
  • Retention: Delete old data by dropping entire partitions
  • Hot/cold separation: Recent partitions on SSD, old on HDD or object storage
  • Parallel processing: Different partitions can be queried in parallel

Compression Techniques

Time-series data compresses exceptionally well:

| Technique | Used For | Compression Ratio |
|---|---|---|
| Delta encoding | Timestamps (sequential) | 10-50x |
| XOR encoding | Floating-point values | 10-20x |
| Run-length encoding | Repeated values | Variable |
| Dictionary encoding | Tags with low cardinality | 10-100x |
| Block compression | General (LZ4, Zstd) | 2-10x |

Delta encoding example:
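
A minimal Python sketch of delta and delta-of-delta encoding for regularly spaced timestamps; real formats add variable-length bit packing on top of these integer streams:

```python
def delta_encode(values):
    """Store the first value, then only the gaps between neighbours."""
    out = [values[0]]
    for prev, curr in zip(values, values[1:]):
        out.append(curr - prev)
    return out

ts = [1736700000, 1736700010, 1736700020, 1736700030]
print(delta_encode(ts))              # [1736700000, 10, 10, 10]

# Delta-of-delta: for evenly spaced data the second differences are mostly 0,
# which packs into very few bits.
deltas = delta_encode(ts)[1:]
print(delta_encode(deltas))          # [10, 0, 0]
```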

XOR encoding for floats: Consecutive floating-point values often share many bits. XOR encoding stores only the differing bits.
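
A rough sketch of that idea, assuming 64-bit IEEE doubles; production encoders (such as the Gorilla scheme) additionally track leading/trailing-zero windows to pack the surviving bits tightly:

```python
import struct

def bits(x: float) -> int:
    """Reinterpret a 64-bit IEEE double as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

prev, curr = 42.5, 42.625            # consecutive samples, nearly identical
xor = bits(prev) ^ bits(curr)

# Sign and exponent bits cancel; only the mantissa bits that actually changed
# survive the XOR, so the encoder stores just that short run of bits.
print(f"{xor:064b}")
print("non-zero bits:", bin(xor).count("1"))
```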

Write Path

Time-series databases optimize for high-volume writes:

  1. Buffer writes in memory: Incoming points go to a write buffer
  2. Write-ahead log: For durability, append to WAL before acknowledging
  3. Batch flush: Periodically flush buffer to compressed storage
  4. Background compaction: Merge small chunks into larger, more optimized files

This approach provides very high write throughput because:

  • Memory writes are fast
  • Batching amortizes I/O overhead
  • Sequential writes to storage are efficient
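
A highly simplified Python sketch of this buffered write path (the flush threshold, file name, and record format are arbitrary); real engines add compression, compaction, and crash recovery on top:

```python
import json

class TinyWriter:
    def __init__(self, wal_path: str, flush_size: int = 3):
        self.buffer = []                # in-memory write buffer
        self.flush_size = flush_size    # flush after this many points
        self.wal = open(wal_path, "a")  # append-only write-ahead log
        self.segments = []              # flushed chunks (stand-in for compressed files)

    def write(self, timestamp: int, value: float):
        # 1. Append to the WAL first so the point survives a crash
        self.wal.write(json.dumps({"ts": timestamp, "v": value}) + "\n")
        self.wal.flush()
        # 2. Buffer in memory and acknowledge the write
        self.buffer.append((timestamp, value))
        # 3. Periodically flush the buffer as one sequential, batched write
        if len(self.buffer) >= self.flush_size:
            self.segments.append(sorted(self.buffer))
            self.buffer = []

w = TinyWriter("wal.log")
for i in range(5):
    w.write(1736700000 + 10 * i, 40.0 + i)
print(len(w.segments), "flushed segment(s),", len(w.buffer), "point(s) still buffered")
```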

Query Patterns

Time-series queries have distinctive patterns that databases optimize for.

Time-Range Filters

Almost every query includes a time-range predicate:
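
As a small sketch of why time ordering matters, keeping points sorted by timestamp turns a range predicate into two binary searches instead of a full scan (the timestamps and values below are illustrative):

```python
from bisect import bisect_left, bisect_right

# Points kept sorted by timestamp, as a time-series engine would store them
timestamps = [1736700000, 1736700010, 1736700020, 1736700030, 1736700040]
values     = [42.5,       43.1,       41.8,       44.0,       42.2]

def range_query(start: int, end: int):
    """Return values with start <= ts <= end via binary search on the time index."""
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    return values[lo:hi]

print(range_query(1736700010, 1736700030))   # [43.1, 41.8, 44.0]
```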

Time-series databases optimize these by:

  • Using time-partitioned storage (only read relevant partitions)
  • Indexing by time (fast range scans)
  • Caching recent data (often in memory)

Aggregations

Aggregating values over time windows is the most common operation:

| Function | Use Case |
|---|---|
| avg() | Average CPU usage |
| max(), min() | Peak memory, lowest temperature |
| sum() | Total requests, bytes transferred |
| count() | Number of events |
| percentile() | P99 latency |
| rate() | Requests per second (from counter) |
| derivative() | Rate of change |
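
As one example of these functions, a counter-style rate() can be sketched as the increase between the first and last samples divided by the elapsed time; this toy version ignores counter resets, which real implementations must handle:

```python
def rate(samples):
    """samples: list of (timestamp, counter_value) pairs; per-second rate over the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# A monotonically increasing request counter sampled every 15 seconds
samples = [(1736700000, 1000), (1736700015, 1300), (1736700030, 1750)]
print(rate(samples))   # 25.0 requests per second
```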

Downsampling Queries

Aggregate old data into lower-resolution summaries:
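
A minimal sketch of that rollup step, bucketing raw points into fixed 5-minute windows and keeping the average (real systems usually store several aggregates per window, such as min, max, and count):

```python
from collections import defaultdict

WINDOW = 300   # 5-minute buckets

def downsample(points):
    """points: list of (timestamp, value); returns {window_start: average}."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % WINDOW].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

raw = [(1736700000 + 60 * i, 40.0 + i) for i in range(10)]   # one point per minute
print(downsample(raw))   # {1736700000: 42.0, 1736700300: 47.0}
```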

Tag-Based Filtering

Filter by metadata before aggregating:
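
A toy sketch of one way this can work, mapping each tag=value pair to the set of series IDs that carry it and intersecting those sets before any points are read (the tags and IDs are invented):

```python
# Inverted index: tag=value -> set of series IDs that carry that tag
tag_index = {
    "region=us-east": {1, 2, 3},
    "region=eu-west": {4, 5},
    "host=server01":  {1, 4},
    "host=server02":  {2, 5},
}

def series_matching(*tag_filters):
    """Intersect the posting sets for all requested tag=value filters."""
    sets = [tag_index.get(f, set()) for f in tag_filters]
    return set.intersection(*sets) if sets else set()

print(series_matching("region=us-east", "host=server01"))   # {1}
```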

Time-series databases index tags for efficient filtering, typically using inverted indexes or bitmap indexes.

Retention and Downsampling

Managing data lifecycle is crucial for time-series databases.

Retention Policies

Define how long to keep data at each resolution:
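
A sketch of how such a policy might be expressed and enforced, with made-up resolutions and retention periods; the key point is that pruning drops whole partitions rather than deleting individual rows:

```python
# Hypothetical tiered retention policy (resolutions and durations are illustrative)
RETENTION_SECONDS = {
    "raw":       7 * 86400,        # raw points: keep 7 days
    "5m_rollup": 90 * 86400,       # 5-minute rollups: keep 90 days
    "1h_rollup": 2 * 365 * 86400,  # hourly rollups: keep ~2 years
}

def expired_partitions(partition_starts, resolution, now):
    """Return partition start times that are older than the retention cutoff."""
    cutoff = now - RETENTION_SECONDS[resolution]
    return [start for start in partition_starts if start < cutoff]

now = 1736700000
raw_partitions = [now - 10 * 86400, now - 3 * 86400, now - 3600]
print(expired_partitions(raw_partitions, "raw", now))   # only the 10-day-old chunk
```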

Why tiered retention:

  • Recent data needs high resolution for debugging
  • Older data is typically viewed as trends, not individual points
  • Storage costs grow linearly without downsampling
  • Query performance improves with less data

Continuous Aggregation

Automatically maintain rollup tables as data arrives:
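
A minimal sketch of the idea: every incoming point incrementally updates the rollup bucket it falls into, so the hourly summary already exists when queried (the bucket size and running-average aggregate are illustrative choices):

```python
from collections import defaultdict

BUCKET = 3600   # maintain hourly rollups as data arrives

# rollup[window_start] keeps a running (sum, count) so the average is always current
rollup = defaultdict(lambda: [0.0, 0])

def ingest(timestamp: int, value: float):
    window = timestamp - timestamp % BUCKET
    rollup[window][0] += value
    rollup[window][1] += 1

for i in range(6):
    ingest(1736700000 + 600 * i, 40.0 + i)   # one point every 10 minutes

for window, (total, count) in sorted(rollup.items()):
    print(window, round(total / count, 2))   # 1736697600 40.5, then 1736701200 43.5
```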

Storage Tiering

Move old data to cheaper storage:

| Tier | Storage | Use Case |
|---|---|---|
| Hot | SSD, memory | Recent data, frequent queries |
| Warm | HDD | Older data, occasional queries |
| Cold | Object storage (S3) | Archive, rare access |
| Frozen | Tape, Glacier | Long-term compliance |

Popular Time-Series Databases

InfluxDB

Purpose-built time-series database with its own query language:

  • Query language: InfluxQL (SQL-like) or Flux (functional)
  • Data model: Measurements, tags, fields, timestamps
  • Strengths: Developer experience, built-in dashboarding (InfluxDB Cloud)
  • Use cases: DevOps monitoring, IoT, real-time analytics

Prometheus

Open-source monitoring system with built-in time-series database:

  • Query language: PromQL
  • Data model: Metrics with labels
  • Architecture: Pull-based (scrapes targets)
  • Strengths: Kubernetes-native, extensive ecosystem
  • Use cases: Infrastructure monitoring, alerting

TimescaleDB

PostgreSQL extension for time-series data:

  • Query language: SQL (standard PostgreSQL)
  • Data model: Hypertables (auto-partitioned tables)
  • Strengths: Full SQL, joins with relational data, PostgreSQL ecosystem
  • Use cases: When you need time-series + relational in one database

QuestDB

High-performance time-series database focused on speed:

  • Query language: SQL
  • Strengths: Extremely fast ingestion and queries
  • Use cases: Financial data, high-frequency trading, real-time analytics

Comparison

| Feature | InfluxDB | Prometheus | TimescaleDB | QuestDB |
|---|---|---|---|---|
| Query language | InfluxQL/Flux | PromQL | SQL | SQL |
| Write performance | High | Moderate | High | Very high |
| Query performance | High | High | High | Very high |
| Relational joins | No | No | Yes | Limited |
| Managed options | InfluxDB Cloud | Various | Timescale Cloud | QuestDB Cloud |
| Clustering | Enterprise | Thanos/Cortex | Enterprise | Planned |
| Best for | General TSDB | Monitoring | SQL compatibility | Speed |

Common Use Cases

Infrastructure Monitoring

Track server and application metrics.

Typical metrics:

  • CPU, memory, disk, network utilization
  • Request rate, latency percentiles, error rates
  • Queue depths, connection counts
  • Custom application metrics

IoT Sensor Data

Collect and analyze sensor readings.

Typical queries:

  • Average temperature by location over the past week
  • Devices exceeding threshold values
  • Trend analysis for predictive maintenance

Financial Data

Track stock prices, trades, and market data.

Requirements:

  • Millisecond or microsecond timestamp precision
  • Very high ingestion rates (thousands of updates per second per symbol)
  • Historical backtesting queries
  • Real-time streaming

Application Analytics

Track user behavior and business metrics.

Typical queries:

  • Daily active users over time
  • Conversion rates by cohort
  • Revenue trends by product category

Performance Considerations

Cardinality

Cardinality is the number of unique series (unique combinations of metric + tags). High cardinality is the most common performance problem:

| Cardinality Level | Example | Impact |
|---|---|---|
| Low | 100 servers, 50 metrics = 5,000 series | No problem |
| Medium | 10,000 containers, 100 metrics = 1M series | Manageable |
| High | 1M users, 50 metrics = 50M series | Problematic |
| Extreme | Using user ID as a tag | Avoid! |

Why high cardinality hurts:

  • Each series requires index entries
  • Memory usage grows with series count
  • Query performance degrades

Mitigation strategies:

  • Never use high-cardinality values as tags (user IDs, email addresses)
  • Move high-cardinality data to fields (not indexed)
  • Pre-aggregate where possible
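
The sketch below illustrates the arithmetic behind these rules: the series count is roughly the product of the distinct values of each tag, so putting a user ID in the tag set multiplies everything by the number of users, while the same value stored as a field creates no new series (the counts are hypothetical):

```python
from math import prod

def estimated_series(num_metrics: int, tag_cardinalities: dict) -> int:
    """Rough upper bound: metrics x product of distinct values per tag."""
    return num_metrics * prod(tag_cardinalities.values())

# host and region as tags -> manageable
print(estimated_series(50, {"host": 100, "region": 5}))                        # 25,000

# adding user_id as a tag -> cardinality explosion
print(estimated_series(50, {"host": 100, "region": 5, "user_id": 1_000_000}))  # 25,000,000,000
```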

Write Performance Tuning

| Factor | Impact | Tuning |
|---|---|---|
| Batch size | Larger = fewer I/O ops | 1,000-10,000 points per batch |
| Buffer size | Larger = more memory, fewer flushes | Tune based on RAM |
| Compression | Slower writes, smaller storage | Usually worth it |
| Replication | Durability vs speed | Async if possible |

Query Performance Tuning

| Factor | Impact | Tuning |
|---|---|---|
| Time range | Wider = more data | Use shortest range needed |
| Tag filters | Selective = faster | Filter early, reduce series |
| Aggregation | Computed on read | Use pre-computed rollups for common queries |
| Concurrency | Resource contention | Limit concurrent queries |

When to Choose Time-Series Databases

Time-series databases are the right choice when:

  • Time is the primary query dimension. Every query filters by time range.
  • Data is append-heavy. Writes vastly outnumber updates to existing data.
  • Aggregations dominate. You care about averages, sums, and percentiles, not individual records.
  • Volume is high. Ingesting thousands to millions of points per second.
  • Retention is time-based. Data naturally expires after a certain age.

When to Consider Alternatives

Time-series databases may not fit when:

  • You need complex relationships. Time-series data is flat; use relational for joins.
  • Updates are common. Time-series databases assume data is immutable.
  • Point lookups dominate. Looking up specific records by ID is not what these databases are optimized for.
  • Low volume. For small datasets, a regular relational database with time indexes works fine.

Summary

Time-series databases are optimized for timestamped data with predictable access patterns:

| Aspect | Time-Series Approach |
|---|---|
| Data model | Metrics with tags and timestamps |
| Storage | Columnar, time-partitioned, heavily compressed |
| Writes | Append-only, high throughput |
| Queries | Time-range filters, aggregations |
| Retention | Time-based expiration, downsampling |

Key concepts:

  • Columnar storage: Efficient for aggregations, excellent compression
  • Time partitioning: Query only relevant time ranges, easy retention
  • Cardinality: Number of unique series; keep it manageable
  • Downsampling: Reduce resolution of old data to save storage

The next chapter explores full-text search engines, which optimize for a different kind of search: keyword-based search with relevance ranking, faceted filtering, and linguistic analysis.