Introduction to time-series databases

Append-only storage

Time-series databases are heavily optimized for write throughput, and one of the most important design choices behind this optimization is the use of append-only storage. Instead of modifying existing data in place, new records are always written to the end of a log or file. This approach avoids expensive random writes and allows the database to take full advantage of sequential disk access, which is significantly faster on both HDDs and SSDs.

By writing data sequentially, append-only storage improves cache locality, reduces write amplification, and minimizes locking and contention. Since time-series workloads typically involve continuously ingesting new data points in chronological order, this write pattern aligns naturally with the structure of the data itself. As a result, high ingestion rates can be sustained even under heavy load.

However, an important question arises with this design: if all data is written sequentially, how is the data organized and efficiently queried later? Time-series databases address this by layering logical structures on top of the append-only log. Common techniques include partitioning data by time ranges, grouping records into immutable segments or chunks, and building auxiliary indexes or in-memory metadata that map timestamps to physical locations on disk.

This separation between the physical write path and the logical query layout allows the system to optimize for fast writes while still enabling efficient reads, aggregations, and range queries.
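As a sketch of this separation, the toy `AppendOnlyLog` class below (hypothetical, not any particular database's API) only ever writes at the current end of the file, while a small in-memory index maps timestamps to byte offsets so reads can seek directly to a record:

```python
import struct

class AppendOnlyLog:
    """Toy append-only log: records are only ever written at the end of the
    file, and an in-memory index maps timestamps to byte offsets for reads."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # timestamp -> byte offset of the record
        self.f = open(path, "ab")

    def append(self, timestamp, value):
        # Sequential write: always at the current end of the file.
        offset = self.f.tell()
        self.f.write(struct.pack("<qd", timestamp, value))  # int64 + float64
        self.f.flush()
        self.index[timestamp] = offset

    def read(self, timestamp):
        # The logical layer translates a timestamp into a physical offset.
        with open(self.path, "rb") as r:
            r.seek(self.index[timestamp])
            _, value = struct.unpack("<qd", r.read(16))
            return value
```

Real systems persist the index as well and partition the log into time-ranged segments, but the core idea is the same: the write path is purely sequential, and the lookup structures live above it.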

Log-structured merge trees

Log-structured merge trees, commonly referred to as LSM trees, are a storage architecture designed to optimize high write throughput by turning random writes into sequential ones. Instead of writing incoming data directly to disk in place, the system first buffers writes in memory and persists them in an append-only manner. This design is particularly well suited for time-series workloads, where data arrives continuously and in large volumes.

When a new write arrives, it is first appended to a write-ahead log. The write-ahead log ensures durability by allowing the system to recover recently written data in the event of a crash. After the write is safely recorded, it is inserted into an in-memory data structure known as the memtable. The memtable acts as a staging area for writes and allows the database to organize data, typically by sorting entries based on keys such as timestamps or time-series identifiers.
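The durability role of the write-ahead log can be sketched as follows (the class and method names are illustrative): each write is appended and forced to stable storage before it is acknowledged, and on restart the log is replayed to rebuild the memtable.

```python
import json, os

class WriteAheadLog:
    """Durability sketch: every write is appended to the log and fsync'd
    before being acknowledged, so a crash cannot lose a confirmed write."""

    def __init__(self, path):
        self.f = open(path, "a")

    def record(self, series_id, timestamp, value):
        self.f.write(json.dumps([series_id, timestamp, value]) + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())  # force the entry to stable storage

    @staticmethod
    def replay(path):
        # Crash recovery: re-read the log to reconstruct recent writes.
        with open(path) as f:
            return [tuple(json.loads(line)) for line in f]
```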

Once the memtable reaches a certain size, it is flushed to disk as an immutable segment, often called an SSTable. These segments are written sequentially to disk, preserving the append-only write pattern while converting in-memory structures into durable, on-disk representations. Because the data is already sorted in memory, the resulting segments are also ordered, which enables efficient range scans and time-based queries.
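A minimal memtable sketch, with illustrative names and an arbitrary flush threshold (production engines typically keep entries sorted incrementally with a skip list rather than sorting at flush time):

```python
class Memtable:
    """In-memory staging area for writes, keyed by (series_id, timestamp).
    Flushing emits an immutable, sorted segment (an "SSTable" in LSM terms)."""

    def __init__(self, flush_threshold=4):
        self.entries = {}  # (series_id, timestamp) -> value
        self.flush_threshold = flush_threshold

    def put(self, series_id, timestamp, value):
        self.entries[(series_id, timestamp)] = value
        # Signal to the caller that the memtable is full and should be flushed.
        return len(self.entries) >= self.flush_threshold

    def flush(self):
        # Sort by (series, time) so the on-disk segment supports range scans.
        segment = sorted(self.entries.items())
        self.entries = {}
        return segment
```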

Over time, multiple segments accumulate on disk. To limit read amplification and reduce storage overhead, the system performs compaction. During compaction, multiple older segments are merged into a new segment containing the combined, deduplicated data. Once the new segment is written, the system records metadata marking the older segments as obsolete so they can be safely ignored or removed. This process typically runs in the background, allowing normal read and write operations to continue with minimal disruption.
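A compaction step can be sketched as a merge of sorted segments in which the newest value for each key wins (a simplified in-memory version; real engines stream-merge segments far larger than memory):

```python
def compact(segments):
    """Merge sorted segments (ordered oldest to newest) into one new segment,
    keeping only the newest value for each key."""
    merged = {}
    for segment in segments:  # later (newer) segments overwrite older ones
        for key, value in segment:
            merged[key] = value
    return sorted(merged.items())  # the result stays sorted, like its inputs
```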

The key advantage of the LSM tree model is that it keeps writes fast and sequential while still supporting efficient reads through a layered structure of in-memory and on-disk data. In time-series databases, the memtable is often organized by series and time, allowing related data points to be grouped together before being flushed to disk. This approach makes LSM trees a natural fit for systems that ingest large volumes of ordered, append-only data, such as metrics platforms, logging systems, and event-driven architectures.

Delta encoding and compression

Time-series data typically exhibits strong temporal locality, meaning that consecutive values often change only slightly from one observation to the next. Rather than storing each value in its raw form, databases can exploit this property by storing the difference between consecutive values instead. This technique is known as delta encoding and is a foundational compression strategy in many time-series storage engines.

By encoding values as deltas, the magnitude of the stored numbers is usually much smaller than that of the original values. Smaller numbers require fewer bytes to represent, especially when combined with variable-length integer encodings. For example, instead of storing a full eight-byte value for every data point, the system may need only one or two bytes for the delta between successive values, significantly reducing storage overhead.
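For integer-valued series, the idea can be sketched as follows (a toy encoder, not any specific engine's wire format): deltas are zigzag-mapped so small negative values stay small, then written as variable-length integers. Here the four values 1000, 1003, 1001, 1001 encode to 5 bytes instead of the 32 bytes needed for four raw 64-bit values.

```python
def zigzag(n):
    """Map signed ints to unsigned so small-magnitude deltas stay small:
    0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return (n << 1) ^ (n >> 63)

def varint(n):
    """Variable-length encoding: 7 bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def delta_encode(values):
    """Store the first value raw, then only the difference to the previous."""
    deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    return b"".join(varint(zigzag(d)) for d in deltas)
```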

Timestamps benefit even more from this approach. In a typical time-series, timestamps are monotonically increasing and often evenly spaced. Rather than storing each timestamp explicitly, the database can store the delta between timestamps. This can be further optimized using delta-of-deltas, where the difference between successive deltas is encoded. Because these second-order differences are frequently zero or very small, they compress extremely well.

In practice, delta-of-delta encoding allows timestamps to be represented using only a few bits on average, compared to the 32 or 64 bits required to store raw timestamps. When combined with lightweight compression schemes such as run-length encoding or bit-packing, this approach enables time-series databases to achieve very high compression ratios while still preserving fast sequential reads.
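The timestamp transformation itself is tiny; a sketch (illustrative helper, before any bit-packing is applied):

```python
def delta_of_delta(timestamps):
    """First-order deltas, then differences between successive deltas.
    For perfectly evenly spaced timestamps the second-order values are zero."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [deltas[0]] + [b - a for a, b in zip(deltas, deltas[1:])]
```

For a series sampled every 10 seconds with a little jitter, such as 1000, 1010, 1020, 1031, 1041, the output is 10, 0, 1, -1: mostly zeros and near-zeros, which is exactly what run-length encoding and bit-packing compress best.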

Delta encoding fits naturally with append-only and log-structured storage models. Because data is written in time order, the database can apply these compression techniques during ingestion or compaction, minimizing CPU overhead while significantly reducing disk usage and I/O costs. This makes delta-based compression a key building block for scalable and efficient time-series systems.

Block-level metadata

When data is flushed to disk, time-series databases typically persist metadata alongside each block or segment of data. This metadata is designed to accelerate reads by allowing the engine to quickly determine whether a block is relevant to a query, without having to scan or decompress the data itself. Because disk I/O and decompression are expensive, this early filtering step is critical for query performance at scale.

At the file or segment level, the system often maintains an index that maps metric names and tag combinations to the physical locations of their corresponding data blocks. When a query arrives, the engine first evaluates the time range and tag predicates to identify which files could possibly contain relevant data. This allows it to ignore large portions of the data set and restrict reads to only the files that overlap with the requested time window.

Within each selected file, block-level metadata is consulted to further narrow down the search. Instead of scanning every block, the query engine can jump directly to the blocks associated with the requested series. Only those blocks are then read from disk and decoded or decompressed, keeping both I/O and CPU usage low.

In addition to indexing information, blocks often store summary statistics such as the minimum and maximum values observed within the block. These statistics enable predicate pushdown at the storage layer. For example, if a query is only interested in trade prices above a certain threshold, any block whose maximum value is below that threshold can be safely skipped without reading its contents.
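This kind of pruning can be sketched with a toy block structure (the names are illustrative; real blocks hold compressed bytes and the statistics live in a separate metadata section):

```python
class Block:
    """Stand-in for an on-disk block: values plus min/max summary stats."""

    def __init__(self, values):
        self.values = values   # stands in for compressed on-disk data
        self.min = min(values) # summary statistics kept in block metadata
        self.max = max(values)

def scan_greater_than(blocks, threshold):
    """Predicate pushdown: skip any block whose max is <= the threshold
    using metadata alone, without reading or decoding its values."""
    results = []
    for block in blocks:
        if block.max <= threshold:
            continue  # entire block pruned via its summary statistics
        results.extend(v for v in block.values if v > threshold)
    return results
```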

By combining time range filtering, tag-based indexing, and value-based block statistics, block-level metadata allows time-series databases to turn potentially expensive full scans into highly selective, sequential reads. This design is a key reason why modern time-series systems can support high-cardinality workloads while maintaining low query latency.

Bloom filters

A Bloom filter is a space-efficient probabilistic data structure that behaves similarly to a set, but with different guarantees. While a regular set can tell you definitively whether an element is present or absent, a Bloom filter can only answer one of two things: the element is definitely not present, or it might be present. This trade-off allows Bloom filters to use significantly less memory than traditional hash-based sets, often by an order of magnitude.

Bloom filters achieve this efficiency by hashing each element multiple times and setting bits in a fixed-size bit array. When checking for membership, the same hash functions are applied and the corresponding bits are examined. If any bit is unset, the element is guaranteed not to be in the set. If all bits are set, the element may be present, resulting in a possible false positive but never a false negative.
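A minimal Bloom filter sketch, assuming arbitrary sizing parameters (here 1024 bits and 3 hash functions derived from SHA-256; real implementations tune these to a target false-positive rate):

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array probed by k hash functions. A "no" answer is
    exact; a "yes" answer may be a false positive, never a false negative."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as an arbitrary-width bit array

    def _positions(self, item):
        # Derive k independent bit positions by salting one hash function.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # Any unset bit proves the item was never added.
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```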

In time-series databases, Bloom filters are commonly attached to data files or segments and populated with identifiers such as metric names, tag values, or series IDs. When a query includes predicates on these fields, the engine can first consult the Bloom filter to determine whether a file could possibly contain matching data. If the Bloom filter returns “definitely not,” the file can be skipped entirely without performing any disk I/O.

If the Bloom filter indicates a possible match, the system proceeds to read the file and apply more precise checks using indexes or block-level metadata. Although false positives may cause occasional unnecessary reads, Bloom filters dramatically reduce the number of files that must be touched for selective queries. This makes them particularly effective in high-cardinality time-series workloads where the number of distinct series is large.

Downsampling

Most time-series databases provide built-in downsampling mechanisms that allow data to be stored at different temporal resolutions as it ages. For example, recent data may be retained at a fine-grained resolution such as seconds or milliseconds, while older data is aggregated into coarser intervals like minutes or hours. This approach reflects the common access pattern where high-resolution detail is only required for recent time ranges, while historical analysis typically focuses on broader trends.

By retaining only aggregated representations for older data, the system dramatically reduces storage requirements and the amount of data that must be scanned during queries. As a result, analytical queries over long time windows become significantly faster, since the database can read fewer blocks from disk and process far fewer data points while still preserving the overall shape and behavior of the time series.
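The core of downsampling is a bucketed aggregation; a sketch with a mean aggregator (illustrative, and simplified to a single pass over in-memory points):

```python
def downsample(points, bucket_seconds, agg=lambda vs: sum(vs) / len(vs)):
    """Group (timestamp, value) points into fixed-width time buckets and
    keep one aggregated value per bucket (the mean by default)."""
    buckets = {}
    for ts, value in points:
        bucket = ts - ts % bucket_seconds  # align to the bucket's start time
        buckets.setdefault(bucket, []).append(value)
    return sorted((b, agg(vs)) for b, vs in buckets.items())
```

Real systems usually also keep min, max, and count per bucket, so that queries over the coarser resolution can still answer more than one kind of aggregate.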