Understanding Compression Codecs

This guide describes each compression codec available in Firebolt: what it does, when to use it, and how it performs on different types of data.

For practical examples, getting started guidance, and compression best practices, see the Table and Column Compression overview.

General-purpose codecs

LZ4

Syntax: lz4 or lz4()

Description: Fast lossless compression algorithm optimized for speed over compression ratio. Default compression codec in Firebolt.

Parameters: None

Performance characteristics:

Compression speed: Very fast
Decompression speed: Very fast
Typical compression ratio: Good
CPU usage: Very low

Best for:

High-throughput data ingestion
Frequently accessed data requiring fast queries
Real-time analytics workloads
When CPU resources are limited

Data type compatibility: All types

LZ4HC (LZ4 High Compression)

Syntax: lz4hc(level) where level is 1-12

Description: High-compression variant of LZ4 that trades slower compression for better compression ratios while maintaining fast decompression.

Parameters:

level: Compression level (1-12, default: 9)
- Levels 1-6: Faster compression, lower ratio
- Levels 7-9: Balanced performance (recommended)
- Levels 10-12: Maximum compression, slower writes

Performance characteristics:

Compression speed: Moderate
Decompression speed: Very fast
Typical compression ratio: Better than LZ4
CPU usage: Low to moderate

Best for:

Read-heavy workloads where read performance is critical and slower writes are acceptable
Balanced storage and performance requirements
Data with moderate access patterns

Data type compatibility: All types

Examples:

-- Faster compression, lower ratio (upper end of levels 1-6)
column_name TYPE COMPRESSION lz4hc(6)

-- Maximum LZ4HC compression  
column_name TYPE COMPRESSION lz4hc(12)

ZSTD (Zstandard)

Syntax: zstd(level) where level is 1-22

Description: Modern compression algorithm providing an excellent balance between compression ratio and speed, with highly configurable compression levels.

Parameters:

level: Compression level (1-22, default: 1)
- Levels 1-5: Fast compression, good for online workloads
- Levels 6-10: Balanced performance
- Levels 11-15: High compression for archival data
- Levels 16-22: Maximum compression, very slow

Performance characteristics by level:

Level	Compression Speed	Compression Ratio	Use Case
1	Fast	Good	Real-time data
3	Fast	Better	General workloads
7	Moderate	Better	Balanced storage/speed
15	Slow	Excellent	Archival data
22	Very slow	Excellent	Maximum compression

Best for:

General analytics workloads (levels 1-5)
Archival data storage (levels 11+)
Data with good compressibility (JSON, text, logs)

Data type compatibility: All types

Examples:

-- Fast ZSTD for hot data
log_data TEXT COMPRESSION zstd(3)

-- High compression for cold data
archived_events TEXT COMPRESSION zstd(15)
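The level trade-off described above can be observed with any leveled codec. The sketch below uses zlib from the Python standard library as a stand-in, because ZSTD bindings are not assumed to be installed; zlib levels (1-9) are not ZSTD levels, and the synthetic log lines are made up for illustration. It only shows the general pattern of higher levels spending more time for smaller output.

```python
import time
import zlib

# Synthetic, repetitive log text (hypothetical data, for illustration only).
log_lines = "\n".join(
    f"2024-01-01T00:00:{i % 60:02d}Z INFO request_id={i} status=200 path=/api/items/{i % 50}"
    for i in range(20000)
).encode()

# zlib stands in for ZSTD here; its levels run from 1 (fast) to 9 (smallest).
for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(log_lines, level)
    elapsed = time.perf_counter() - start
    print(f"level={level} size={len(compressed)} seconds={elapsed:.4f}")
```

Exact sizes and timings depend on the machine and the data, so measure with a representative sample before picking a level.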

NONE

Syntax: none

Description: Disables compression entirely. Data is stored uncompressed.

Parameters: None

Performance characteristics:

Compression speed: N/A (no compression)
Decompression speed: N/A (no decompression)
Compression ratio: None (no compression)
CPU usage: None

Best for:

Already compressed data (images, videos, compressed files)
Data that compresses poorly
Debugging compression issues
Maximum query performance at cost of storage

Data type compatibility: All types

Limitations: Cannot be chained with other codecs
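To see why already compressed data is a candidate for none, the sketch below runs a general-purpose compressor over random bytes, which stand in for content such as image or video payloads. zlib is used only because it ships with Python; it is not one of Firebolt's codecs, and the payload is synthetic.

```python
import random
import zlib

rng = random.Random(42)

# Random bytes approximate already-compressed content: no redundancy is left to remove.
payload = bytes(rng.getrandbits(8) for _ in range(4096))

recompressed = zlib.compress(payload)

# The "compressed" output is no smaller than the input, so the CPU time was wasted.
print(len(payload), len(recompressed))
```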

DEFAULT

Syntax: default

Description: Uses system default compression, which is equivalent to lz4.

Parameters: None

Best for:

Inheriting table-level compression settings
Removing column-specific compression overrides

Specialized (preprocessing) codecs

Delta

Syntax: delta

Description: Stores differences between consecutive values instead of absolute values. Highly effective for sequential or slowly changing numeric data.

Parameters: None

Performance characteristics:

Compression speed: Fast
Decompression speed: Fast
Compression ratio: Highly variable, excellent for sequential data
CPU usage: Very low

Data type compatibility: Integers, dates, timestamps (NOT NULL only)

Best for:

Sequential IDs (user_id, order_id, event_id)
Monotonic counters
Timestamps in time-series data
Any slowly changing numeric values

Must be chained with a general-purpose codec.

Examples:

-- Sequential user IDs
user_id INTEGER NOT NULL COMPRESSION (delta, lz4)

-- Timestamp optimization
created_at TIMESTAMPTZ NOT NULL COMPRESSION (delta, zstd(3))
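As an illustration of the idea behind delta encoding, here is a minimal Python sketch. The function names and sample IDs are hypothetical, and the sketch makes no claim about Firebolt's internal implementation or storage format; it only shows how sequential values turn into small, repetitive differences.

```python
def delta_encode(values):
    """Return the first value followed by the differences between neighbours."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


order_ids = [1000, 1001, 1002, 1005, 1006]
encoded = delta_encode(order_ids)
print(encoded)  # [1000, 1, 1, 3, 1]
```

The small, repeating differences are what the general-purpose codec later in the chain compresses well.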

DoubleDelta

Syntax: doubledelta

Description: Stores differences of differences, optimized for monotonic sequences with relatively constant intervals (like regular time series).

Parameters: None

Performance characteristics:

Compression speed: Fast
Decompression speed: Fast
Compression ratio: Excellent for regular time series
CPU usage: Low

Data type compatibility: Integers, dates, timestamps (NOT NULL only)

Best for:

Regular time-series timestamps (every minute, hour, day)
Monotonic sequences with consistent intervals
Event timestamps in ordered data

Must be chained with a general-purpose codec.

Examples:

-- Regular time series timestamps
timestamp TIMESTAMPTZ NOT NULL COMPRESSION (doubledelta, lz4)

-- Sequential event IDs with consistent gaps
event_sequence INTEGER NOT NULL COMPRESSION (doubledelta, lz4hc(6))
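The "differences of differences" idea can be sketched in a few lines of Python. This is an illustration of the general technique under assumed names and sample data, not a description of how Firebolt encodes values internally.

```python
def double_delta_encode(values):
    """Keep the first value, the first delta, then differences between deltas."""
    if len(values) < 2:
        return list(values)
    deltas = [b - a for a, b in zip(values, values[1:])]
    out = [values[0], deltas[0]]
    out.extend(d2 - d1 for d1, d2 in zip(deltas, deltas[1:]))
    return out


def double_delta_decode(encoded):
    """Invert double_delta_encode by accumulating deltas, then values."""
    if len(encoded) < 2:
        return list(encoded)
    values = [encoded[0]]
    delta = encoded[1]
    values.append(values[0] + delta)
    for dd in encoded[2:]:
        delta += dd
        values.append(values[-1] + delta)
    return values


# One reading every 60 seconds, with a single reading arriving 1 second late.
timestamps = [1700000000, 1700000060, 1700000120, 1700000181, 1700000241]
encoded = double_delta_encode(timestamps)
print(encoded)  # [1700000000, 60, 0, 1, -1]
```

With perfectly regular intervals every entry after the first two is zero, which is why this preprocessing suits fixed-rate time series.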

Gorilla

Syntax: gorilla

Description: Optimized for floating-point values that change slowly over time. Uses XOR-based encoding to efficiently store small changes between consecutive values.

Parameters: None

Performance characteristics:

Compression speed: Moderate
Decompression speed: Fast
Compression ratio: Excellent for time-series floats
CPU usage: Low to moderate

Data type compatibility: FLOAT, DOUBLE, dates, timestamps

Best for:

Sensor readings that change gradually
Financial time series (prices, rates)
IoT device metrics
Any slowly-changing floating-point data

Must be chained with a general-purpose codec.

Examples:

-- Temperature sensor readings
temperature FLOAT NOT NULL COMPRESSION (gorilla, lz4)

-- Stock prices  
price DOUBLE NOT NULL COMPRESSION (gorilla, zstd(3))

-- Time series with timestamps
timestamp TIMESTAMPTZ NOT NULL COMPRESSION (gorilla, lz4hc(6))
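The XOR step can be illustrated with a short Python sketch. It covers only the XOR of consecutive 64-bit float patterns; Gorilla-style codecs additionally pack the leading and trailing zero bits compactly, which is omitted here. The helper names and readings are hypothetical, and nothing below describes Firebolt's actual implementation.

```python
import struct


def float_to_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(b):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values):
    """First value's bits, then each value's bits XORed with its predecessor's."""
    out = []
    prev = 0
    for v in values:
        bits = float_to_bits(v)
        out.append(bits ^ prev)
        prev = bits
    return out


def xor_decode(encoded):
    """Undo xor_encode by XORing each entry with the previously decoded bits."""
    values = []
    prev = 0
    for x in encoded:
        bits = x ^ prev
        values.append(bits_to_float(bits))
        prev = bits
    return values


readings = [21.5, 21.5, 21.75, 22.0]
encoded = xor_encode(readings)
for x in encoded:
    print(f"{x:064b}")  # repeated or similar readings yield mostly zero bits
```

An unchanged reading XORs to all zeros, and a small change flips only a few mantissa bits, which leaves long zero runs for later stages to compress.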

Chaining strategies

Two-stage chains (Recommended)

Most effective combinations for common use cases:

-- Sequential data with fast access
id INTEGER NOT NULL COMPRESSION (delta, lz4)

-- Time series with balanced compression
sensor_value FLOAT NOT NULL COMPRESSION (gorilla, zstd(3))

-- Regular timestamps  
recorded_at TIMESTAMPTZ NOT NULL COMPRESSION (doubledelta, lz4hc(6))
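The benefit of placing a preprocessing codec in front of a general-purpose one can be sketched as follows. zlib from the Python standard library stands in for the general-purpose stage, since lz4 and zstd bindings are not assumed to be installed, so absolute sizes will differ from what Firebolt produces. The ID range and helper name are made up for illustration.

```python
import struct
import zlib


def pack_int64(values):
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)


# 10,000 sequential IDs (hypothetical sample data).
ids = list(range(1000, 11000))

# Stage 1: delta preprocessing (first value, then neighbour differences).
deltas = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]

# Stage 2: general-purpose compression, with and without the delta stage.
general_only = zlib.compress(pack_int64(ids))
chained = zlib.compress(pack_int64(deltas))

print(len(pack_int64(ids)), len(general_only), len(chained))
```

On this sequential input the chained result is much smaller, because the delta stage turns the column into a long run of identical values.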

Three-stage chains (Advanced)

For maximum compression on specific data patterns:

-- Complex time series optimization
timestamp TIMESTAMPTZ NOT NULL COMPRESSION (doubledelta, gorilla, zstd(7))

-- Highly sequential float data
sequence_value DOUBLE NOT NULL COMPRESSION (delta, gorilla, lz4hc(9))

Note: Three-stage chains increase CPU overhead. Test performance before production use.

Backward compatibility

Legacy syntax support

For backward compatibility, the older compression syntax remains supported for lz4 and zstd codecs only:

-- Legacy syntax (supported for lz4 and zstd only)  
CREATE TABLE old_style (
    id INTEGER COMPRESSION zstd COMPRESSION_LEVEL 5,
    name TEXT COMPRESSION lz4
);

-- New syntax (recommended for all features)
CREATE TABLE new_style (
    id INTEGER COMPRESSION (zstd(5)),
    name TEXT COMPRESSION (lz4)
);

Legacy syntax limitations:

Only lz4 and zstd codecs supported
No specialized codecs (delta, gorilla, doubledelta)
No compression chaining capabilities
No access to advanced parameters

New syntax benefits:

Support for all compression codecs
Compression chaining capabilities
Access to specialized codecs
Consistent parameter syntax
Future feature compatibility
