Custom column and table compression
Overview
Firebolt allows users to specify a custom compression algorithm at both the table level and the column level when creating a table. Compression helps you reduce storage costs and improve query performance by decreasing disk usage and I/O.
Supported compression algorithms
Firebolt supports the following compression algorithms:
lz4
– Default option; fast compression and decompression, ideal for general workloads.zstd
– Higher compression ratios at the cost of increased CPU usage; useful for large datasets.
Disabling compression with NONE
is not supported. Firebolt always compresses data using one of the supported algorithms.
Specify compression settings
You can specify compression settings for a table, columns, or a combination of both when creating tables:
- Table-level compression sets a default compression method for all columns.
- Column-level compression overrides the table-level default for individual columns.
Table-level compression
You specify table-level compression using the WITH
clause in the CREATE TABLE
statement. This compression applies to all columns unless explicitly overriden at the column level.
Syntax:
Use the following syntax to set table-level and/or column-level compression.
CREATE TABLE table_name (
column1 data_type COMPRESSION algorithm COMPRESSION_LEVEL level,
column2 data_type COMPRESSION algorithm,
column3 data_type
)
WITH (COMPRESSION = algorithm, COMPRESSION_LEVEL =level);
Compression level
- The optional
COMPRESSION_LEVEL
parameter lets you fine-tune the balance between compression efficiency and CPU usage. - Higher values provide greater comparison but increase CPU overhead.
- Firebolt uses algorithm-specific default levels if no level is specified.
Default behavior
- If you don’t specify column-level compression. Firebolt applies the table-level setting.
- If no table-level setting is provided, Firebolt defaults to
lz4
.
Example:
The following code creates a table with ZTSD compression:
CREATE TABLE sales (
a INT
) WITH (compression=zstd, compression_level=15);
Column-level compression
Column-level compression allows you to override table-level compression settings for specific columns. This approach is useful for columns that have different compression requirements, such as large text fields or numerical columns.
Syntax:
CREATE TABLE table_name (
column1 data_type COMPRESSION algorithm COMPRESSION_LEVEL level,
column2 data_type COMPRESSION algorithm,
column3 data_type
);
Compression level
- The
compression_level
parameter fine-tunes compression efficiency versus CPU usage. - Higher compresson levels reduce storage but consume more CPU resources during compression and decompression.
- If not specified, Firebolt applies a default compression level appropriate for the selected algorithm.
Default behavior
- If you do not specify compression explicitly, Firebolt applies the default
lz4
compression algorithm. - When you define table-level compression, all columns inherit this setting unless overriden with column-level compression.
Step-by-step examples of compression
The following examples show how to use table-level and column-level compression in Firebolt.
Step 1: Create and populate an uncompressed table
Create a base table without custom compression and populate it with data using GENERATE_SERIES
-- create the target table
CREATE TABLE origin_no_compression (a int, b int, c string, d string);
-- insert 10000 rows
INSERT INTO origin_no_compression SELECT 1, 2, 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.', 'In dapibus metus quis velit mattis dignissim.' FROM GENERATE_SERIES(1,10000) ;
Step 2: Apply table-level compression
Create a new table using ZTSD compression and compression level 15, applying the setting to all columns:
CREATE TABLE table_a_column_level_zstd(a int COMPRESSION zstd COMPRESSION_LEVEL 15, b int, c string, d string);
CREATE TABLE table_a_b_column_level_zstd(a int COMPRESSION zstd COMPRESSION_LEVEL 15, b int COMPRESSION zstd COMPRESSION_LEVEL 15, c string, d string);
CREATE TABLE table_c_column_level_zstd(a int , b int, c string COMPRESSION zstd COMPRESSION_LEVEL 15, d string);
CREATE TABLE table_c_d_column_level_zstd(a int , b int, c string COMPRESSION zstd COMPRESSION_LEVEL 15, d string COMPRESSION zstd COMPRESSION_LEVEL 15);
Insert data into these tables from the origin_no_compression
table:
INSERT INTO table_a_column_level_zstd SELECT * FROM origin_no_compression;
INSERT INTO table_a_b_column_level_zstd SELECT * FROM origin_no_compression;
INSERT INTO table_c_column_level_zstd SELECT * FROM origin_no_compression;
INSERT INTO table_c_d_column_level_zstd SELECT * FROM origin_no_compression;
Step 3: Apply column-level compression
Next, create tables demonstrating column-level compression configuration:
CREATE TABLE table_a_b_column_level_zstd (
a INT COMPRESSION zstd COMPRESSION_LEVEL 15,
b INT COMPRESSION zstd COMPRESSION_LEVEL 15,
c STRING,
d STRING
);
Step 4: Compare compression results
Check the compressed_byte
value of the tables to evaluate the effectiveness of compression settings:
SELECT compressed_bytes,table_name
FROM information_schema.tables
WHERE table_name = 'origin_no_compression'
OR table_name = 'table_a_column_level_zstd'
OR table_name = 'table_a_b_column_level_zstd'
OR table_name = 'table_c_column_level_zstd'
OR table_name = 'table_c_d_column_level_zstd'
ORDER BY table_name;
Example results
These are typical results from a single-node engine, showing how different compression methods and column-level settings affect table sizes:
compressed_bytes | table_name |
---|---|
4772 | origin_no_compression |
4396 | table_a_b_column_level_zstd |
4584 | table_a_column_level_zstd |
2558 | table_c_column_level_zstd |
763 | table_c_d_column_level_zstd |
Column-level expression significantly reduces storage for specific columns, as demonstrated in these examples, and ZSTD typically offers higher compression efficiency compared to the default LZ4.