Object model
Objects are organized as database.schema.table. Databases and schemas are namespaces; tables hold data. A query can join across databases by fully qualifying names, with no federation step.
Iceberg: batch
Query Apache Iceberg tables in a data lake without copying them. Read ad hoc with the READ_ICEBERG table-valued function, or register a table once against a catalog and query it by name.
Write results back as Iceberg with CREATE ICEBERG TABLE ... AS SELECT, so results stay open to other engines. See READ_ICEBERG and CREATE ICEBERG TABLE. For an Iceberg-compatible format with PostgreSQL-hosted metadata, see DuckLake.
Managed tables: real time
Managed tables are Firebolt’s native storage, built for low-latency serving and high-throughput streaming ingest. Tablets live in object storage and are cached on local SSD. A PRIMARY INDEX defines both the sort order and the sparse index used to prune granules at scan time.
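To make the pruning idea concrete, here is a minimal Python sketch of how a sparse index over sorted data can rule out granules for a range predicate. It is a toy model, not Firebolt's implementation: the granule size, the function names, and the choice to store the first key of each granule are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

GRANULE_ROWS = 4  # hypothetical granule size, kept tiny for readability


def build_sparse_index(sorted_keys, granule_rows=GRANULE_ROWS):
    """Keep only the first key of each granule of data sorted by the primary index."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_rows)]


def granules_for_range(sparse_index, low, high):
    """Return the granule numbers that could contain a key in [low, high]."""
    if low > high:
        return range(0)
    # The granule just before the first one starting at >= low may still end with `low`.
    start = max(bisect_left(sparse_index, low) - 1, 0)
    # Granules whose first key is > high cannot contain a match.
    stop = bisect_right(sparse_index, high)
    return range(start, stop)


keys = [1, 2, 3, 5, 5, 5, 8, 9, 10, 12, 15, 20]   # already sorted by the index column
index = build_sparse_index(keys)                  # one entry per granule
to_scan = list(granules_for_range(index, 11, 14))
print(index, to_scan)
```

With these twelve keys the index holds three entries, and a filter on the range 11 to 14 touches only the last granule. The result is deliberately conservative: a granule is kept whenever the index alone cannot exclude it.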
Indexes
Indexes are maintained automatically as data changes. The full set:

| Index | Definition | Use |
|---|---|---|
| Primary | PRIMARY INDEX col[, ...] in CREATE TABLE | Sort order plus a sparse index; prunes granules by range. |
| Aggregating | CREATE AGGREGATING INDEX i ON t (keys, agg(...)) | Precomputed GROUP BY with partial aggregate state; matched transparently. |
| Data skipping | CREATE INDEX i ON t USING SKIP_INDEX(expr) WITH (TYPE = minmax) | Per-granule min/max for non-primary columns. |
| Inverted | CREATE INDEX i ON t USING INVERTED_INDEX(col) | Roaring-bitmap posting lists for exact token lookups. |
| Full text | CREATE INDEX i ON t USING FULL_TEXT(col) | N-gram index for substring and text search. |
| Vector | CREATE INDEX i ON t USING HNSW (col distance) WITH (dimension = d) | Approximate nearest-neighbor search over embeddings. |
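The "partial aggregate state" in the aggregating index row can be illustrated with a short Python sketch. It assumes, purely for illustration, an AVG aggregate whose state is a (sum, count) pair kept per group and per tablet; the function names are hypothetical and the source does not say how Firebolt encodes its states.

```python
from collections import defaultdict


def build_partial_states(rows):
    """Fold one tablet's (group_key, value) rows into group -> (sum, count)."""
    state = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        state[key][0] += value
        state[key][1] += 1
    return {key: (total, count) for key, (total, count) in state.items()}


def merge_states(states):
    """Combine per-tablet states without touching the underlying rows."""
    merged = defaultdict(lambda: [0.0, 0])
    for state in states:
        for key, (total, count) in state.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: (total, count) for key, (total, count) in merged.items()}


def finalize_avg(merged):
    """Turn the mergeable state into the final AVG only at query time."""
    return {key: total / count for key, (total, count) in merged.items()}


tablet_1 = [("a", 10), ("b", 4)]
tablet_2 = [("a", 20), ("a", 30), ("b", 6)]
states = [build_partial_states(tablet_1), build_partial_states(tablet_2)]
print(finalize_avg(merge_states(states)))
```

Keeping the state mergeable is what lets new tablets extend the index incrementally: a final average cannot be combined with another average, but two (sum, count) pairs can.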
Real-time ingest
Stream from Kafka directly into a managed table with the READ_STREAM table-valued function. Offsets advance inside the ingesting transaction, so ingestion is exactly-once.
INSERT is also available for batch and singleton writes. See READ_STREAM.
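Why committing offsets inside the ingesting transaction yields exactly-once behavior can be shown with a small Python sketch. It is a toy model under stated assumptions: an in-memory SQLite database stands in for the managed table, a Python list stands in for a Kafka partition, and the table and column names (events, stream_offsets, part, pos) are hypothetical.

```python
import sqlite3


def open_store():
    """Create the target table and the per-partition offset table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (part INTEGER, pos INTEGER, payload TEXT)")
    db.execute("CREATE TABLE stream_offsets (part INTEGER PRIMARY KEY, next_pos INTEGER)")
    return db


def ingest_batch(db, part, messages, fail_before_commit=False):
    """Insert (pos, payload) messages and advance the offset in one transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        row = db.execute(
            "SELECT next_pos FROM stream_offsets WHERE part = ?", (part,)
        ).fetchone()
        next_pos = row[0] if row else 0
        # Skip anything at or below the committed offset (a redelivered batch).
        fresh = [(part, pos, payload) for pos, payload in messages if pos >= next_pos]
        db.executemany("INSERT INTO events VALUES (?, ?, ?)", fresh)
        if fresh:
            db.execute(
                "INSERT OR REPLACE INTO stream_offsets VALUES (?, ?)",
                (part, fresh[-1][1] + 1),
            )
        if fail_before_commit:
            raise RuntimeError("simulated crash before commit")


db = open_store()
batch = [(0, "a"), (1, "b")]
try:
    ingest_batch(db, 0, batch, fail_before_commit=True)  # rows and offset roll back together
except RuntimeError:
    pass
ingest_batch(db, 0, batch)  # the retry applies the batch once
ingest_batch(db, 0, batch)  # a redelivery is filtered out by the committed offset
print(db.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```

Because the rows and the offset either both commit or both roll back, a crash can never leave data without its offset (which would duplicate rows on retry) or an offset without its data (which would lose rows).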
Writes and maintenance
DELETE and UPDATE are merge-on-read: a delete records row positions in a per-transaction deletion mask (a Roaring bitmap) rather than rewriting data, and an update is a delete plus an insert. This keeps writes cheap and gives snapshot isolation, since a query applies only the masks committed as of its transaction.
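The merge-on-read behavior described above can be sketched in a few lines of Python. This is an illustrative model only: plain Python sets stand in for Roaring bitmaps, transaction IDs are simple integers, and the Tablet class and its methods are hypothetical names, not Firebolt internals.

```python
class Tablet:
    """Immutable rows plus deletion masks tagged with the committing transaction."""

    def __init__(self, rows):
        self.rows = list(rows)  # never rewritten by a delete
        self.masks = []         # list of (commit_txn, frozenset of row positions)

    def delete(self, txn, predicate):
        """Record the positions of matching rows instead of rewriting data."""
        positions = frozenset(
            pos for pos, row in enumerate(self.rows) if predicate(row)
        )
        self.masks.append((txn, positions))

    def scan(self, as_of_txn):
        """Apply only the masks committed at or before the reader's transaction."""
        hidden = set()
        for txn, positions in self.masks:
            if txn <= as_of_txn:
                hidden |= positions
        return [row for pos, row in enumerate(self.rows) if pos not in hidden]


tablet = Tablet(["a", "b", "c"])
tablet.delete(txn=5, predicate=lambda row: row == "b")
print(tablet.scan(as_of_txn=4))  # a reader that started earlier still sees "b"
print(tablet.scan(as_of_txn=5))  # a later reader does not
```

The delete costs one small mask rather than a tablet rewrite, and two readers at different transactions see consistent but different snapshots of the same unchanged rows.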
Background maintenance keeps storage tight. Compaction merges small tablets toward the target tablet size and removes rows that deletion masks have retired. It runs automatically per engine, and you can run it on demand with VACUUM.
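As a rough picture of what compaction does, the following Python sketch merges several small sorted tablets into tablets of a target size while dropping rows hidden by deletion masks. It assumes every mask is already retired (no running query still needs the deleted rows); the compact function and the target size are hypothetical, and real compaction policies are more selective about which tablets to merge.

```python
from heapq import merge


def compact(tablets, target_rows):
    """Merge sorted tablets, drop masked rows, and re-cut into target-sized tablets.

    `tablets` is a list of (sorted_rows, deleted_positions) pairs.
    """
    live_runs = [
        [row for pos, row in enumerate(rows) if pos not in deleted]
        for rows, deleted in tablets
    ]
    merged = list(merge(*live_runs))  # keeps the primary-index sort order
    return [merged[i:i + target_rows] for i in range(0, len(merged), target_rows)]


small_tablets = [
    ([1, 4, 7], {1}),     # position 1 (value 4) was deleted
    ([2, 3], set()),
    ([5, 6, 8], {0, 2}),  # values 5 and 8 were deleted
]
print(compact(small_tablets, target_rows=4))
```

After compaction the masks are no longer needed for the rewritten tablets, so scans read fewer, larger, fully live tablets.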
External data
Query files in object storage directly with the READ_* table-valued functions, with no schema definition or load step.
READ_CSV, READ_JSON, READ_AVRO, and READ_TEXT work the same way, and a named LOCATION keeps credentials out of the query. When you want data resident for the fastest queries, load it into a managed table with COPY FROM.
Prefer the table-valued functions and COPY over CREATE EXTERNAL TABLE: they need no persistent schema and cover both exploration and bulk loading. See the table-valued functions reference and COPY FROM.