Core concepts

Firebolt speaks a PostgreSQL-compatible SQL dialect over three kinds of data: Iceberg tables for batch analytics on a data lake, managed tables for real-time workloads, and files in object storage queried directly. This page is a fast, technical tour of all three.

Object model

Objects are organized as database.schema.table. Databases and schemas are namespaces; tables hold data. A query can join across databases by fully qualifying names, with no federation step.

CREATE DATABASE analytics;
CREATE SCHEMA analytics.app;

CREATE TABLE analytics.app.events (
    event_id   BIGINT,
    user_id    BIGINT,
    event_type TEXT,
    ts         TIMESTAMPTZ
) PRIMARY INDEX user_id, ts;

-- Set a default database for unqualified names, or qualify across databases.
USE DATABASE analytics;

SELECT e.event_type, u.plan
FROM app.events AS e
JOIN billing.public.users AS u ON e.user_id = u.user_id;

See CREATE TABLE and cross-database queries.

Iceberg: batch

Query Apache Iceberg tables in a data lake without copying them. Read ad hoc with the READ_ICEBERG table-valued function, or register a table once against a catalog and query it by name:

-- Ad hoc, against a table path or REST catalog.
SELECT * FROM READ_ICEBERG(URL => 's3://lake/warehouse/db/events');

-- Registered against a catalog location; the schema is inferred from the Iceberg metadata.
CREATE ICEBERG TABLE events LOCATION = 'my_iceberg_catalog';
SELECT count(*) FROM events;

Reads support file-based and REST catalogs (AWS Glue, Unity Catalog, Snowflake Open Catalog). DML on external Iceberg tables is not supported, but Firebolt can write a new Iceberg table with CREATE ICEBERG TABLE ... AS SELECT, so results stay open to other engines. For how to query, tune, and write Iceberg tables, see the Iceberg guide. For an alternative open table format that keeps its metadata in PostgreSQL, see DuckLake.

Managed tables: real time

Managed tables are Firebolt’s native storage, built for low-latency serving and high-throughput streaming ingest. Tablets live in object storage and are cached on local SSD, and a PRIMARY INDEX sets the sort order and a sparse index used to prune granules at scan time.

Indexes

Indexes are maintained automatically as data changes. The full set:

Index	Definition	Use
Primary	`PRIMARY INDEX col[, ...]` in `CREATE TABLE`	Sort order plus a sparse index; prunes granules by range.
Aggregating	`CREATE AGGREGATING INDEX i ON t (keys, agg(...))`	Precomputed `GROUP BY` with partial aggregate state; matched transparently.
Data skipping	`CREATE INDEX i ON t USING SKIP_INDEX(expr) WITH (TYPE = minmax)`	Per-granule min/max for non-primary columns.
Inverted	`CREATE INDEX i ON t USING INVERTED_INDEX(col)`	Roaring-bitmap posting lists for exact token lookups.
Full text	`CREATE INDEX i ON t USING FULL_TEXT(col)`	N-gram index for substring and text search.
Vector	`CREATE INDEX i ON t USING HNSW (col distance) WITH (dimension = d)`	Approximate nearest-neighbor search over embeddings.

Pruning, row filtering, and deletion all reduce to Roaring-bitmap set operations. See Storage and indexing for the scan pipeline and tuning.

Real-time ingest

Stream from Kafka directly into a managed table with the READ_STREAM table-valued function. Offsets advance inside the ingesting transaction, so ingestion is exactly-once:

CREATE LOCATION kafka_src WITH (SOURCE = KAFKA, BROKERS = 'broker:9092');
CREATE STREAM clicks (user_id BIGINT, url TEXT) TOPIC = 'clicks'
  LOCATION = 'kafka_src' TYPE = 'JSON';

INSERT INTO analytics.app.events
SELECT user_id, url, 'click', now()
FROM READ_STREAM(STREAM clicks);

INSERT is also available for batch and singleton writes. See READ_STREAM.

Change data capture

Firebolt also ingests change feeds natively from operational databases: it connects to Postgres logical replication or a MongoDB change stream directly and maintains a continuously updated mirror table, one live row per key, with no connector or queue in between:

CREATE CDC TABLE orders (
  order_id BIGINT, customer_id BIGINT, amount NUMERIC(12, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) FROM STREAM orders_changes;

ALTER CDC TABLE orders RESUME;

Ingestion is exactly-once, and reads are always current and snapshot-consistent. See Change data capture.

Writes and maintenance

DELETE and UPDATE are merge-on-read: a delete records row positions in a per-transaction deletion mask (a Roaring bitmap) rather than rewriting data, and an update is a delete plus an insert. This keeps writes cheap and gives snapshot isolation, since a query applies only the masks committed as of its transaction. Background maintenance keeps storage tight. Compaction merges small tablets toward the target tablet size, and removes rows that deletion masks have retired. It runs automatically per engine; run it on demand with VACUUM:

VACUUM analytics.app.events;

Because every engine reads the same tablets from object storage, you can run ingest, compaction, and serving on separate engines without contention. See VACUUM.

External data

Query files in object storage directly with the READ_* table-valued functions, with no schema definition or load step:

SELECT * FROM READ_PARQUET(URL => 's3://lake/events/*.parquet');

READ_CSV, READ_JSON, READ_AVRO, and READ_TEXT work the same way, and a named LOCATION keeps credentials out of the query. When you want data resident for the fastest queries, load it into a managed table with COPY FROM:

COPY INTO analytics.app.events
FROM 's3://lake/events/'
WITH (TYPE = PARQUET);

Prefer table-valued functions and COPY over CREATE EXTERNAL TABLE: they need no persistent schema and cover both exploration and bulk loading. See the table-valued functions reference and COPY FROM.

Access control

Firebolt secures objects with role-based access control: privileges are granted to roles, and roles to users, across the database, schema, table, engine, and account scopes. Grant the privileges a workload needs and nothing more.

CREATE ROLE analyst;
GRANT USAGE ON DATABASE analytics TO analyst;
GRANT SELECT ON TABLE analytics.app.events TO analyst;
GRANT ROLE analyst TO USER "alice@example.com";

See Security for the full permission model, secure views, and column-level security.

Overview

Performance and Observability

Security

Self-Managed

Managed service

Guides

SQL reference

Release notes

API reference

Legal

Object model

Iceberg: batch

Managed tables: real time

Indexes

Real-time ingest

Change data capture

Writes and maintenance

External data

Access control

​Object model

​Iceberg: batch

​Managed tables: real time

​Indexes

​Real-time ingest

​Change data capture

​Writes and maintenance

​External data

​Access control

Object model

Iceberg: batch

Managed tables: real time

Indexes

Real-time ingest

Change data capture

Writes and maintenance

External data

Access control