Skip to main content
Firebolt speaks a PostgreSQL-compatible SQL dialect over three kinds of data: Iceberg tables for batch analytics on a data lake, managed tables for real-time workloads, and files in object storage queried directly. This page is a fast, technical tour of all three.

Object model

Objects are organized as database.schema.table. Databases and schemas are namespaces; tables hold data. A query can join across databases by fully qualifying names, with no federation step.
CREATE DATABASE analytics;
CREATE SCHEMA analytics.app;

CREATE TABLE analytics.app.events (
    event_id   BIGINT,
    user_id    BIGINT,
    event_type TEXT,
    ts         TIMESTAMPTZ
) PRIMARY INDEX user_id, ts;

-- Set a default database for unqualified names, or qualify across databases.
USE DATABASE analytics;

SELECT e.event_type, u.plan
FROM app.events AS e
JOIN billing.public.users AS u ON e.user_id = u.user_id;
See CREATE TABLE and cross-database queries.

Iceberg: batch

Query Apache Iceberg tables in a data lake without copying them. Read ad hoc with the READ_ICEBERG table-valued function, or register a table once against a catalog and query it by name:
-- Ad hoc, against a table path or REST catalog.
SELECT * FROM READ_ICEBERG(URL => 's3://lake/warehouse/db/events');

-- Registered against a catalog location.
CREATE ICEBERG TABLE events (event_id BIGINT, ts TIMESTAMPTZ)
  LOCATION = 'my_iceberg_catalog';
SELECT count(*) FROM events;
Reads support file-based and REST catalogs (AWS Glue, Unity Catalog, Snowflake Open Catalog). DML on external Iceberg tables is not supported, but Firebolt can write a new Iceberg table with CREATE ICEBERG TABLE ... AS SELECT, so results stay open to other engines. See READ_ICEBERG and CREATE ICEBERG TABLE. For an Iceberg-compatible format with PostgreSQL-hosted metadata, see DuckLake.

Managed tables: real time

Managed tables are Firebolt’s native storage, built for low-latency serving and high-throughput streaming ingest. Tablets live in object storage and are cached on local SSD, and a PRIMARY INDEX sets the sort order and a sparse index used to prune granules at scan time.

Indexes

Indexes are maintained automatically as data changes. The full set:
IndexDefinitionUse
PrimaryPRIMARY INDEX col[, ...] in CREATE TABLESort order plus a sparse index; prunes granules by range.
AggregatingCREATE AGGREGATING INDEX i ON t (keys, agg(...))Precomputed GROUP BY with partial aggregate state; matched transparently.
Data skippingCREATE INDEX i ON t USING SKIP_INDEX(expr) WITH (TYPE = minmax)Per-granule min/max for non-primary columns.
InvertedCREATE INDEX i ON t USING INVERTED_INDEX(col)Roaring-bitmap posting lists for exact token lookups.
Full textCREATE INDEX i ON t USING FULL_TEXT(col)N-gram index for substring and text search.
VectorCREATE INDEX i ON t USING HNSW (col distance) WITH (dimension = d)Approximate nearest-neighbor search over embeddings.
Pruning, row filtering, and deletion all reduce to Roaring-bitmap set operations. See Storage and indexing for the scan pipeline and tuning.

Real-time ingest

Stream from Kafka directly into a managed table with the READ_STREAM table-valued function. Offsets advance inside the ingesting transaction, so ingestion is exactly-once:
CREATE LOCATION kafka_src WITH (SOURCE = KAFKA, BROKERS = 'broker:9092');
CREATE STREAM clicks (user_id BIGINT, url TEXT) TOPIC = 'clicks'
  LOCATION = 'kafka_src' TYPE = 'JSON';

INSERT INTO analytics.app.events
SELECT user_id, url, 'click', now()
FROM READ_STREAM(STREAM clicks);
INSERT is also available for batch and singleton writes. See READ_STREAM.

Writes and maintenance

DELETE and UPDATE are merge-on-read: a delete records row positions in a per-transaction deletion mask (a Roaring bitmap) rather than rewriting data, and an update is a delete plus an insert. This keeps writes cheap and gives snapshot isolation, since a query applies only the masks committed as of its transaction. Background maintenance keeps storage tight. Compaction merges small tablets toward the target tablet size, and removes rows that deletion masks have retired. It runs automatically per engine; run it on demand with VACUUM:
VACUUM analytics.app.events;
Because every engine reads the same tablets from object storage, you can run ingest, compaction, and serving on separate engines without contention. See VACUUM.

External data

Query files in object storage directly with the READ_* table-valued functions, with no schema definition or load step:
SELECT * FROM READ_PARQUET(URL => 's3://lake/events/*.parquet');
READ_CSV, READ_JSON, READ_AVRO, and READ_TEXT work the same way, and a named LOCATION keeps credentials out of the query. When you want data resident for the fastest queries, load it into a managed table with COPY FROM:
COPY INTO analytics.app.events
FROM 's3://lake/events/'
WITH (TYPE = PARQUET);
Prefer table-valued functions and COPY over CREATE EXTERNAL TABLE: they need no persistent schema and cover both exploration and bulk loading. See the table-valued functions reference and COPY FROM.

Access control

Firebolt secures objects with role-based access control: privileges are granted to roles, and roles to users, across the database, schema, table, engine, and account scopes. Grant the privileges a workload needs and nothing more.
CREATE ROLE analyst;
GRANT USAGE ON DATABASE analytics TO analyst;
GRANT SELECT ON TABLE analytics.app.events TO analyst;
GRANT ROLE analyst TO USER "alice@example.com";
See Security for the full permission model, secure views, and column-level security.