A deployment is built from two binaries:
- firebolt, the database binary that executes queries. It bundles the planner, runtime, and storage engine.
- dedicated-pensieve, an optional standalone metadata service for running multiple independent Firebolt clusters with decoupled metadata.
Concepts
An Engine is a cluster of one or more nodes that execute queries together. Every node runs the same firebolt binary and is given its position with --node.
A query submitted to any node is planned there and its stages are distributed across that Engine’s nodes.
These pages assume one node per machine: one firebolt process on its own host, reached at that host’s address.
That is the only model a real deployment should use.
You can also place several nodes on a single machine, but only for testing, and it needs extra care to avoid port collisions.
That case is covered separately in Colocate multiple nodes on one host.
Table data lives in object storage (S3, GCS, or Azure Blob Storage) as immutable tablets.
Nodes read tablets directly from object storage and cache them on local SSD, so adding a node adds compute without moving data.
The directory you pass with --data-dir holds only this cache and the node’s configuration; object storage is the single source of truth.
Metadata, the catalog of tables, columns, and tablet locations, is served in one of two modes, set by instance.type in the engine configuration:
- Embedded metadata (the default): the Engine hosts its own metadata service, backed by a local SQLite database. No separate process and no Postgres. The metadata belongs to that one Engine and cannot be shared. This is the simplest deployment.
- Standalone metadata: the Engine connects to a separate dedicated-pensieve process backed by Postgres. Because the metadata lives outside any single Engine, several Engines can share one catalog and one bucket, and each reads the latest snapshot written by the others. This is the basis of workload isolation: two Engines operate on the same tablets without drawing compute from each other.
Recommended topology
Run one firebolt node per machine.
Every node reads and writes the same object storage bucket.
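The metadata mode decides whether there is a separate process:
- Embedded metadata: node 0 hosts the metadata service, backed by a local SQLite database, and the other nodes connect to it. There is no separate process.
- Standalone metadata: a separate dedicated-pensieve process, backed by Postgres, serves the metadata, and every node connects to it.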
Choose a deployment
- Deploy with embedded metadata: one Engine, no separate metadata process. Start here.
- Deploy with standalone metadata: one or more Engines sharing a Postgres-backed metadata service and one bucket.
- Colocate multiple nodes on one host: run several nodes or Engines on one machine for testing only. Most deployments never need this.
How the nodes communicate
A node serves clients over HTTP, exchanges work with its peers over two channels, reaches the metadata service over gRPC, and reads and writes tablets directly to object storage. Each port below is per node. The engine binds them on all interfaces (0.0.0.0) by default, so with one node per machine the defaults never collide and you do not have to change them.

| Port | Default | Protocol | Direction | Purpose |
|---|---|---|---|---|
| --http-port | 3473 | HTTP | Client to node | Submit SQL and read results. /ping returns Ok. |
| aragog_port | 5678 | gRPC | Node to node | Distributed execution control: schedule, cancel, and discard query stages. |
| shufflepuff_port | 16000 | TCP | Node to node | Data exchange (shuffle) between stages of a distributed query. |
| storage_manager_port | 1717 | gRPC | Node to node | Cluster storage coordination (tablet assignment, statistics). Bound only by the leader. |
| storage_agent_port | 3434 | gRPC | Node to node | Per-node storage and cache agent. |
| health_check_port | 8122 | HTTP | Local | Liveness and readiness probes. Not part of query execution. |
| prometheus_port | 9090 | HTTP | Local | Metrics scrape endpoint. Not part of query execution. |

Metadata traffic uses these ports:

| Service | Default port | Protocol | Direction | Purpose |
|---|---|---|---|---|
| Embedded metadata | 6500 | gRPC | Node to node | In embedded mode node 0 hosts the metadata service; the other nodes connect to <node 0 host>:6500. |
| Standalone metadata | 7000 | gRPC | Node to service | In standalone mode every node connects to the dedicated-pensieve process. |
| Postgres | 5432 | Postgres | Service to database | dedicated-pensieve stores its metadata here. |

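The two tables map directly onto the addresses each machine must accept traffic on from the other machines of the Engine. The sketch below is a minimal illustration. It assumes one node per machine, the default ports, and a host list ordered by node index, so the first host is node 0. The mode strings "embedded" and "standalone" and both helper names are hypothetical; they are not the values instance.type accepts.

```python
from typing import Dict, List, Optional

# Default per-node ports from the first table. Override them if your
# engine configuration sets different values.
NODE_TO_NODE_PORTS: Dict[str, int] = {
    "aragog_port": 5678,           # distributed execution control (gRPC)
    "shufflepuff_port": 16000,     # shuffle between query stages (TCP)
    "storage_manager_port": 1717,  # storage coordination; only the leader binds it
    "storage_agent_port": 3434,    # per-node storage and cache agent
}
EMBEDDED_METADATA_PORT = 6500
STANDALONE_METADATA_PORT = 7000


def metadata_endpoint(mode: str, node0_host: str, pensieve_host: Optional[str] = None) -> str:
    """Return the host:port a node dials for metadata.

    `mode` is a hypothetical label ("embedded" or "standalone"),
    not the literal instance.type value.
    """
    if mode == "embedded":
        # Node 0 hosts the embedded metadata service.
        return f"{node0_host}:{EMBEDDED_METADATA_PORT}"
    if mode == "standalone":
        if not pensieve_host:
            raise ValueError("standalone mode needs the dedicated-pensieve host")
        return f"{pensieve_host}:{STANDALONE_METADATA_PORT}"
    raise ValueError(f"unknown metadata mode: {mode!r}")


def inbound_endpoints(hosts: List[str], mode: str, pensieve_host: Optional[str] = None) -> List[str]:
    """List host:port destinations that peers of this Engine must be able to reach.

    Hypothetical helper: hosts[0] is assumed to be node 0. The storage manager
    port is listed for every host for simplicity, although only the leader binds it.
    """
    if not hosts:
        raise ValueError("an Engine needs at least one node")
    endpoints = [f"{host}:{port}" for host in hosts for port in NODE_TO_NODE_PORTS.values()]
    endpoints.append(metadata_endpoint(mode, hosts[0], pensieve_host))
    return endpoints
```

Client traffic to --http-port and the local-only health and metrics ports are deliberately left out, because they are not node-to-node.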
All nodes of one Engine must start concurrently.
Each node’s readiness check runs a distributed query that needs its peers reachable, so starting one node and waiting for it before starting the next deadlocks.
Start every node of an Engine, then poll each node's /ping until all return Ok.
Stop a node by sending SIGTERM to the process ID that firebolt server start prints (or that you captured when launching it in the background).
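The following minimal Python sketch shows the start-up and shutdown procedure above, using only the standard library. It assumes the default --http-port (3473), that /ping answers with the literal body Ok, and that you have already launched every node and kept their PIDs. The host list and function names are hypothetical. wait_until_ready checks every pending node on each round rather than waiting on one node before starting on the next, which is the ordering that avoids the deadlock described above.

```python
import http.client
import os
import signal
import time
import urllib.request
from typing import Callable, Iterable

DEFAULT_HTTP_PORT = 3473  # default --http-port


def ping(host: str, port: int = DEFAULT_HTTP_PORT, timeout: float = 2.0) -> bool:
    """True if http://host:port/ping answers with the body "Ok" (assumed exact text)."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/ping", timeout=timeout) as resp:
            return resp.read().decode(errors="replace").strip() == "Ok"
    except (OSError, http.client.HTTPException):
        # Not listening yet, connection reset, timeout, or an HTTP error status.
        return False


def wait_until_ready(
    hosts: Iterable[str],
    is_ready: Callable[[str], bool] = ping,
    deadline_s: float = 300.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll all not-yet-ready nodes each round until every one is ready or the deadline passes."""
    pending = set(hosts)
    end = clock() + deadline_s
    while pending:
        pending = {host for host in pending if not is_ready(host)}
        if not pending:
            break
        if clock() >= end:
            return False
        sleep(interval_s)
    return True


def stop_nodes(pids: Iterable[int]) -> None:
    """Send SIGTERM to each firebolt server process."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the process has already exited
```

For example, wait_until_ready(["10.0.0.1", "10.0.0.2"]) returns False if either node has not answered Ok within five minutes. Passing the captured PIDs to stop_nodes then shuts the Engine down.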
Get the binaries
Download the prebuilt binaries for your platform and put them on your PATH:
- firebolt, the engine.
- dedicated-pensieve, the standalone metadata service (only needed for standalone metadata).
Release archives (from the firebolt-db/packdb releases):
| Platform | Engine (firebolt) | Standalone metadata (dedicated-pensieve) |
|---|---|---|
| Linux x86-64 (amd64) | firebolt-core-amd64.tar.gz | dedicated-pensieve-amd64.tar.gz |
| Linux ARM64 (aarch64) | firebolt-core-arm64.tar.gz | dedicated-pensieve-arm64.tar.gz |

Extract each archive and put the binary on your PATH (the engine archive unpacks to a firebolt-core-<arch>/ directory containing firebolt).
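If you script the download, you can derive the archive name from the machine architecture. This sketch assumes Linux, where platform.machine() reports x86_64 or aarch64. Fetching the archives is left out because this page does not give the release URL. The helper names are hypothetical, and the sketch covers only the engine's extracted directory, since the layout of the dedicated-pensieve archive is not described here.

```python
import platform
from pathlib import Path
from typing import List, Optional

# Map what platform.machine() reports to the <arch> suffix used in the
# release archive names in the table above.
_ARCH_SUFFIX = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def arch_suffix(machine: Optional[str] = None) -> str:
    machine = (machine or platform.machine()).lower()
    try:
        return _ARCH_SUFFIX[machine]
    except KeyError:
        raise ValueError(f"no prebuilt archive for {machine!r}; build from source instead") from None


def archives_for(machine: Optional[str] = None, standalone: bool = False) -> List[str]:
    """Archive names to download for this machine."""
    arch = arch_suffix(machine)
    names = [f"firebolt-core-{arch}.tar.gz"]
    if standalone:
        # dedicated-pensieve is only needed for standalone metadata.
        names.append(f"dedicated-pensieve-{arch}.tar.gz")
    return names


def engine_dir(extract_root: str, machine: Optional[str] = None) -> Path:
    """Directory to add to PATH after extracting the engine archive into extract_root."""
    return Path(extract_root) / f"firebolt-core-{arch_suffix(machine)}"
```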
Or build the binaries yourself.
The examples in this guide invoke them as firebolt and dedicated-pensieve.
The firebolt binary bundles the server and a CLI; every mode here starts a server with firebolt server start.
See engine arguments for --data-dir and --server-config, and engine configuration for the YAML file.
Build the binaries yourself
Build on Ubuntu 22.04 or newer with these tools:

| Tool | Version | Install |
|---|---|---|
| Clang | 18 | sudo apt-get install clang-18 |
| Ninja | any | sudo apt-get install ninja-build |
| CMake | 3.20+ | sudo apt-get install cmake |
| Rust | stable | curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \| sh |

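Before configuring the build, a quick preflight can confirm the toolchain is in place. This sketch assumes the executables are named clang-18, ninja, cmake and cargo, which is what the packages above and rustup install on Ubuntu. It also assumes cmake --version starts with a line like "cmake version 3.22.1". Only CMake's version is checked, against the 3.20 minimum. The function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Callable, List, Optional, Tuple

# Assumed executable names for the tools in the table above.
REQUIRED_TOOLS = ("clang-18", "ninja", "cmake", "cargo")
MIN_CMAKE = (3, 20)


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Names from REQUIRED_TOOLS that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def parse_cmake_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized cmake --version output: {text!r}")
    return int(match.group(1)), int(match.group(2))


def cmake_is_new_enough() -> bool:
    """Run cmake --version and compare it with the 3.20 minimum."""
    out = subprocess.run(["cmake", "--version"], capture_output=True, text=True, check=True).stdout
    return parse_cmake_version(out) >= MIN_CMAKE
```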
The build produces build/programs/firebolt/firebolt and build/programs/dedicated-pensieve/dedicated-pensieve.
Add both directories to your PATH, or substitute the full path in each command.
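When you launch the binaries from a script rather than an interactive shell, you can build the PATH explicitly instead of editing your shell profile. The sketch below assumes build_dir is the build directory named above. The function names are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, Mapping, Optional

BINARIES = ("firebolt", "dedicated-pensieve")


def env_with_build_dirs(build_dir: str, base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with both build output directories prepended to PATH."""
    env = dict(os.environ if base_env is None else base_env)
    # Each binary lives in build/programs/<name>/<name>.
    dirs = [str(Path(build_dir) / "programs" / name) for name in BINARIES]
    # Skip an empty PATH so no empty entry (meaning the current directory) is added.
    parts = dirs + ([env["PATH"]] if env.get("PATH") else [])
    env["PATH"] = os.pathsep.join(parts)
    return env


def locate_binaries(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Resolve each binary the way a shell with this PATH would; None means not found."""
    return {name: shutil.which(name, path=env.get("PATH", "")) for name in BINARIES}
```

Pass the returned mapping as env= to subprocess.run or subprocess.Popen when starting firebolt or dedicated-pensieve, so only that child process sees the extra directories.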