This page provides an overview of the high-level architecture of Firebolt Core, focusing primarily on the characteristics that are relevant for successfully operating a Firebolt Core cluster. Fundamentally, a Firebolt Core cluster consists of one or more Firebolt Core nodes that work together to process SQL queries submitted to the cluster. The individual nodes within a cluster are mostly indistinguishable, except for some minor edge cases outlined below. Each node runs a single Firebolt Core Docker container which implements all of the required internal database services and manages a shard of the persistent database state. Queries can be submitted to any node within a cluster, and will usually be processed jointly by all nodes.
State Management
Firebolt Core supports DDL and DML commands like CREATE TABLE or INSERT which modify the persistent database state associated with a cluster. More specifically, this persistent database state consists of the following two complementary types of data managed by a Firebolt Core cluster:
1. The database catalog, containing information about schema objects such as databases, namespaces, tables, views, and indexes.
2. The contents of these schema objects, such as the actual data inserted into a table.
The database catalog (1) is stored only on node 0 of a cluster, in the /firebolt-core/volume/persistent_data/metadata_store directory.
The contents of schema objects (2) are sharded transparently across all nodes of a Firebolt Core cluster. When a query accesses such a schema object, the query execution engine automatically generates a distributed query plan which ensures that all of the corresponding shards are correctly read and processed. When writing to such a schema object, however, the query execution engine does not actively distribute data across nodes. For example, this means that a plain INSERT statement into a table will usually only insert data into the shard stored on the node to which the query was sent. Users should route such statements to different nodes to ensure an even data distribution (see also Connect).
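One simple way to spread writes is to rotate through the node endpoints on the client side. The sketch below assumes each node exposes an HTTP endpoint that accepts SQL statements via POST; the hostnames and port are illustrative placeholders, not values prescribed by this page.

```python
import urllib.request

# Hypothetical node endpoints -- adjust to your deployment. The hostnames,
# port, and HTTP interface are assumptions for illustration only.
NODES = [f"http://core-node-{i}:3473" for i in range(3)]

def node_for(statement_index: int, nodes: list[str]) -> str:
    """Round-robin: spread consecutive INSERT statements across all nodes
    so that each node's local shard receives a similar amount of data."""
    return nodes[statement_index % len(nodes)]

def send_insert(statement_index: int, sql: str) -> None:
    # Send one statement to the node chosen by the round-robin policy.
    url = node_for(statement_index, NODES)
    req = urllib.request.Request(url, data=sql.encode(), method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()

# Usage sketch: route a batch of INSERT statements evenly across the cluster.
# for i, stmt in enumerate(batch_of_inserts):
#     send_insert(i, stmt)
```

Any policy that distributes statements roughly evenly (round-robin, random choice, a load balancer) achieves the same goal; the round-robin helper is just the simplest deterministic option.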
Since the contents of schema objects are sharded across nodes, changing the number of nodes requires bringing up a new cluster with the updated topology rather than hot-resizing an existing one. When object storage is configured, data is not lost in this process — the new cluster rehydrates from object storage. Without object storage, the shards stored on any removed node are lost permanently. Adding a node results in data skew since there will initially be no shards stored on the new node. Note that this restriction does not apply if the workload does not involve any locally managed tables — for example, a workload that only operates on external or Iceberg tables only persists the database catalog on node 0, with no sharded data on the remaining nodes.
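The skew introduced by adding an empty node can be illustrated with a small back-of-the-envelope calculation (plain arithmetic, not a Firebolt API; the node names and row counts are made up):

```python
# Before resizing: three nodes each hold a shard of similar size.
shard_rows = {"node-0": 1_000_000, "node-1": 1_000_000, "node-2": 1_000_000}

# After bringing up a four-node cluster: existing data rehydrates onto
# shards for the original nodes, while the new node starts empty.
shard_rows["node-3"] = 0

# Relative load per node: 1.0 would be a perfectly even distribution.
avg = sum(shard_rows.values()) / len(shard_rows)
skew = {node: rows / avg for node, rows in shard_rows.items()}
# node-3 sits at 0x the average until new writes are routed to it.
```

The imbalance only evens out as subsequent writes are routed toward the underfilled node.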
By default, all of the above database state is written to the writable container layer of the Firebolt Core Docker containers, which means that it is tied to a specific Docker container and cannot easily be migrated between hosts. In the supported Docker Compose and Kubernetes deployments, volumes are mounted to make it easier to preserve this state; see Deployment and Operational Guide for further details.
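As a sketch, a Docker Compose service could bind a named volume to the container's /firebolt-core/volume directory (the path referenced above); the service name, volume name, and image tag below are illustrative assumptions, so consult the Deployment guide for the supported setup.

```yaml
# Illustrative sketch only: service name, volume name, and image tag are
# assumptions, not the officially supported configuration.
services:
  firebolt-core:
    image: ghcr.io/firebolt-db/firebolt-core:latest   # placeholder tag
    volumes:
      # Persist the catalog and shard data outside the container layer.
      - firebolt_core_data:/firebolt-core/volume

volumes:
  firebolt_core_data:
```

Because the state then lives in a named volume rather than the container layer, recreating the container (for example during an image upgrade) no longer discards the node's shard.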