Deploy with embedded metadata - Firebolt Documentation

With embedded metadata (the default, single_engine mode), the firebolt Engine hosts its own metadata service, backed by a local SQLite database. There is no separate process and no Postgres. This is the simplest way to run an Engine. The trade-off is that the metadata belongs to that one Engine and cannot be shared: to run more than one Engine against the same data, use standalone metadata instead.

These instructions assume you have the firebolt binary on your PATH, as described in Get the binaries. Examples use Amazon S3; for other object stores, change the storage block as shown in engine configuration.

Just want a local database? You do not need any of the configuration below. Start a single node with a data directory and no storage block, and Firebolt keeps table data on local disk:

firebolt server start --data-dir "$PWD/local-db" --detach
curl -s localhost:3473/ --data-binary "CREATE TABLE t (a INT);"
curl -s localhost:3473/ --data-binary "INSERT INTO t VALUES (1),(2),(3);"
curl -s localhost:3473/ --data-binary "SELECT sum(a) FROM t;"

The object storage, multi-node, and shared-metadata setups below matter only when you want durable shared storage or more compute.

Provide object storage credentials

The engine reads object storage credentials from the standard AWS environment variables. Export them in the shell that starts each node, and create a bucket for managed tables:

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..."   # only for temporary (STS) credentials

aws s3 mb s3://my-firebolt-bucket --region us-east-1   # region must match the config below
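
A small pre-flight check can catch a missing variable before a node starts. The following is a minimal sketch, not part of the firebolt CLI: check_aws_env is a hypothetical helper name, and it only tests that the two mandatory variables are non-empty in the current shell. It does not validate the credentials against AWS.

```shell
# Hypothetical helper (not part of the firebolt CLI): verify that the
# mandatory AWS variables are set and non-empty in the current shell.
check_aws_env() {
  missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # 0 when both variables are present, 1 otherwise.
  return "$missing"
}
```

Run check_aws_env in the shell that will start the node, and start the node only if it returns success. AWS_SESSION_TOKEN is deliberately not checked because it is needed only for temporary credentials.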

Single-node engine

One process, embedded metadata, managed tables in object storage. Omitting the instance block selects the default embedded mode. Write the configuration into the data directory:

mkdir -p node
cat > node/config.yaml <<'EOF'
schema_version: "1.0"
engine:
  nodes:
    - host: 127.0.0.1
storage:
  type: s3
  bucket_name: my-firebolt-bucket
  aws:
    region: us-east-1
EOF

Start the node. --data-dir must be absolute because the local query socket is derived from it. --detach blocks until the node is ready and prints JSON containing the process ID:

firebolt server start --data-dir "$PWD/node" --http-port 3473 --detach
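
If the data directory is given as a relative path in a script, it can be resolved first. The sketch below assumes the directory already exists; abs_dir is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: print the absolute path of an existing directory.
abs_dir() {
  # cd runs in a subshell, so the caller's working directory is unchanged.
  # Output of cd itself (possible when CDPATH is set) is discarded.
  # pwd -P prints the physical path with symbolic links resolved.
  (cd -- "$1" >/dev/null && pwd -P)
}
```

For example, passing "$(abs_dir node)" as the --data-dir value is equivalent to the "$PWD/node" form used above when node is a plain subdirectory of the current directory.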

Verify and create data:

curl -s localhost:3473/ping
curl -s localhost:3473/ --data-binary "CREATE TABLE t (a INT);"
curl -s localhost:3473/ --data-binary "INSERT INTO t VALUES (1),(2),(3);"
curl -s localhost:3473/ --data-binary "SELECT count(*), sum(a) FROM t;"
aws s3 ls s3://my-firebolt-bucket --recursive # list the tablet objects written to the bucket

Ports to open: only --http-port (3473), and only for the clients that submit queries.

Clean up: stop the node by its process ID, remove the data directory, then empty and remove the bucket:

kill -TERM <node-pid>
rm -rf node
aws s3 rm s3://my-firebolt-bucket --recursive
aws s3 rb s3://my-firebolt-bucket

Multi-node engine

One Engine, two nodes, one node per machine, still embedded metadata. Node 0 hosts the embedded metadata service on port 6500; node 1 connects to it automatically at <node 0 host>:6500. Both nodes share one engine.nodes list, and --node selects which entry each process is. Set each node’s host to the address its peer reaches it on and keep the default ports. With one node per machine the defaults never collide, so there is nothing else to configure:

mkdir -p node0 node1
cat > node0/config.yaml <<'EOF'
schema_version: "1.0"
engine:
  nodes:
    - host: 10.0.0.10          # node 0 hosts embedded metadata on :6500
    - host: 10.0.0.11          # node 1
storage:
  type: s3
  bucket_name: my-firebolt-bucket
  aws:
    region: us-east-1
EOF
cp node0/config.yaml node1/config.yaml
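
Because every node must see the same engine.nodes list in the same order, it can help to generate the file from one list of hosts rather than edit copies by hand. The following is a sketch under that assumption: write_engine_config is a hypothetical helper, and it emits only the keys that appear in the configuration examples in this guide.

```shell
# Hypothetical helper: write a config.yaml for the given output path,
# bucket, region, and node hosts. Hosts are written in argument order,
# so the first host becomes node 0, the second node 1, and so on.
write_engine_config() {
  out=$1
  bucket=$2
  region=$3
  shift 3
  {
    echo 'schema_version: "1.0"'
    echo 'engine:'
    echo '  nodes:'
    for host in "$@"; do
      echo "    - host: $host"
    done
    echo 'storage:'
    echo '  type: s3'
    echo "  bucket_name: $bucket"
    echo '  aws:'
    echo "    region: $region"
  } > "$out"
}
```

Calling write_engine_config node0/config.yaml my-firebolt-bucket us-east-1 10.0.0.10 10.0.0.11 produces the same file as the heredoc above, without the comments. The file can then be copied to node1 unchanged.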

Copy each config to its own machine and start both nodes concurrently, then wait for both to report ready. Do not use --detach here: each node’s readiness check needs its peer, so starting them one at a time deadlocks.

# on node 0's machine (10.0.0.10)
firebolt server start --data-dir "$PWD/node0" --node 0 --http-port 3473 &

# on node 1's machine (10.0.0.11)
firebolt server start --data-dir "$PWD/node1" --node 1 --http-port 3473 &

# from anywhere that can reach both
until curl -sf 10.0.0.10:3473/ping >/dev/null && curl -sf 10.0.0.11:3473/ping >/dev/null; do sleep 2; done
echo "both ready"
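
The loop above waits forever if a node never comes up. A bounded variant is sketched below. wait_ready is a hypothetical helper, and the time limit is approximate because only the sleep intervals are counted, not the time spent inside curl.

```shell
# Hypothetical helper: poll /ping on every given host:port until all of
# them answer, or give up once roughly TIMEOUT seconds have been slept.
# Usage: wait_ready TIMEOUT HOST:PORT [HOST:PORT ...]
# Returns 0 when every address answered in the same round, 1 on timeout.
wait_ready() {
  timeout_s=$1
  shift
  waited=0
  while :; do
    all_up=1
    for addr in "$@"; do
      # -f makes curl fail on HTTP errors; --max-time bounds each probe.
      curl -sf --max-time 2 "$addr/ping" >/dev/null || all_up=0
    done
    [ "$all_up" -eq 1 ] && return 0
    [ "$waited" -ge "$timeout_s" ] && return 1
    sleep 2
    waited=$((waited + 2))
  done
}
```

For example, wait_ready 120 10.0.0.10:3473 10.0.0.11:3473 returns success once both nodes answer, and fails after about two minutes otherwise, which lets a deployment script report the failure instead of hanging.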

Submit queries to either node; the receiving node distributes stages across both:

curl -s 10.0.0.10:3473/ --data-binary "CREATE TABLE t (a INT);"
curl -s 10.0.0.10:3473/ --data-binary "INSERT INTO t SELECT * FROM generate_series(1, 1000000);"
curl -s 10.0.0.10:3473/ --data-binary "SELECT count(*), sum(a) FROM t;"

Ports to open: between the two Engine nodes, aragog_port (5678), shufflepuff_port (16000), storage_agent_port (3434), and storage_manager_port (1717); the embedded metadata port (6500) from node 1 to node 0; and --http-port (3473) from your clients.

Clean up: stop each node by its process ID, remove the data directories, then empty and remove the bucket.
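
When nodes fail to become ready, a blocked port between the machines is one thing worth ruling out. The sketch below probes TCP ports from one machine to its peer. port_open and check_ports are hypothetical helpers, they rely on bash's /dev/tcp feature and the coreutils timeout command, and a port reports as open only while the peer process is actually listening on it.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens
# within 2 seconds. Requires bash (for /dev/tcp) and coreutils timeout.
port_open() {
  # Inside bash -c, $0 is the host and $1 is the port.
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical helper: probe each listed port on HOST and report it.
# Usage: check_ports HOST PORT [PORT ...]
# Returns 0 only if every port accepted a connection.
check_ports() {
  peer=$1
  shift
  rc=0
  for port in "$@"; do
    if port_open "$peer" "$port"; then
      echo "$peer:$port open"
    else
      echo "$peer:$port unreachable"
      rc=1
    fi
  done
  return "$rc"
}
```

From node 1's machine, check_ports 10.0.0.10 5678 16000 3434 1717 6500 covers the inter-node ports plus the metadata port on node 0. From node 0's machine, the same call against 10.0.0.11 would omit 6500, since node 1 does not host the metadata service.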

To run both nodes on one machine for local testing, see Colocate multiple nodes on one host. That needs distinct ports per node and is not a production layout.

You cannot share embedded metadata between Engines

Embedded metadata is internal to one Engine. It is reachable only by that Engine’s own nodes, and there is no configuration that points a second Engine at another Engine’s embedded metadata. A second firebolt process started in embedded mode hosts its own, independent metadata and sees its own catalog, even if you point it at the same bucket. The two Engines would then write conflicting metadata for the same storage. To have more than one Engine read and write the same tables, the metadata must live in a service outside any single Engine. That is exactly what standalone metadata provides.
