Deploy with standalone metadata - Firebolt Documentation

With standalone metadata (multi_engine mode), the Engine does not host its own metadata. Instead, every node connects to a separate metadata service, the dedicated-pensieve process, which stores metadata in Postgres. Because the metadata lives outside any single Engine, several Engines can point at the same metadata service and the same bucket, and each reads the latest snapshot written by the others. This is the basis of workload isolation: two Engines operate on the same object-storage tablets without drawing compute from each other.

These instructions assume you have firebolt and dedicated-pensieve on your PATH, as described in Get the binaries. Examples use Amazon S3; for other object stores, change the storage block as shown in engine configuration.

Start the components in this order: Postgres, then the metadata service, then the Engine nodes. Tear them down in the reverse order.

Set up Postgres and the metadata service

Install and run Postgres (for example, sudo apt-get install -y postgresql), then create a role and database for the metadata service. The metadata service reaches Postgres over the network, so choose a strong, unique password for this role rather than a default. The commands read it from an environment variable so you never type it into the SQL by hand. Note that the heredocs expand it in plain text, both into the statement sent to psql and into pensieve.xml below. Run as a Postgres superuser:

export PENSIEVE_DB_PASSWORD='choose-a-strong-password'   # set this to your own secret

psql -d postgres <<SQL
CREATE ROLE pensieve WITH LOGIN PASSWORD '${PENSIEVE_DB_PASSWORD}';
CREATE DATABASE dedicated_pensieve OWNER pensieve;
SQL
psql -d dedicated_pensieve -c 'GRANT ALL ON SCHEMA public TO pensieve;'   # Postgres 15+ locks down public
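The heredoc above pastes the password verbatim into a single-quoted SQL literal, and the configuration below pastes it into an XML element. A password containing ', &, < or > therefore breaks one or the other. Here is a minimal pre-flight sketch. check_pensieve_password is a hypothetical helper, and rejecting these characters outright, rather than escaping them, is a simplifying choice:

```shell
# Hypothetical helper: reject passwords that the unquoted heredocs in this
# guide would paste into SQL or XML incorrectly.
check_pensieve_password() {
  local pw="$1"
  if [ -z "$pw" ]; then
    echo "password is empty" >&2
    return 1
  fi
  case "$pw" in
    *\'*|*\&*|*\<*|*\>*)
      # ' ends the SQL string literal; & < > are special in XML text.
      echo "password contains ' & < or >; pick another or escape it" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage, right after exporting the variable and before running psql:
#   check_pensieve_password "$PENSIEVE_DB_PASSWORD" || echo "fix the password first"
```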

Write the metadata service configuration and start it. It listens on port 7000 and persists to the Postgres database above:

cat > "$PWD/pensieve.xml" <<EOF
<?xml version="1.0"?>
<config>
  <pensieve_lite>
    <host>0.0.0.0</host>
    <port>7000</port>
    <metadata_storage>
      <postgresql>
        <host>localhost</host>
        <port>5432</port>
        <database>dedicated_pensieve</database>
        <schema>public</schema>
        <username>pensieve</username>
        <password>${PENSIEVE_DB_PASSWORD}</password>
      </postgresql>
    </metadata_storage>
  </pensieve_lite>
</config>
EOF
chmod 600 "$PWD/pensieve.xml"   # the file holds the Postgres password in plain text

dedicated-pensieve --config "$PWD/pensieve.xml" &   # logs "Server listening on 0.0.0.0:7000"
PENSIEVE_PID=$!
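Because dedicated-pensieve runs in the background, the next command can start before the service is listening. It can also start after the service has already exited on a bad config. Here is a minimal sketch that waits for port 7000 with a time limit and stops early if the process dies. wait_for_tcp is a hypothetical helper. It relies on bash's /dev/tcp redirection, which is compiled in on most Linux distributions, and on the coreutils timeout command:

```shell
# Hypothetical helper: wait up to TIMEOUT seconds for HOST:PORT to accept TCP
# connections. If PID is given, give up early once that process has exited.
wait_for_tcp() {
  local host="$1" port="$2" timeout_s="$3" pid="${4:-}"
  local deadline=$(( SECONDS + timeout_s ))
  while (( SECONDS < deadline )); do
    if [ -n "$pid" ] && ! kill -0 "$pid" 2>/dev/null; then
      echo "process $pid exited before $host:$port opened" >&2
      return 1
    fi
    # Try to open (and immediately drop) a TCP connection.
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out after ${timeout_s}s waiting for $host:$port" >&2
  return 1
}

# Usage on the metadata host (the service binds 0.0.0.0:7000):
#   wait_for_tcp 127.0.0.1 7000 60 "$PENSIEVE_PID" || echo "metadata service not up"
```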

Provide object storage credentials in the shell that starts each Engine node, and create a bucket:

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..."   # only for temporary (STS) credentials

aws s3 mb s3://my-firebolt-bucket --region us-east-1
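The credentials must be present in the shell that starts each Engine node, not only in the shell where you ran aws s3 mb. Here is a minimal pre-flight sketch to run in that shell before firebolt server start. require_aws_env is a hypothetical name. The ASIA prefix rule reflects AWS's convention for temporary (STS) access key IDs, which need the session token as well:

```shell
# Hypothetical pre-flight check for the variables exported above.
require_aws_env() {
  local missing=0 var
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    if [ -z "${!var:-}" ]; then           # indirect lookup of the variable named $var
      echo "$var is not set in this shell" >&2
      missing=1
    fi
  done
  # Temporary (STS) key IDs start with ASIA and come with a session token.
  case "${AWS_ACCESS_KEY_ID:-}" in
    ASIA*)
      if [ -z "${AWS_SESSION_TOKEN:-}" ]; then
        echo "AWS_SESSION_TOKEN is required for temporary credentials" >&2
        missing=1
      fi
      ;;
  esac
  return "$missing"
}

# Usage, in the shell that will start the node:
#   require_aws_env || echo "export the AWS_* variables first"
```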

One multi-node engine

This example runs a single Engine with two nodes, one node per machine, against the standalone metadata service. Set instance.type to multi_engine and point metadata_endpoint at the metadata service. List both nodes under engine.nodes, setting each host to the address its peer reaches it on, and keep the default ports. Both nodes use the same file:

mkdir -p node0 node1
cat > node0/config.yaml <<'EOF'
schema_version: "1.0"
instance:
  type: multi_engine
  multi_engine:
    metadata_endpoint: 10.0.0.5:7000   # the metadata service host
engine:
  nodes:
    - host: 10.0.0.10                  # node 0
    - host: 10.0.0.11                  # node 1
storage:
  type: s3
  bucket_name: my-firebolt-bucket
  aws:
    region: us-east-1
EOF
cp node0/config.yaml node1/config.yaml

Copy each config to its own machine and start both nodes concurrently, then wait for both. Do not use --detach: each node’s readiness check needs its peer, so starting them one at a time deadlocks.

# on node 0's machine (10.0.0.10)
firebolt server start --data-dir "$PWD/node0" --node 0 --http-port 3473 &

# on node 1's machine (10.0.0.11)
firebolt server start --data-dir "$PWD/node1" --node 1 --http-port 3473 &

# from anywhere that can reach both
until curl -sf 10.0.0.10:3473/ping >/dev/null && curl -sf 10.0.0.11:3473/ping >/dev/null; do sleep 2; done
echo "both ready"
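The until loop above retries forever. A node that never starts, for example because of a wrong host or a blocked port, therefore hangs the script. Here is a minimal sketch of a bounded version. wait_until and node_ready are hypothetical helper names, and the 300-second limit in the usage line is an arbitrary example:

```shell
# Hypothetical helper: rerun a command every INTERVAL seconds until it succeeds
# or TIMEOUT seconds have passed.
wait_until() {
  local timeout_s="$1" interval_s="$2"
  shift 2
  local deadline=$(( SECONDS + timeout_s ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "gave up after ${timeout_s}s: $*" >&2
      return 1
    fi
    sleep "$interval_s"
  done
}

# Probe for one node, matching the curl check above.
node_ready() {
  curl -sf "http://$1:3473/ping" >/dev/null
}

# Usage, from a host that can reach both nodes:
#   wait_until 300 2 node_ready 10.0.0.10 && wait_until 300 2 node_ready 10.0.0.11 \
#     && echo "both ready"
```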

Create data and confirm it is persisted in both object storage and Postgres:

curl -s 10.0.0.10:3473/ --data-binary "CREATE TABLE t (a INT);"
curl -s 10.0.0.10:3473/ --data-binary "INSERT INTO t SELECT * FROM generate_series(1, 1000000);"
curl -s 10.0.0.10:3473/ --data-binary "SELECT count(*), sum(a) FROM t;"
aws s3 ls s3://my-firebolt-bucket --recursive                           # tablet objects
psql -d dedicated_pensieve -c "SELECT count(*) FROM log;"               # metadata rows (>0)
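The count and sum in the query above have a closed form, which makes the check exact. Assuming generate_series includes both endpoints, as in Postgres, generate_series(1, N) gives count(*) = N and sum(a) = N(N+1)/2. For N = 1,000,000 that is 1000000 and 500000500000. Here is a minimal sketch that computes and compares them. The function names are hypothetical. It assumes the HTTP endpoint prints the two values on one line separated by whitespace, which the # 3  6 comment in the multi-engine example suggests, so check your own output first:

```shell
# Expected (count, sum) for SELECT count(*), sum(a) over generate_series(1, N).
# Bash arithmetic is 64-bit, ample for N = 1,000,000.
expected_count_sum() {
  local n="$1"
  echo "$n $(( n * (n + 1) / 2 ))"
}

# Compare the first line of a response against the expectation,
# ignoring whether the columns are separated by tabs or spaces.
check_count_sum() {
  local n="$1" count sum
  read -r count sum _ <<< "$2"
  [ "$count $sum" = "$(expected_count_sum "$n")" ]
}

# Usage:
#   check_count_sum 1000000 "$(curl -s 10.0.0.10:3473/ --data-binary "SELECT count(*), sum(a) FROM t;")" \
#     && echo "count and sum match"
```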

Ports to open: between the two Engine nodes, aragog_port (5678), shufflepuff_port (16000), storage_agent_port (3434), and storage_manager_port (1717); the metadata port (7000) from both nodes to the metadata host; Postgres (5432) from the metadata host to its database; and --http-port (3473) from your clients.

Clean up:

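# Stop the Engine nodes first: on each node's machine, kill -TERM that node's process ID.
kill -TERM "$PENSIEVE_PID"   # then stop the metadata service
rm -rf node0 node1 pensieve.xml
psql -d postgres -c "DROP DATABASE IF EXISTS dedicated_pensieve WITH (FORCE);"   # WITH (FORCE) needs Postgres 13+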
psql -d postgres -c "DROP ROLE IF EXISTS pensieve;"
aws s3 rm s3://my-firebolt-bucket --recursive
aws s3 rb s3://my-firebolt-bucket
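A plain kill -TERM returns as soon as the signal is sent, so a script can move on to dropping the database while a node is still shutting down. Here is a minimal sketch of a stop helper that waits for the process to exit before falling back to SIGKILL. stop_pid and its 30-second default are assumptions, not part of the firebolt or dedicated-pensieve CLIs:

```shell
# Hypothetical helper: SIGTERM, wait up to TIMEOUT seconds, then SIGKILL.
stop_pid() {
  local pid="$1" timeout_s="${2:-30}"
  kill -TERM "$pid" 2>/dev/null || return 0   # nothing to stop (or not ours to signal)
  local deadline=$(( SECONDS + timeout_s ))
  while kill -0 "$pid" 2>/dev/null; do
    if (( SECONDS >= deadline )); then
      echo "pid $pid ignored SIGTERM for ${timeout_s}s; sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null
      break
    fi
    sleep 1
  done
  # Reap it if it was started from this shell; harmless otherwise.
  wait "$pid" 2>/dev/null || true
  return 0
}

# Usage: on each node's machine, stop_pid with the $! saved after
# firebolt server start; then, on the metadata host:
#   stop_pid "$PENSIEVE_PID"
```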

Multiple engines

Two independent Engines, each with two nodes, share one metadata service and one bucket. Each Engine is configured exactly like the Engine in One multi-node engine, with three values identical on both Engines so they share a catalog:

the same metadata_endpoint, so both reach the same metadata service,
the same bucket_name, so both read and write the same tablets,
the same account. Both Engines get this by omitting instance.id, which defaults to the same value. If you set it explicitly, use the identical ULID on every Engine and as the metadata service’s default_account_id: that value is the account key the service scopes all metadata by, so mismatched values hide each Engine’s tables from the others.

Give each Engine a distinct engine.id. This identifier is informational only: metadata is scoped by account, not by Engine, so two Engines with the same account see each other’s tables regardless of engine.id. Each Engine runs on its own machines, one node per machine:

# Engine A
schema_version: "1.0"
instance:
  type: multi_engine
  multi_engine:
    metadata_endpoint: 10.0.0.5:7000
engine:
  id: engine-a
  nodes:
    - host: 10.0.1.10
    - host: 10.0.1.11
storage:
  type: s3
  bucket_name: my-firebolt-bucket
  aws:
    region: us-east-1
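The per-Engine files differ only in engine.id and the host list, so you might generate them instead of editing copies by hand. Here is a minimal sketch under that assumption. write_engine_config and the METADATA_ENDPOINT, BUCKET_NAME and S3_REGION overrides are hypothetical names whose defaults come from the Engine A example. instance.id is left out so every generated file gets the same default account:

```shell
# Hypothetical generator: OUT ENGINE_ID HOST... -> a config shaped like Engine A's.
write_engine_config() {
  local out="$1" engine_id="$2"
  shift 2
  {
    echo 'schema_version: "1.0"'
    echo 'instance:'
    echo '  type: multi_engine'
    echo '  multi_engine:'
    echo "    metadata_endpoint: ${METADATA_ENDPOINT:-10.0.0.5:7000}"   # shared
    echo 'engine:'
    echo "  id: $engine_id"                                            # per Engine
    echo '  nodes:'
    local host
    for host in "$@"; do
      echo "    - host: $host"                                         # per Engine
    done
    echo 'storage:'
    echo '  type: s3'
    echo "  bucket_name: ${BUCKET_NAME:-my-firebolt-bucket}"           # shared
    echo '  aws:'
    echo "    region: ${S3_REGION:-us-east-1}"
  } > "$out"
}
```

For example, write_engine_config engine-b.yaml engine-b 10.0.2.10 10.0.2.11 produces Engine B's file. Copy it to each of Engine B's machines as config.yaml in that node's --data-dir, as the single-Engine example does.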

Engine B’s config is identical except for id: engine-b and its two nodes[].host values (10.0.2.10 and 10.0.2.11); metadata_endpoint and bucket_name stay the same, which is what makes the two Engines share one catalog. Bring up each Engine the same way as a single Engine: start its two nodes concurrently and wait for both to report ready. Once both Engines are up, a table created on Engine A is immediately queryable on Engine B, and the reverse:

# on a node of Engine A
curl -s http://10.0.1.10:3473/ --data-binary "CREATE TABLE t (a INT);"
curl -s http://10.0.1.10:3473/ --data-binary "INSERT INTO t VALUES (1),(2),(3);"

# on a node of Engine B, reading Engine A's data
curl -s http://10.0.2.10:3473/ --data-binary "SELECT count(*), sum(a) FROM t;"  # 3  6
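If a table created on one Engine is not visible from the other, the two configs probably disagree on one of the three shared values listed earlier. Here is a minimal line-based sketch that compares them. It assumes each config is a file shaped like Engine A's above, and the helper names are hypothetical:

```shell
# Hypothetical helper: print SECTION.KEY from a config laid out like the ones in
# this guide (top-level section, indented "key: value", optional "# comment").
# A line-based sketch, not a YAML parser.
yaml_get() {
  awk -v section="$2" -v key="$3" '
    /^[^ #]/ { in_section = ($0 == section ":") }
    in_section && $0 ~ ("^ +" key ":") {
      sub("^ +" key ":[ ]*", "")
      sub("[ ]*#.*$", "")
      print
      exit
    }' "$1"
}

# Compare the values that must match for two Engines to share a catalog.
# engine.id is deliberately not compared: it is informational only.
same_catalog() {
  local a="$1" b="$2" field va vb rc=0
  for field in "instance metadata_endpoint" "storage bucket_name" "instance id"; do
    # $field is left unquoted on purpose so it splits into SECTION and KEY.
    va=$(yaml_get "$a" $field)
    vb=$(yaml_get "$b" $field)
    if [ "$va" != "$vb" ]; then
      echo "mismatch in $field: '$va' vs '$vb'" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

same_catalog engine-a.yaml engine-b.yaml prints nothing and returns 0 when the shared values agree. A file that omits instance.id paired with one that sets it counts as a mismatch, which is the conservative reading of the account rule.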

Ports to open: the node-to-node ports (5678, 16000, 1717, 3434) between the nodes of each Engine; the metadata port (7000) from every node of both Engines to the metadata host; Postgres (5432) from the metadata host to its database; and each Engine's --http-port from your clients.
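Here is a minimal reachability sketch for confirming these rules. check_ports is a hypothetical helper that only tests whether a TCP connection opens, using bash /dev/tcp with a 3-second limit per target. Run it after the processes behind those ports have started, from the machine the traffic originates on:

```shell
# Hypothetical helper: try HOST:PORT targets (IPv4 or hostnames) and report each.
check_ports() {
  local target host port failed=0
  for target in "$@"; do
    host="${target%:*}"
    port="${target##*:}"
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "ok $target"
    else
      echo "unreachable $target"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on 10.0.1.10 (Engine A): its peer's node-to-node ports and the metadata service.
#   check_ports 10.0.1.11:5678 10.0.1.11:16000 10.0.1.11:1717 10.0.1.11:3434 10.0.0.5:7000
```

A target reported unreachable while its process is running points at a firewall or security-group rule, or at a process bound to a different address.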

To run both Engines on one machine for local testing, see Colocate multiple nodes on one host. That needs a distinct bind address per Engine and is not a production layout.
