Within its Docker container, Firebolt Core uses /firebolt-core as its working directory. This directory contains the following relevant files and subdirectories.
/firebolt-core/firebolt-core
– The main Firebolt Core executable, which is executed as the container entrypoint. The executable accepts a small number of command-line arguments, which are documented below.

/firebolt-core/config.json
– The configuration file which is read by the Firebolt Core executable on startup. The Firebolt Core Docker image contains a default configuration file at this location which is suitable for starting a local single-node setup. You can inject your own configuration file by mounting it at this location (see below).

/firebolt-core/volume
– The parent directory for all state which Firebolt Core writes during operation. This is the standard volume mount point used in the Docker Compose and Kubernetes setups. Users should either provide a Docker storage mount for the entire /firebolt-core/volume directory, or exercise more fine-grained control by individually mounting the following subdirectories (see below for details).

/firebolt-core/volume/diagnostic_data
– The directory to which Firebolt Core writes diagnostic information such as log files and crash dumps. Providing a storage mount for this directory makes it easier to inspect this diagnostic information on the host system (see also Troubleshoot Issues). None of the files in this directory capture any database state, so they don't necessarily have to be persisted.

/firebolt-core/volume/persistent_data
– The directory to which Firebolt Core writes all data that is required to persist the database state itself, i.e. the schema and contents of the database.

/firebolt-core/volume/tmp
– The directory to which Firebolt Core writes temporary data. This includes, for example, intermediate query results that operators might have to spill to disk under memory pressure. Even though this information is only temporary, providing a Docker storage mount might still be beneficial (see below for details). None of the files in this directory capture any database state, so they don't necessarily have to be persisted.
In the multi-node Docker Compose setup, for example, the /firebolt-core/volume/persistent_data directory of all four Firebolt Core Docker containers needs to be mounted in a suitable way.
Firebolt Core containers expose the following network ports.

3473
– The main HTTP query endpoint (see Connect).

8122
– HTTP health check endpoint (see Troubleshooting).

5678
– Inter-node communication channel for the distributed execution control plane. This channel is used, for example, when the leader node for a specific query wants to schedule distributed work on the follower nodes.

16000
– Inter-node communication channel for the distributed execution data plane. This channel is used by the shuffle operators in distributed query plans to exchange intermediate query results between nodes.

1717
– Inter-node communication channel for the storage service which keeps track of how the contents of managed tables are sharded across nodes.

3434
– Inter-node communication channel for the storage service which keeps track of how the contents of managed tables are sharded across nodes.

6500
– Inter-node communication channel for the metadata service which maintains schema information and coordinates transactions.
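For illustration, once a container is running with port 3473 published (as shown below), a SQL statement can be submitted to the main HTTP query endpoint as a plain HTTP POST. This is a minimal sketch; see Connect for the authoritative client documentation:

```sh
# Submit a SQL statement to the main HTTP query endpoint on port 3473.
echo "SELECT 42;" | curl --data-binary @- http://localhost:3473
```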
Firebolt Core relies on the io_uring kernel API for efficient network and disk I/O. This kernel API is blocked by the default Docker seccomp profile and needs to be explicitly allowed when starting a Firebolt Core Docker container. There are two main ways in which this can be achieved.
The simplest option is to pass --security-opt seccomp=unconfined to disable seccomp confinement entirely. Alternatively, you can create a custom seccomp profile which extends the default Docker profile to additionally allow the io_uring_setup, io_uring_enter, and io_uring_register syscalls. In this case, the Docker container should be started with --security-opt seccomp=/path/to/custom/profile.json.
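For illustration, such a custom profile could take Docker's default seccomp profile as a starting point and append one entry to its top-level syscalls array. The sketch below shows only the entry to add, not the complete profile:

```json
{
  "names": ["io_uring_setup", "io_uring_enter", "io_uring_register"],
  "action": "SCMP_ACT_ALLOW"
}
```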
Firebolt Core uses io_uring to pin certain buffers in main memory to eliminate unnecessary copies at the boundary between kernel-space and user-space. The size of these buffers counts against the RLIMIT_MEMLOCK resource limit, which is usually set to a comparatively low default value when starting Docker containers. Therefore, we recommend explicitly specifying --ulimit memlock=8589934592:8589934592 when starting Firebolt Core Docker containers to avoid errors during startup.
You can provide your own cluster configuration by replacing the /firebolt-core/config.json file within the Docker container, for example by using a Docker storage mount to mount a configuration file from the host operating system into the Docker container (see also the Docker Compose and Kubernetes deployment guides).
The configuration file itself should be a JSON file with the following contents.
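The authoritative schema is defined by the Firebolt Core release you are running. As a rough sketch, a three-node configuration might look as follows, assuming that each entry in the nodes array is an object carrying a host field (the hostnames are hypothetical):

```json
{
  "nodes": [
    { "host": "fb-node-0" },
    { "host": "fb-node-1" },
    { "host": "fb-node-2" }
  ]
}
```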
Each entry in the nodes array corresponds to one node in the desired Firebolt Core cluster and specifies the hostname or IP address under which it is reachable from the other nodes. Usually, it is preferable to specify hostnames and rely on DNS (e.g. provided by Docker) to resolve them to IP addresses.
The same configuration file should be passed to each Firebolt Core node within a cluster. The --node command-line argument (see below) is used to specify which of the entries in the nodes array corresponds to the "current" node, and which ones are remote nodes.
Ensure that any hostnames specified in the nodes array are resolvable to the correct IP addresses by all other nodes in the cluster. In Docker environments, this can often be achieved using custom Docker networks and container hostnames.
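For example, a user-defined bridge network (the name below is hypothetical) provides the DNS resolution needed for containers to reach each other by name:

```sh
# Containers attached to a user-defined network can resolve each other
# by container name (--name) or network alias via Docker's embedded DNS.
docker network create fb-core-net
```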
The Firebolt Core executable accepts the following command-line arguments.

--node=NODE_INDEX
– Specify the index of the current node within a multi-node cluster; defaults to zero. This index identifies the entry within the nodes array of the configuration file which corresponds to the current node. If the configuration file specifies N nodes, each of the node indexes from 0 to N - 1 should be assigned to exactly one of the nodes within the cluster.

--version
– Print the Firebolt Core version number and exit.

As discussed above, a suitable configuration file needs to be available at /firebolt-core/config.json for each node of a multi-node Firebolt Core cluster. This file is read by the Firebolt Core executable exactly once during startup to determine the cluster topology. Therefore, the way in which this file is mounted is not performance-critical.
The situation is different for the /firebolt-core/volume/persistent_data directory. Depending on the specific workload, Firebolt Core reads and writes potentially large amounts of data to and from this directory. More specifically, the following data is written to /firebolt-core/volume/persistent_data:
- The database schema and other metadata, in /firebolt-core/volume/persistent_data/metadata_storage.
- The contents of managed tables, in /firebolt-core/volume/persistent_data/layered_space and /firebolt-core/volume/persistent_data/local_space, in a proprietary format which can currently only be read by Firebolt Core engines.

Firebolt Core can also write large amounts of data to the /firebolt-core/volume/tmp directory even if your workload does not involve any schema objects managed by Firebolt Core. This can happen if query plan operators such as joins or aggregations spill intermediate results to disk in order to avoid running out of working memory. If you observe that your workload causes a lot of queries to spill, it might be beneficial to provide an SSD-backed storage mount with high I/O throughput (at least 2 GB/s) for the temporary data directory as well. Alternatively, you can increase the amount of available main memory to avoid spilling entirely. In either case, any files written to the temporary data directory do not have to be persisted between container restarts. Firebolt Core also actively removes files that are no longer needed from this directory.
Finally, Firebolt Core writes some diagnostic information to the /firebolt-core/volume/diagnostic_data directory. Files in this directory do not necessarily have to be persisted between container restarts, but can be useful for troubleshooting. Log files are written to /firebolt-core/volume/diagnostic_data/logs, and crash dumps to /firebolt-core/volume/diagnostic_data/crash.
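As a sketch of the fine-grained mounting approach mentioned above, the three subdirectories can also be mounted individually, for example to place temporary data on a fast local NVMe drive. All host paths here are hypothetical:

```sh
docker run --tty --rm \
  --security-opt seccomp=unconfined \
  --ulimit memlock=8589934592:8589934592 \
  --publish 127.0.0.1:3473:3473 \
  --mount type=bind,src=/mnt/durable/firebolt,dst=/firebolt-core/volume/persistent_data \
  --mount type=bind,src=/mnt/nvme/firebolt-tmp,dst=/firebolt-core/volume/tmp \
  --mount type=bind,src=/var/log/firebolt,dst=/firebolt-core/volume/diagnostic_data \
  ghcr.io/firebolt-db/firebolt-core:preview-rc
```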
A single-node Firebolt Core container can be started with a docker run command along the following lines; the individual arguments serve the following purposes.
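All of the flags are explained in the list that follows; the image tag matches the one used elsewhere in this guide:

```sh
docker run --tty --rm \
  --security-opt seccomp=unconfined \
  --ulimit memlock=8589934592:8589934592 \
  --publish 127.0.0.1:3473:3473 \
  ghcr.io/firebolt-db/firebolt-core:preview-rc
```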
--tty
– Allocate a pseudo-tty for the container. This causes Firebolt Core to write log messages to this tty in addition to the log files in /firebolt-core/volume/diagnostic_data/logs; these logs are necessary when troubleshooting and/or submitting a bug report.

--rm
– Remove the container when it is stopped. This is done to avoid cluttering the local system.

--security-opt seccomp=unconfined
– Disable seccomp confinement so that Firebolt Core can use io_uring.

--ulimit memlock=8589934592:8589934592
– Increase the RLIMIT_MEMLOCK resource limit to 8 GB so that Firebolt Core can pin I/O buffers in memory.

--publish 127.0.0.1:3473:3473
– Publish the container port 3473, on which the HTTP endpoint is exposed, to port 3473 on localhost. Since this is a single-node deployment, we don't need to publish the ports for inter-node communication.

ghcr.io/firebolt-db/firebolt-core:preview-rc
– The Docker image from which to start the container.

Since we passed the --rm command-line argument to the docker run command, the container is immediately removed when it is stopped and any database state is lost. It would of course be possible to simply omit --rm, in which case the database state will persist across container restarts. However, this is rather inflexible since all state is still tied to this specific container. A more flexible approach is to specify a storage mount as follows.
--mount type=bind,src=/firebolt-core-data,dst=/firebolt-core/volume
– Mount the /firebolt-core-data directory on the host system at /firebolt-core/volume in the container. Any persistent data written by Firebolt Core will now end up in /firebolt-core-data on the host system.
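Putting it together, a persistent single-node container might be started along the following lines (a sketch; /firebolt-core-data is an example host path):

```sh
docker run --tty --rm \
  --security-opt seccomp=unconfined \
  --ulimit memlock=8589934592:8589934592 \
  --publish 127.0.0.1:3473:3473 \
  --mount type=bind,src=/firebolt-core-data,dst=/firebolt-core/volume \
  ghcr.io/firebolt-db/firebolt-core:preview-rc
```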
For multi-node deployments, the following additional requirements apply.

- A suitable config.json file, listing all nodes in the cluster with their respective hostnames or IP addresses, must be mounted or made available at /firebolt-core/config.json within each container.
- Each container must be started with the correct --node=NODE_INDEX argument. This index tells the Firebolt Core executable which entry in the nodes array of the config.json file corresponds to itself.
- All nodes must be able to reach each other on the inter-node communication ports (5678, 16000, 1717, 3434, 6500). This usually means configuring them on a shared network.
- Each node needs its own /firebolt-core/volume directory. Shared storage for this directory across multiple nodes is not supported.
- The io_uring enablement (via seccomp profile) and RLIMIT_MEMLOCK increase (--ulimit memlock) must be applied to each node in the cluster.
- The HTTP query endpoint port (3473) might be published externally for client access, potentially from one or more designated query nodes. Inter-node communication ports are generally not exposed externally but used within the cluster network.
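For illustration, node 1 of a multi-node cluster could be started along the following lines. The network name fb-core-net, the container name fb-node-1 (which must match the corresponding entry in the nodes array of config.json), and the host paths are all hypothetical. Arguments placed after the image name are passed to the container entrypoint, i.e. the Firebolt Core executable:

```sh
docker run --tty --rm \
  --security-opt seccomp=unconfined \
  --ulimit memlock=8589934592:8589934592 \
  --network fb-core-net \
  --name fb-node-1 \
  --mount type=bind,src=/firebolt-core-data,dst=/firebolt-core/volume \
  --mount type=bind,src=/etc/firebolt/config.json,dst=/firebolt-core/config.json \
  ghcr.io/firebolt-db/firebolt-core:preview-rc \
  --node=1
```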