Troubleshoot
Firebolt Core provides a number of tools to help you troubleshoot issues. If these tools are not sufficient to resolve an issue, you can get in touch with Firebolt engineers in the following ways.
- GitHub Discussions in the Firebolt Core repository for open-ended questions and general support requests.
- GitHub Issues in the Firebolt Core repository for bug reports. Please make sure to follow the issue template and provide as much of the requested information as possible.
For security issues, please refer to our security policy.
Health Checks
When a Firebolt Core node is started, it performs various health checks to validate that the entire cluster is in a healthy state. Until these health checks have succeeded, queries sent to a node will always return an error with a message describing the specific problem. Similarly, if a node encounters an error that prevents it from properly starting in the first place, it will still expose the HTTP endpoint but return errors for all submitted queries.
More specifically, a node goes through the following steps during startup.
- The configuration file is read and all internal services are initialized and started. Errors in this step are non-recoverable and the node will always return errors for any queries submitted to its HTTP endpoint.
Subsequently, a node waits for all of the following health checks to succeed.
- The node must have connectivity to all remote nodes on all inter-node communication channels (see Deployment and Operational Guide). This is verified by attempting to establish a TCP connection to each inter-node communication port of each remote node. If the connection succeeds, connectivity is assumed to be good and the connection is immediately closed again.
- All nodes must run the same Firebolt Core version. This is verified by sending a `SELECT version()` query to each remote node and validating that the result matches the version running on the local node. A manual version of this check is sketched after this list.
- All nodes must have the exact same list of hosts in their `config.json` file. This is verified by sending a query to each remote node that returns a hash of the hosts and validating that the result matches the hash on the local node.
- All relevant filesystem paths within the Docker container must be readable and writable (see Deployment and Operational Guide). This is intended to guard against errors that could arise from storage mounts with incorrect permissions.
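If the version check fails, it can help to run it by hand. The following is a minimal sketch; the host names are placeholders, and it assumes that each node's SQL-over-HTTP endpoint (port `3473` here, an assumption not covered by this page) accepts the query text as the body of a POST request.

```python
# Manually replicate the version consistency check by sending
# "SELECT version()" to every node and comparing the raw responses.
import urllib.request

HOSTS = ["node-0.example.internal", "node-1.example.internal"]  # hypothetical host names
SQL_PORT = 3473  # assumed SQL-over-HTTP port; adjust for your deployment


def node_version(host: str) -> str:
    # POST the query text to the node's SQL endpoint and return the raw response body.
    request = urllib.request.Request(
        f"http://{host}:{SQL_PORT}/", data=b"SELECT version()", method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode().strip()


versions = {host: node_version(host) for host in HOSTS}
if len(set(versions.values())) > 1:
    print(f"Version mismatch across nodes: {versions}")
else:
    print(f"All nodes run the same version: {next(iter(versions.values()))}")
```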
Once all of the above checks have succeeded on a node, it will start accepting queries. Note that these health checks run locally on each node, and there is no guarantee that they will succeed at the same time on all nodes. Therefore, clients should usually explicitly wait until all nodes are ready to accept queries.
Liveness & Readiness Probes
Firebolt Core nodes expose a dedicated HTTP endpoint for health checks on port `8122` (see also Deployment and Operational Guide). This endpoint responds on the following two routes.
- `/health/live` — Accepts GET or HEAD requests and always returns an empty HTTP 200 response. This endpoint can be queried to determine whether a Firebolt Core node has successfully been started.
- `/health/ready` — Accepts GET or HEAD requests. Returns an empty HTTP 500 response as long as the above health checks have not yet completed successfully, and an empty HTTP 200 response once they have all succeeded. This endpoint can be queried to determine whether a Firebolt Core node is healthy and ready to accept queries.
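To explicitly wait until all nodes are ready to accept queries, as recommended above, a client can poll `/health/ready` on every node. A minimal sketch, using hypothetical host names and the health check port `8122` described above:

```python
# Block until every node in the cluster answers /health/ready with HTTP 200
# before submitting queries.
import time
import urllib.request

HOSTS = ["node-0.example.internal", "node-1.example.internal"]  # hypothetical host names
HEALTH_PORT = 8122  # health check port, as documented above


def node_is_ready(host: str) -> bool:
    try:
        with urllib.request.urlopen(
            f"http://{host}:{HEALTH_PORT}/health/ready", timeout=5
        ) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (e.g. the 500 returned while checks are pending),
        # and timeouts are all OSError subclasses.
        return False


def wait_for_cluster(poll_interval: float = 2.0) -> None:
    while not all(node_is_ready(host) for host in HOSTS):
        time.sleep(poll_interval)


wait_for_cluster()
print("All nodes are ready to accept queries.")
```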
Query Plans
The primary tool to troubleshoot query performance is the EXPLAIN statement.
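For example, prefixing a query with `EXPLAIN` returns its query plan instead of executing it. The table and column names below are placeholders:

```sql
-- Returns the plan for the SELECT instead of executing it.
EXPLAIN
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
```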
Log Messages
In some cases, inspecting the log messages written by Firebolt Core might provide more information about an issue (see Deployment and Operational Guide). If you are reporting an issue in the Firebolt Core repository, you should always attach the relevant log output of all Firebolt Core nodes to the report if possible. Be aware, however, that log messages might contain potentially sensitive information.
Log messages can contain stack traces that do not expose any information about source locations, for example:
While these might not look particularly useful at first glance, they should nevertheless be included when reporting an issue. Firebolt engineers can re-symbolize such stack traces, which provides valuable information for debugging a problem.
Crash Dumps
In the rare event of a Firebolt Core node crashing, Firebolt Core will attempt to write a minidump capturing the internal state of the Firebolt Core process at the time of the crash. This minidump is written to a file in the `/firebolt-core/volume/diagnostic_data/crash` directory within the Docker container (see Deployment and Operational Guide), and the log should contain a message identifying the precise file name. As with the log messages written by Firebolt Core, attaching a minidump to a crash report provides valuable additional information. However, a minidump might also contain potentially sensitive information.
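When preparing a crash report, the following sketch shows one way to locate the most recent minidump in the documented crash directory. It assumes it runs inside the container, or against the mounted volume, and makes no assumption about minidump file names:

```python
# Locate the most recent file in the crash dump directory documented above.
from pathlib import Path

CRASH_DIR = Path("/firebolt-core/volume/diagnostic_data/crash")

dumps = sorted(CRASH_DIR.glob("*"), key=lambda p: p.stat().st_mtime)
if dumps:
    print(f"Most recent crash dump: {dumps[-1]}")
else:
    print("No crash dumps found.")
```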