Connect
Each node in a Firebolt Core cluster exposes a single HTTP endpoint for communication with clients. Clients interact with a cluster by submitting HTTP POST requests containing plain SQL queries to these endpoints. The cluster then executes the submitted SQL queries and returns query results to the client in the corresponding HTTP response.
Communication Protocol
In principle, any HTTP client (library) can be used to connect to a Firebolt Core cluster. In the following examples we will use the curl command line utility and assume that a single-node Firebolt Core cluster is running on the local machine with its HTTP endpoint exposed at localhost:3473. You can refer to Get Started for instructions on how to start such a cluster.
Each HTTP POST request submitted to the HTTP endpoint of Firebolt Core should contain exactly one valid SQL query, for example:
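For clients written in code rather than driven from the command line, the same kind of request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names make_request and execute are hypothetical, and the endpoint is the localhost:3473 address assumed throughout this page.

```python
from urllib.request import Request, urlopen

# Endpoint of the single-node cluster assumed on this page.
ENDPOINT = "http://localhost:3473/"


def make_request(sql: str, url: str = ENDPOINT) -> Request:
    """Wrap exactly one SQL query in the body of an HTTP POST request."""
    return Request(url, data=sql.encode("utf-8"), method="POST")


def execute(sql: str, url: str = ENDPOINT) -> str:
    """Send the query and return the response body (needs a running cluster)."""
    with urlopen(make_request(sql, url)) as response:
        return response.read().decode("utf-8")


# Building the request does not contact the cluster.
request = make_request("SELECT 42")
print(request.get_method(), request.full_url)  # POST http://localhost:3473/
```

Calling execute("SELECT 42") would send the request and return the response body as text, provided a cluster is actually listening at that address.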
If necessary, system settings can be adjusted for individual queries by appending them as parameters to the HTTP query string. For example, the timezone setting can be adjusted as follows.
Multiple parameters can be specified by separating them with an ampersand (&).
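When the URL is assembled in code, the query string can be generated instead of being concatenated by hand. The following Python sketch illustrates this under a few assumptions: url_with_settings is a hypothetical helper, the endpoint is the local one used on this page, and the value UTC for timezone is only an illustrative choice.

```python
from urllib.parse import urlencode


def url_with_settings(endpoint: str, **settings: str) -> str:
    """Append settings to the endpoint URL as HTTP query string parameters."""
    base = endpoint.rstrip("/") + "/"
    if not settings:
        return base
    # urlencode percent-encodes the values and joins the pairs with '&'.
    return base + "?" + urlencode(settings)


url = url_with_settings(
    "http://localhost:3473",
    timezone="UTC",
    output_format="JSON_Compact",
)
print(url)  # http://localhost:3473/?timezone=UTC&output_format=JSON_Compact
```

Letting urlencode build the query string also takes care of escaping characters such as slashes or spaces in setting values.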
Query results are returned in the body of the corresponding HTTP response. By default, they are formatted as a human-readable tab-separated string. For example, the following request
will return the following response body.
Output Format
The output_format parameter can be specified in the HTTP query string to select a different output format. We recommend using the JSON_Compact output format whenever query results need to be processed by code instead of humans. The supported formats are:
- TabSeparatedWithNamesAndTypes (default)
- JSON_Compact
- JSON_CompactLimited (same as JSON_Compact but only returns 10,000 rows)
- JSONLines_Compact (chunked version of JSON_CompactLimited)
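Client code may want to check the format name before sending a request and decode line-oriented responses incrementally. The Python sketch below shows one way to do this. The helper names are hypothetical, and the parser only assumes that each non-empty line of a JSONLines_Compact body is a standalone JSON document, as the format name suggests; it makes no assumption about the fields inside each document.

```python
import json
from typing import Any, Dict, Iterator

# Format names as listed on this page.
SUPPORTED_FORMATS = (
    "TabSeparatedWithNamesAndTypes",
    "JSON_Compact",
    "JSON_CompactLimited",
    "JSONLines_Compact",
)


def output_format_param(name: str) -> Dict[str, str]:
    """Return the query string parameter for a supported output format."""
    if name not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {name}")
    return {"output_format": name}


def iter_json_lines(body: str) -> Iterator[Any]:
    """Yield one decoded JSON document per non-empty line of the body."""
    for line in body.splitlines():
        if line.strip():
            yield json.loads(line)


print(output_format_param("JSONLines_Compact"))
# {'output_format': 'JSONLines_Compact'}

# Made-up payload for illustration only, not an actual Firebolt Core response.
sample_body = '{"n": 1}\n{"n": 2}\n'
print(list(iter_json_lines(sample_body)))  # [{'n': 1}, {'n': 2}]
```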
For example, the following request using JSON_Compact
will return the following response body.
For example, the following request using JSONLines_Compact
will return the following response body.
Sessions
Firebolt Core does not have any concept of server-side sessions, i.e. the communication protocol is stateless. This is irrelevant for most types of SQL statements clients might submit to a Firebolt Core cluster, but there are a few exceptions. Specifically, the following statement types require special handling on the client side.
- USE DATABASE. Running this statement only validates that the specified database exists. If this is successful, Firebolt Core returns the Firebolt-Update-Parameters response header containing the new parameters that should be appended to the HTTP query string for subsequent statements to use the changed database. Consider the following example running against a clean Firebolt Core cluster (see Get Started on how to start such a cluster).
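On the client side, this handling amounts to keeping a set of query string parameters and merging the header contents into it after each response. The Python sketch below illustrates the idea. It assumes, without confirmation from this page, that the header value is a comma-separated list of key=value pairs; the function name, the parameter name database, and the database name my_db are all hypothetical.

```python
from typing import Dict, Mapping, Optional


def apply_update_parameters(
    params: Mapping[str, str], header_value: Optional[str]
) -> Dict[str, str]:
    """Return a copy of params merged with the pairs found in the header.

    Assumption: header_value looks like "key1=value1,key2=value2".
    """
    updated = dict(params)
    if not header_value:
        return updated
    for pair in header_value.split(","):
        key, separator, value = pair.strip().partition("=")
        if separator and key:
            updated[key] = value
    return updated


# Parameters the client currently appends to every request.
session_params = {"output_format": "JSON_Compact"}

# Hypothetical header value, as it might accompany a USE DATABASE response.
session_params = apply_update_parameters(session_params, "database=my_db")
print(session_params)  # {'output_format': 'JSON_Compact', 'database': 'my_db'}
```

In a real client, header_value would be read from the Firebolt-Update-Parameters header of the HTTP response, and session_params would be passed to every subsequent request.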
Transactions
Firebolt Core is fully transactional (see also Architecture). Every query submitted to a Firebolt Core cluster runs within its own transaction, which is automatically committed when the statement finishes and rolled back when the statement fails. There can be an arbitrary number of concurrent read transactions, but only one write transaction can be active at any point in time within the entire cluster. Attempting to submit a write transaction while another write transaction is already active will result in an error. This means that write transactions have to be serialized on the client side, for example through a suitable queuing mechanism that submits them one by one.
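A simple way to serialize writes inside a single client process is a thread pool with exactly one worker, which behaves like a FIFO queue. The Python sketch below demonstrates this with a stand-in function instead of real HTTP requests; fake_write and the table name t are hypothetical, and no statement is actually sent to a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A pool with one worker runs submitted tasks strictly one after another,
# in submission order.
write_queue = ThreadPoolExecutor(max_workers=1)

active = 0       # number of writes currently running
max_active = 0   # highest overlap observed
completed = []   # statements in the order they finished


def fake_write(statement: str) -> str:
    """Stand-in for POSTing a write statement; records any overlap."""
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    time.sleep(0.01)  # pretend the statement takes a while
    completed.append(statement)
    active -= 1
    return statement


statements = [f"INSERT INTO t VALUES ({i})" for i in range(5)]
futures = [write_queue.submit(fake_write, s) for s in statements]
for future in futures:
    future.result()  # re-raises any error raised by the write
write_queue.shutdown()

print(max_active)               # 1
print(completed == statements)  # True
```

This only serializes writes issued by one process. If several independent clients write to the same cluster, they need a shared coordination mechanism, which is outside the scope of this sketch.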
Load Balancing & Data Locality
Firebolt Core contains the same distributed execution engine that also powers Firebolt’s managed Cloud data warehouse. In general, this execution engine distributes most query processing work evenly across all nodes within a cluster. Nevertheless, clients should ensure that they submit queries to all nodes within a cluster in a balanced way, for the following reasons.
First, the node to which a query is submitted becomes the leader node for that query and usually has to do slightly more work than the follower nodes. Most importantly, all query planning is done on the leader node, which may require substantial compute resources for complex queries. Furthermore, the final query result is gathered on the leader node and streamed back to the client from there. If the query result is large, this may also require substantial compute resources. Finally, some stages of the distributed execution plan may run only on the leader node. For example, this is the case for the final sorting step for queries with an ORDER BY clause, or for queries that do not access any tables at all.
Second, this behavior also affects data placement when inserting into tables managed by the Firebolt Core cluster. Specifically, such tables are conceptually sharded across all nodes in a cluster but data is physically inserted in the final stage of the distributed query plan. If this stage runs only on a single node, all inserted data will also end up on this single node. In this case, balancing individual insert statements across nodes is crucial to ensure that data is distributed properly across nodes.
Whether the above applies to a specific query can be determined by inspecting its physical query plan (see also Troubleshooting). Consider the following example running against a two-node Firebolt Core cluster with node endpoints exposed at localhost:3473 and localhost:3474 (see Get Started). Note that running these queries against a single-node cluster would result in slightly different query plans.
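The balanced submission recommended in this section can be approximated on the client with round-robin endpoint selection. The Python sketch below is one possible approach, not a prescribed one: the class name RoundRobinEndpoints is hypothetical, and the two endpoints are those of the two-node cluster assumed above.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinEndpoints:
    """Hands out node endpoints in rotation so each node leads equally often."""

    def __init__(self, endpoints: Iterable[str]) -> None:
        endpoint_list = list(endpoints)
        if not endpoint_list:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(endpoint_list)
        self._lock = threading.Lock()  # makes next() safe across threads

    def next(self) -> str:
        with self._lock:
            return next(self._cycle)


nodes = RoundRobinEndpoints(["http://localhost:3473", "http://localhost:3474"])
picked = [nodes.next() for _ in range(4)]
print(picked)
# ['http://localhost:3473', 'http://localhost:3474',
#  'http://localhost:3473', 'http://localhost:3474']
```

Each query, including each insert statement, would then be sent to the endpoint returned by nodes.next(), so that leader-only stages and single-node inserts are spread across the cluster over time.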
Security
Firebolt Core does not have any built-in provisions to authenticate and secure communication with clients. In particular, the following caveats apply:
- All communication with clients happens over unencrypted HTTP. If any actor is able to intercept communication between Firebolt Core and a client they will be able to read all information exchanged between the client and Firebolt Core. This includes potentially sensitive information such as secrets used to access external resources like S3 or GCS.
- Firebolt Core does not authenticate any requests made to its HTTP endpoints, and there is no role-based access control. In other words, any actor that is able to send requests to a Firebolt Core HTTP endpoint has full access to the cluster and can read or modify any information stored on the cluster.
Client SDKs
Firebolt offers a number of client SDKs for its managed Cloud data warehouse. However, most of these SDKs cannot yet be used to connect to Firebolt Core. We will gradually extend SDK coverage as Firebolt Core moves beyond public preview (see also Roadmap). The following client SDKs are currently supported.
- The Firebolt JDBC driver from version 3.6.3. See Connect with JDBC for details.