Task | Expected Usage |
---|---|
Ingest initial data | 4-16 FBU |
Run test queries | 8-32 FBU |
Find optimal query performance | 32-240 FBU |
Find optimal test integrations | 32-240 FBU |
WHERE
, JOIN
, GROUP_BY
, and clauses used for sorting. In Firebolt, a primary index is a type of sparse index. Thus, selecting the best primary index can reduce query run times significantly by reducing the data set that the query searches over. Selecting primary indexes also allows Firebolt to manage updates, deletions and insertions to tables and provide optimal query performance.
If you have a composite primary index, the order that the columns are listed is important. Specify the column that has a large number of unique values, or high cardinality, first, followed by columns with lower cardinality. A sort order with the previous characteristics allows Firebolt to prune, or eliminate irrelevant data, so that it doesn’t have to scan it in query processing. Pruning significantly enhances query performance.
You can create a primary index only when you create a table. If you want to change the primary index, you must create a new table. The following example shows how to use CREATE TABLE to create a new levels
table, define the schema, and set two primary indexes:
LevelID
, is required in order to create a primary index. The second value, Name
, and any following values are optional. Firebolt will use all listed primary indexes to optimize query scans. If Name has lower cardinality than LevelID
, then Firebolt can optimize these indexes to eliminate scanning over irrelevant data. For more information about primary indexes and sort order, see Primary index.
To read data into the levels
table, enter the following into a new script tab:
COUNT
, SUM
, MAX
, MIN
, AVG
, JOIN
, and GROUP BY
. Rather than computing aggregate values each time they are used in a calculation, the results are accessed from storage, which helps run queries quickly and saves compute resources.
An aggregating index combines columns into a statistical result. You can calculate an aggregate index on an entire table, or more efficiently, calculate them over a subset of table columns. You can also use your knowledge of which dimensions and aggregate functions are used most often for your use case to predefine what table dimensions and which aggregate functions to use.
Once you create aggregate indexes, Firebolt maintains them automatically for you. If you load new data into your table or alter it, your aggregate indexes are automatically updated. You can also have multiple aggregate indexes for a single table. When you query a table with multiple aggregate indexes, Firebolt will automatically select the best index to use to optimize performance.
From the tutorial table that you created in the previous step, assume you want to run a query to look at the AVG(NumberOfLaps), grouped by LevelType. The following example code shows you how to create an aggregating index levels_agg_idx on the LevelType column to pre-calculate the average number of laps for each level.
levels_agg_idx
aggregate index listed in the left navigation pane under Indexes in the tutorial table. Any queries that run over the tutorial table that use an average of the NumberOfLaps column grouped by LevelType will now use the levels_agg_idx
index instead of reading the entire table to calculate it.
For more information, see Aggregating index.
CHECKSUM
to warm the entire table as follows:
COPY TO
to export data to an AWS S3 bucket. You may have to reach out to your administrator to obtain or change AWS permissions.
<aws_access_key_id>
is the AWS access key ID associated with an AWS user or an IAM role. An access key ID has the following form: AKIAIOSFODNN7EXAMPLE
. The value <aws_secret_access_key>
is the AWS secret access key. A secret access key has the following form: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
.
COPY TO
, the file or files will be written in the default CSV format. Because COMPRESSION
is also omitted, the output data is compressed using GZIP (*.csv.gz) format.
Firebolt assigns a query ID, that has the following example format 16B903C4206098FD
, to the query at runtime. If the size of the compressed output exceeds the default of 16
MB, Firebolt writes multiple GZIP files. In the following example, the size of the output is 40
MB, so Firebolt writes 4
files.