Getting started tutorial
This tutorial teaches you how to create a database, ingest a sample dataset from Amazon S3 into Firebolt, and run fundamental analytics queries over the data. To perform this tutorial, you need an active Firebolt account. If you don’t have one, Schedule a call to get set up.
Create your first database
To start working with data, you first create a database and a Firebolt engine. An engine represents the compute resources that are attached to a database for a certain workload. A database always has one general purpose engine that can write to the Firebolt file format (F3) for data ingestion and run analytics queries. We use that single-engine setup in this tutorial. Many databases are set up with additional analytics engines that can only query, and are configured to optimize different query workloads. For more information, see Working with engines.
To create a database and engine
- From the Databases page, choose New Database.
- Enter a Database name (we use Tutorial in this topic) and leave us-east-1 selected as the Database region.
- Under Database engines, leave the default engine selected. Firebolt will name the engine Tutorial_Ingest.
- Choose Create database.
Firebolt adds your database to the Databases page.
Run your first query
Before we ingest the sample data and run a query over it, we’ll go to the SQL workspace for the database and run a simple query to demonstrate how to start an engine. For more information about the SQL workspace, see Using the SQL workspace.
To open your database, start an engine, and run a query
- From the Databases page, find the database that you created in the list, and then choose the Open in SQL workspace icon (>_) next to the Database name.
- In the Script 1 tab, type the simple query below, which returns a list of the databases in your account.

  SHOW DATABASES;

- Choose Run Script, and note that the Using list indicates the engine that Firebolt uses to run the script, for example, Tutorial_Ingest.
- When Firebolt prompts you to start the engine, choose Start Engine.
Ingest data
Ingesting data into Firebolt is a three-step process, sketched after this list. You:

- Create an external table.
- Create a fact or dimension table.
- Run an INSERT INTO command from the external table to the fact or dimension table.
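As a compact preview, the pattern looks like the sketch below. The names my_source and my_table and the S3 location are placeholders for illustration only; the tutorial's actual statements appear in the sections that follow.

-- Step 1: an external table acts as a connector to files in S3 (placeholder bucket shown).
CREATE EXTERNAL TABLE IF NOT EXISTS my_source (id BIGINT)
URL = 's3://my-bucket/data/'
OBJECT_PATTERN = '*.parquet'
TYPE = (PARQUET);

-- Step 2: a fact table holds the ingested data inside Firebolt.
CREATE FACT TABLE IF NOT EXISTS my_table (id BIGINT)
PRIMARY INDEX id;

-- Step 3: INSERT INTO copies data from the external table into the fact table.
INSERT INTO my_table SELECT id FROM my_source;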
Create an external table
An external table is a special, virtual table that serves as a connector to your data source. After the external table is created, you ingest data by running an INSERT INTO command from that external table into a fact table or dimension table. The INSERT INTO command must run on a general purpose engine. After the data is available in fact and dimension tables, you can run analytics queries over those tables using any engine. Although it's possible, we don't recommend running analytics queries on external tables. For more information, see CREATE EXTERNAL TABLE and Using AWS roles to access S3.
To create an external table
- Choose the plus symbol (+) next to Script 1 to create a new script tab, Script 2, in the SQL workspace.
- Copy and paste the query below into the Script 2 tab.

  CREATE EXTERNAL TABLE IF NOT EXISTS ex_lineitem (
  -- Column definitions map to data fields
  -- in the source data file and are specified
  -- as sources in the INSERT INTO statement for ingestion.
    l_orderkey      BIGINT,
    l_partkey       BIGINT,
    l_suppkey       BIGINT,
    l_linenumber    INTEGER,
    l_quantity      BIGINT,
    l_extendedprice BIGINT,
    l_discount      BIGINT,
    l_tax           BIGINT,
    l_returnflag    TEXT,
    l_linestatus    TEXT,
    l_shipdate      TEXT,
    l_commitdate    TEXT,
    l_receiptdate   TEXT,
    l_shipinstruct  TEXT,
    l_shipmode      TEXT,
    l_comment       TEXT
  )
  -- The URL specifies the data source in S3.
  -- All files in the location that match the OBJECT_PATTERN
  -- will be processed during ingestion.
  URL = 's3://firebolt-publishing-public/samples/tpc-h/parquet/lineitem/'
  -- These credentials specify a role or AWS key credentials
  -- with permission to read from the S3 location.
  -- These credentials are commented out for this tutorial because the bucket
  -- is publicly accessible.
  -- CREDENTIALS = ( AWS_KEY_ID = '******' AWS_SECRET_KEY = '******' )
  OBJECT_PATTERN = '*.parquet'
  TYPE = (PARQUET);

- Choose Run Script.
  Firebolt creates the external table. When finished, the external table ex_lineitem appears on the object panel of the database.
- Choose the vertical ellipses next to Script 2, choose Save script, enter a name (for example, MyExTableScript), and then press ENTER to save the script.
Create a fact table
In this step, you create a Firebolt fact table called lineitem, which you use in the next step as the target for an INSERT INTO command.

Every fact table in Firebolt must have a primary index specified when you create it. Firebolt uses the primary index when it ingests data so that the data is saved to S3 for highly efficient pruning and sorting when it is queried. For more information, see Using primary indexes. The fact table that we create in this step specifies the l_orderkey and l_linenumber columns for the primary index. For more information about choosing columns for a primary index, see How to choose primary index columns.
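To illustrate why the choice matters: after lineitem is created and populated later in this tutorial, a query that filters on the leading primary index column, like the hypothetical example below, lets Firebolt prune data rather than scan the entire table. The literal value 1 is only an example key.

-- Filtering on l_orderkey, the leading primary index column,
-- allows pruning instead of a full table scan.
SELECT *
FROM lineitem
WHERE l_orderkey = 1;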
To create a fact table
- Create a new script tab.
- Copy and paste the query below into the script tab.

  CREATE FACT TABLE IF NOT EXISTS lineitem (
  -- In this example, these fact table columns
  -- map directly to the external table columns.
    l_orderkey      BIGINT,
    l_partkey       BIGINT,
    l_suppkey       BIGINT,
    l_linenumber    INTEGER,
    l_quantity      BIGINT,
    l_extendedprice BIGINT,
    l_discount      BIGINT,
    l_tax           BIGINT,
    l_returnflag    TEXT,
    l_linestatus    TEXT,
    l_shipdate      TEXT,
    l_commitdate    TEXT,
    l_receiptdate   TEXT,
    l_shipinstruct  TEXT,
    l_shipmode      TEXT,
    l_comment       TEXT
  )
  PRIMARY INDEX l_orderkey, l_linenumber;

- Choose Run Script.
  Firebolt creates the fact table. When finished, the table lineitem appears on the object panel of the database.
Use INSERT INTO to ingest data
You can now use the INSERT INTO command to copy the data from the external table into the fact table. During the INSERT INTO operation, Firebolt reads the data from your source in Amazon S3 and ingests it into the fact table. You can use source_file_name in the WHERE clause to specify which records to load from Amazon S3 and improve the performance of reads from S3.
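For example, a statement along the lines of the sketch below loads only the rows that come from a single source object. The object key shown is hypothetical; substitute a key that exists under the OBJECT_PATTERN location. The procedure below loads the entire dataset, so treat this filtered variant as illustration only.

-- Ingest only the rows read from one source object (hypothetical key shown).
INSERT INTO lineitem
SELECT *
FROM ex_lineitem
WHERE source_file_name = 'samples/tpc-h/parquet/lineitem/part-00000.parquet';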
To run an INSERT INTO command that ingests data
- Create a new script tab.
- Copy and paste the query below into the script tab.

  INSERT INTO lineitem SELECT * FROM ex_lineitem;

- Choose Run Script.
  The query results pane indicates a Status of Running while the ingestion is in progress. The Status changes to Success when the ingestion is complete.
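As an optional sanity check that is not part of the original procedure, you can compare row counts between the two tables; the counts should match. Note that counting rows in the external table reads from S3, so it runs more slowly than counting the fact table.

-- Row counts in the external table and the fact table should match after ingestion.
SELECT COUNT(*) FROM ex_lineitem;
SELECT COUNT(*) FROM lineitem;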
Query the ingested data
Now that the data has been ingested into the lineitem table, you can run analytics queries over the table that benefit from the speed and efficiency of Firebolt.
To verify that you inserted the data into the table, run a simple SELECT query like the one below.
SELECT *
FROM lineitem
LIMIT 1000;
If the ingestion succeeded, the query results pane displays rows from the lineitem table.
Configure an aggregating index
An aggregating index enables you to take a subset of table columns and predefine dimensions and measures to aggregate. Many aggregations are supported, from SUM, MAX, and MIN to more complex aggregations such as COUNT and COUNT(DISTINCT). At query runtime, instead of calculating the aggregation on the entire table and scanning all rows, Firebolt uses the pre-calculated values in the aggregating index. For more information, see Using aggregating indexes.
From the lineitem fact table that you created in the previous step, assume you typically run queries to look at the SUM(l_quantity), SUM(l_extendedprice), and AVG(l_discount), grouped by different combinations of l_suppkey and l_partkey. You can create an aggregating index to speed up these queries by running the statement below.
CREATE AGGREGATING INDEX agg_lineitem ON lineitem (
  l_suppkey,
  l_partkey,
  SUM(l_quantity),
  SUM(l_extendedprice),
  AVG(l_discount)
);
After you run the script, the agg_lineitem index appears in the object pane. Queries over the lineitem table that use any combination of the fields and aggregations defined in the index now use the index instead of reading the entire table.
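For example, a query along the lines of the one below groups by l_suppkey and requests only aggregations that are defined in agg_lineitem, so Firebolt can answer it from the index.

-- This aggregation is covered by agg_lineitem, so Firebolt reads the index
-- instead of scanning the lineitem table.
SELECT
  l_suppkey,
  SUM(l_quantity)      AS total_quantity,
  SUM(l_extendedprice) AS total_extended_price,
  AVG(l_discount)      AS avg_discount
FROM lineitem
GROUP BY l_suppkey;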