This tutorial teaches you how to create a database, ingest a sample dataset from Amazon S3 into Firebolt, and run fundamental analytics queries over the data. To perform this tutorial, you need an active Firebolt account. If you don't have one, schedule a call to get set up, and register for our hands-on Firebolt workshop to get an interactive, instructor-led tutorial on Firebolt.
This tutorial uses Firebolt’s sample dataset, from the fictional gaming company “Ultra Fast Gaming Inc.” This dataset is publicly available with the access credentials shared below.
- Create your first database
- Run your first query
- Ingest data
- Query the ingested data
To start working with data, you first create a database and a Firebolt engine. An engine represents the compute resources that are attached to a database for a certain workload. A database always has one general purpose engine that can write to the Firebolt file format (F3) for data ingestion and run analytics queries. We use that single-engine setup in this tutorial. Many databases are set up with additional analytics engines that can only query, and are configured to optimize different query workloads. For more information, see Working with engines.
To create a database and engine
From the Databases page, choose New Database.
Enter a Database name (we use Tutorial in this topic) and leave us-east-1 selected as the Database region.
Under Database engines, leave the default engine selected. Firebolt will name the engine Tutorial_general_purpose.
Choose Create database.
Firebolt adds your database to the Databases page.
Before we ingest the sample data and run a query over it, we’ll go to the SQL workspace for the database and run a simple query to demonstrate how to start an engine. For more information about the SQL workspace, see Using the SQL workspace.
To open your database, start an engine, and run a query
From the Database page, find the database that you created in the list, and then choose the Open in SQL workspace icon (>_) next to the Database name.
- In the Script 1 tab, type the simple query below that returns a list of databases in your account.
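One query that lists the databases in your account is shown below. (This is an illustrative choice; any lightweight query works for this step.)

```sql
-- Returns a list of databases in your account.
SHOW DATABASES;
```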
Choose Run Script and note that the Using list indicates the engine that Firebolt uses to run the script, for example, Tutorial_general_purpose.
- When Firebolt prompts you to start the engine, choose Start Engine. The engine will take a few minutes to set up.
Ingesting data into Firebolt is a three-step process:

1. Create an external table.
2. Create a fact or dimension table.
3. Run an INSERT INTO command from the external table into the fact or dimension table.
An external table is a special, virtual table that serves as a connector to your data source. After the external table is created, you ingest data by running an INSERT INTO command from that external table into a fact table or dimension table. The INSERT INTO command must run on a general purpose engine. After the data is available in fact and dimension tables, you can run analytics queries over those tables using any engine. Although it's possible, we don't recommend running analytics queries directly on external tables. For more information, see CREATE EXTERNAL TABLE and Using AWS roles to access S3.
To create an external table
Choose the plus symbol (+) next to Script 1 to create a new script tab, Script 2, in the SQL workspace.
- Copy and paste the query below into the Script 2 tab.
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS ex_levels (
    -- Column definitions map to data fields
    -- in the source data file and are specified
    -- as sources in the INSERT INTO statement for ingestion.
    LevelID INTEGER,
    GameID INTEGER,
    Level INTEGER,
    Name TEXT,
    LevelType TEXT,
    NextLevel INTEGER NULL,
    MinPointsToPass INTEGER,
    MaxPoints INTEGER,
    NumberOfLaps INTEGER,
    MaxPlayers INTEGER,
    MinPlayers INTEGER,
    PointsPerLap REAL,
    MusicTrack TEXT,
    SceneDetails TEXT,
    MaxPlayTimeSeconds INTEGER,
    LevelIcon TEXT
)
-- The URL specifies the data source in S3.
-- All files in the location that match the OBJECT_PATTERN
-- will be processed during ingestion.
URL = 's3://firebolt-publishing-public/'
-- These credentials specify a role or AWS key credentials
-- with permission to read from the S3 location.
-- These credentials are commented out for this tutorial because the bucket
-- is publicly accessible.
-- CREDENTIALS = ( AWS_KEY_ID = '******' AWS_SECRET_KEY = '******' )
OBJECT_PATTERN = 'help_center_assets/firebolt_sample_dataset/levels.csv'
TYPE = (CSV SKIP_HEADER_ROWS = 1);
```
Choose Run Script.
Firebolt creates the external table. When finished, the external table ex_levels appears on the object panel of the database.
- Choose the vertical ellipsis next to Script 2, choose Save script, enter a name (for example, MyExTableScript), and then press ENTER to save the script.
In this step, you’ll create a Firebolt fact table called levels, which you use in the next step as the target for an INSERT INTO command.
When creating a fact or dimension table, you specify a primary index. Firebolt uses the primary index to sort and organize the data as it is saved to S3 during ingestion, which enables highly efficient pruning when the data is queried. A primary index is required when creating a fact table, and recommended for dimension tables. For more information, see Using primary indexes.
The fact table that we create in this step specifies the LevelID column for the primary index. For more information about choosing columns for a primary index, see How to choose primary index columns.
To create a fact table
Create a new script tab.
- Copy and paste the query below into the script tab.
```sql
CREATE FACT TABLE IF NOT EXISTS levels (
    LevelID INTEGER,
    GameID INTEGER,
    Level INTEGER,
    Name TEXT,
    LevelType TEXT,
    NextLevel INTEGER NULL,
    MinPointsToPass INTEGER,
    MaxPoints INTEGER,
    NumberOfLaps INTEGER,
    MaxPlayers INTEGER,
    MinPlayers INTEGER,
    PointsPerLap REAL,
    MusicTrack TEXT,
    SceneDetails TEXT,
    MaxPlayTimeSeconds INTEGER,
    LevelIcon BYTEA,
    SOURCE_FILE_NAME TEXT,
    SOURCE_FILE_TIMESTAMP TIMESTAMP
)
PRIMARY INDEX LevelID;
```
- Choose Run Script.
Firebolt creates the fact table. When finished, the table levels appears on the object panel of the database.
You can now use the INSERT INTO command to copy the data from the external table into the fact table. During this operation, Firebolt ingests the data from your source into Firebolt.
You can also reference external table metadata columns, such as SOURCE_FILE_NAME or SOURCE_FILE_TIMESTAMP, in the WHERE clause to specify which records to load from Amazon S3 and improve the performance of the read from S3.
To run an INSERT INTO command that ingests data
- Create a new script tab.
- Copy and paste the query below into the script tab.
```sql
INSERT INTO levels
SELECT
    LevelID,
    GameID,
    Level,
    Name,
    LevelType,
    NextLevel,
    MinPointsToPass,
    MaxPoints,
    NumberOfLaps,
    MaxPlayers,
    MinPlayers,
    PointsPerLap,
    MusicTrack,
    SceneDetails,
    MaxPlayTimeSeconds,
    DECODE(REPLACE(LevelIcon, '"', ''), 'BASE64'),
    SOURCE_FILE_NAME,
    SOURCE_FILE_TIMESTAMP
FROM ex_levels
WHERE SOURCE_FILE_TIMESTAMP > (
    SELECT COALESCE(MAX(SOURCE_FILE_TIMESTAMP), '1980-01-01'::TIMESTAMP)
    FROM levels
);
```
- Choose Run Script.
The query results pane indicates a Status of Running while the script executes. The Status changes to Success when the ingestion is complete.
Now that the data has been ingested into the levels table, you can run analytics queries over the table that benefit from the speed and efficiency of Firebolt.
To verify that you inserted the data into the table, run a simple SELECT query like the one below.
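A minimal verification query might look like the following (the LIMIT value is just an illustrative choice):

```sql
-- Returns a sample of the ingested rows.
SELECT *
FROM levels
LIMIT 10;
```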
The query results pane should show rows from the sample dataset.
An aggregating index enables you to take a subset of table columns and predefine dimensions and measures to aggregate. Many aggregations are supported, from simple ones such as MIN to more complex aggregations such as COUNT(DISTINCT). At query runtime, instead of calculating the aggregation on the entire table and scanning all rows, Firebolt uses the pre-calculated values in the aggregating index. For more information, see Using aggregating indexes.
Using the levels fact table that you created in the previous step, assume you want to run a query to look at the AVG(NumberOfLaps), grouped by LevelType. You can create an aggregating index to speed up these queries by running the statement below.
```sql
CREATE AGGREGATING INDEX levels_agg_idx
ON levels (
    LevelType,
    AVG(NumberOfLaps)
);
```
After you run the script, you see the levels_agg_idx index listed in the object pane. Any queries that run over the levels table and combine any of the fields and aggregations defined in the index will now use the index instead of reading the entire table.
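For example, a query like the following (using the grouping and aggregation named earlier in this section) can be answered from the pre-calculated index values rather than a full scan of the table:

```sql
-- Served by the aggregating index instead of scanning all rows.
SELECT
    LevelType,
    AVG(NumberOfLaps)
FROM levels
GROUP BY LevelType;
```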