Load data
You can load data into Firebolt from an Amazon S3 bucket using two different workflows.
If you want to get started quickly, load data using a wizard in the Firebolt Workspace. If you want a more customized experience, you can write SQL scripts to handle each part of your workflow. This guide shows you how to load data using both the wizard and SQL, and covers some common data loading workflows and errors.
Before you can load data, you must first register with Firebolt, then create a database and an engine. For information about how to register, see Get Started. See the following sections for information about how to create a database and engine.
Load data using a wizard
You can use the Load data wizard in the Firebolt Workspace to load data in either CSV or Parquet format. The wizard lets you set a variety of loading parameters, including the following:
- A custom delimiter, quote character, escape character, and other format options.
- How errors are handled during data load.
- A primary index.
The Load data wizard guides you through the process of creating an engine and database as part of the loading process.
See Load data using a wizard for information about the options available in the Load data wizard.
Load data using SQL
You can use SQL to load data in CSV, Parquet, TSV, AVRO, JSON Lines or ORC formats. Prior to loading data, you must also create a database and engine using either of the following options:
- Use buttons in the Firebolt Workspace to create a database and engine. For more information, see the Create a Database and Create an Engine sections in the Get Started using SQL guide.
- Use the SQL commands CREATE DATABASE and CREATE ENGINE.
See SQL to load data for information and code examples to load data using SQL.
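The SQL path above can be sketched as follows. This is a minimal illustration, not a definitive script: the database, engine, and table names and the S3 path are hypothetical placeholders, and the COPY statement assumes the `COPY INTO ... FROM ... WITH HEADER=TRUE` form from the Firebolt SQL reference. Check the linked guides for the options that apply to your account and file format.

```sql
-- Hypothetical names; replace with your own database and engine.
CREATE DATABASE IF NOT EXISTS tutorial_database;
CREATE ENGINE IF NOT EXISTS tutorial_engine;

-- Load a CSV file from an S3 bucket into a table.
-- The bucket path is a placeholder, and HEADER=TRUE assumes
-- that the first row of the file contains column names.
COPY INTO tutorial_table
FROM 's3://your-bucket/path/to/data.csv'
WITH HEADER=TRUE;
```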
Optimizing during data loading
Optimizing your workflow for Firebolt starts when you load your data. Use the following guidance:
- A primary index is a sparse index that uniquely identifies rows in a table. Having a primary index is critical to query performance in Firebolt because it allows a query to locate data without scanning an entire dataset. If you know your data and query history well enough to select an optimal primary index, you can define it when you create a table. If you do not, you can still load your data without a primary index. Then, once you know your query patterns, you must create a new table in order to define a primary index.
  You can specify primary indexes in either the Load data wizard or inside SQL commands. The Load data using a wizard guide discusses considerations for choosing a primary index and shows how to select one. The Load data using SQL guide discusses the same considerations and shows code examples that select primary indexes. For more advanced information, see Primary indexes.
- If you intend to use aggregate functions in queries, you can calculate an aggregating index when loading your data. Queries then use these pre-calculated values to access information quickly. For an example of calculating an aggregating index during load, see Load data using SQL. For an introduction to aggregating indexes, see the Aggregating indexes section of the Get Started guide. For more advanced information, see Aggregating indexes.
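As a rough sketch of both optimizations, the following SQL defines a primary index when the table is created, adds an aggregating index, and then loads the data. The table, columns, index name, and S3 path are all hypothetical and chosen only for illustration. The statements assume the `PRIMARY INDEX` clause of CREATE TABLE and the `CREATE AGGREGATING INDEX ... ON table (key columns, aggregations)` form; the Primary indexes and Aggregating indexes references have the authoritative syntax.

```sql
-- Hypothetical table: the primary index is defined at creation time
-- because it cannot be added to an existing table later.
CREATE TABLE IF NOT EXISTS levels (
    LevelID INTEGER,
    LevelType TEXT,
    NumberOfLaps INTEGER
)
PRIMARY INDEX LevelID;

-- Hypothetical aggregating index: pre-calculates a count and an average
-- per LevelType for queries that group by that column.
CREATE AGGREGATING INDEX levels_agg_idx
ON levels (LevelType, COUNT(*), AVG(NumberOfLaps));

-- Load the data; the bucket path is a placeholder.
COPY INTO levels
FROM 's3://your-bucket/path/to/levels.csv'
WITH HEADER=TRUE;
```

Choosing `LevelID` as the primary index here assumes that queries filter on that column most often. A column that appears frequently in your own WHERE clauses is usually the better candidate.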
Next steps
After you load your data, you can start running and optimizing your queries. A typical workflow has the previous steps followed by data and resource cleanup as shown in the following diagram: