

- Architectural differences – An overview of how to adapt your Redshift workflow to Firebolt’s decoupled compute-storage model, which provides elastic scaling, workload isolation, and advanced indexing.
- Compute and storage – Scale compute resources for different workloads without resizing clusters or redistributing data.
- Scalability and elasticity – Use right-sized engines for ingestion, analytics, and ETL without downtime.
- Workload isolation – Assign dedicated engines to individual workloads to avoid resource contention and improve reliability.
- Cost model – Control costs with pay-as-you-go billing and flexible engine usage. Pause idle engines and size compute resources according to workload demand.
- Schema differences – An overview of how to adapt your schema from Redshift’s `SORT` and `DIST` keys to Firebolt’s dynamic indexing, denormalization, and flexible JSON handling, which replace manual distribution keys and rigid schemas for faster, simpler queries.
- Table design – Use fact tables for large datasets and dimension tables for reference data. Optimize queries with columnar storage and primary indexes.
- Indexing – Replace manual `SORTKEY` and `DISTKEY` tuning with dynamic primary and aggregating indexes for faster query performance and automatic data pruning.
- Denormalization – Reduce joins and query complexity by storing frequently used fields directly in fact tables.
- Handling JSON – Store JSON as `TEXT` and query nested fields using JSON functions without rigid schemas.
- Data types – Convert Redshift data types to Firebolt equivalents. Use `TEXT` for strings and JSON data, and ensure numeric precision and date compatibility.
- Export data from Redshift – Learn how to efficiently export your Redshift data to Amazon S3 in CSV or Parquet format, format it for Firebolt, and organize it for optimal parallel loading.
- Load data into Firebolt – Load exported data into Firebolt with the `COPY FROM` command, leveraging parallel ingestion and indexing to optimize loading speed and query performance.
- Translate queries – Use indexing, denormalization, and JSON functions to improve query efficiency and reduce complexity when migrating to Firebolt.
- Performance testing and optimization – Validate data consistency, test query performance, adjust engine configurations, and fine-tune indexes to maximize speed and resource efficiency.
- Automate migration – Set up an automated pipeline using tools like Airflow or dbt to streamline data migration, schema deployment, and query adaptation.
Architectural differences
Migrating from Amazon Redshift to Firebolt requires adapting to a different architecture. Redshift’s monolithic cluster model, where compute and storage are tightly integrated, limits flexibility and forces full-cluster scaling. In contrast, Firebolt separates compute engines from storage, enabling elastic scaling, workload isolation, and advanced indexing for faster performance, as summarized in the following table:

| Feature | Redshift | Firebolt | Recommendations |
|---|---|---|---|
| Compute and storage | Monolithic cluster with tightly coupled compute and storage. | Decoupled architecture with virtualized compute engines and separate storage. | Use multiple engines to separate workloads and improve flexibility. |
| Scalability and elasticity | Resizing requires downtime and offers limited concurrency scaling. | Elastic scaling with independent compute and storage. | Assign separate engines for ingestion, analytics, and ETL to optimize resource use. |
| Workload isolation | Shared cluster causes resource contention. | Fully isolated workloads with multiple independent engines. | Use dedicated engines to prevent interference between queries. |
| Cost model | Billed for cluster uptime, with additional charges for concurrency scaling. | Pay-as-you-go pricing based on engine runtime. | Pause idle engines and right-size compute to control costs. |

Compute and storage

Choosing the right engine `TYPE` is critical to optimizing performance and cost efficiency. Unlike Redshift, where all workloads run on the same cluster, Firebolt lets you assign separate engines for ingestion, analytics, and transformation tasks. To maximize efficiency, right-size engines based on workload needs, scale them independently of storage, and pause them when not in use to reduce costs.
| Best practice for compute and storage | Impact | How to implement |
|---|---|---|
| Scale compute and storage independently. | Allows flexible resource allocation without over-provisioning. | Increase compute resources for analytics without expanding storage, and scale engines independently based on workload needs. |
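For example, the following sketch scales compute for an analytics workload up and back down without touching storage. The engine name and sizing are illustrative, and the exact `CREATE ENGINE` and `ALTER ENGINE` options may vary by Firebolt version:

```sql
-- Create a medium engine for analytics (name and sizing are illustrative).
CREATE ENGINE analytics_engine WITH TYPE = M NODES = 2 AUTO_STOP = 20;

-- Scale compute up for a heavy analytics run; storage is unaffected.
ALTER ENGINE analytics_engine SET NODES = 6;

-- Scale back down when the workload completes.
ALTER ENGINE analytics_engine SET NODES = 2;
```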
Scalability and elasticity

| Best practice for scalability and elasticity | Impact | How to implement |
|---|---|---|
| Use right-sized engines based on workload needs. | Optimizes performance and cost by allocating appropriate resources for each workload. | Start with small or medium engines for lighter workloads and scale up using `ALTER ENGINE` for high-concurrency or intensive tasks. |
| Workload type | Recommended engine configuration |
|---|---|
| High-concurrency business intelligence (BI) | Use medium to large engines with high parallelism to support many concurrent queries efficiently. |
| ETL and transformations | Use small to medium engines with moderate concurrency, optimized for data ingestion and transformations. |
| Data science | Use small engines for iterative, ad-hoc queries or training datasets. |
| Data-intensive applications | Use dedicated small engines with a consistent SLA for low-latency queries. |
| Ad-hoc analytics | Use on-demand engines that start quickly and scale based on query complexity. |
Workload isolation

- Firebolt engines support both read and write operations on shared data while maintaining strong consistency across engines, eliminating the need for manual synchronization.
- Firebolt optimizes workload execution dynamically based on configuration, resource utilization, and query history. This helps balance latency and throughput, ensuring that resources are used efficiently.
| Best practice for workload isolation | Impact | How to implement |
|---|---|---|
| Use separate Firebolt engines for different workloads. | Prevents resource contention and improves workload isolation. | Assign dedicated engines for ingestion, analytics, and transformations to avoid competition for resources. |
| Workload | Engine name | Configuration | Purpose |
|---|---|---|---|
| Data ingestion | ingestion_engine | TYPE = S, NODES = 1 | Continuous data loading. |
| High-concurrency queries | analytics_engine | TYPE = M, NODES = 6 | Supports many concurrent queries. |
| Real-time dashboards | dashboard_engine | TYPE = M, NODES = 2 | Low-latency, SLA-sensitive queries. |
| Ad-hoc analytics | ad_hoc_engine | TYPE = S, NODES = 2, AUTO_STOP | Cost-efficient for intermittent use. |
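A minimal sketch of creating these engines (statement options may vary by Firebolt version):

```sql
-- One dedicated engine per workload, matching the table above.
CREATE ENGINE ingestion_engine WITH TYPE = S NODES = 1;
CREATE ENGINE analytics_engine WITH TYPE = M NODES = 6;
CREATE ENGINE dashboard_engine WITH TYPE = M NODES = 2;
CREATE ENGINE ad_hoc_engine WITH TYPE = S NODES = 2 AUTO_STOP = 10;
```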
Cost model
Firebolt engines consume credits only while they are running, so pausing unused engines can significantly reduce costs. Redshift, by contrast, bills for cluster uptime even when the cluster is idle. In Firebolt, you can pause an engine without losing access to stored data.
| Best practice for cost model | Impact | How to implement |
|---|---|---|
| Pause idle engines to reduce costs. | Lowers operational costs while keeping data accessible. | Use auto-stop settings to pause unused engines and restart them on demand. |
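For example (a sketch; the `AUTO_STOP` unit is assumed to be minutes and may vary by Firebolt version):

```sql
-- Pause an idle engine explicitly; stored data stays accessible.
STOP ENGINE ad_hoc_engine;

-- Or let the engine pause itself after 10 idle minutes.
ALTER ENGINE ad_hoc_engine SET AUTO_STOP = 10;

-- Restart on demand.
START ENGINE ad_hoc_engine;
```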
Schema differences
Migrating from Amazon Redshift to Firebolt requires adapting your schema to leverage Firebolt’s indexing model, denormalization benefits, and optimized query execution. Unlike Redshift, which relies on `SORT` and `DIST` keys for performance tuning, Firebolt automatically optimizes queries using primary and aggregating indexes. This eliminates the need for manual data distribution and sorting, reducing complexity while improving performance.
Additionally, semi-structured data handling differs. Redshift’s `SUPER` type allows nested JSON storage, while Firebolt stores JSON as `TEXT`, enabling flexible querying with JSON functions like `JSON_VALUE` and `JSON_EXTRACT`. Firebolt’s schema design also favors denormalization, reducing the need for complex joins and improving analytical query speed.

| Feature | Redshift | Firebolt | Recommendations |
|---|---|---|---|
| Table design | Uses fact and dimension tables with explicit `DISTKEY` and `SORTKEY` properties for tuning. | Uses fact and dimension tables with indexing for optimized access. | Design fact tables for large datasets and dimension tables for reference data. |
| Indexing | Uses `SORTKEY` and `DISTKEY` for performance tuning, requiring manual selection and optimization. | Uses primary and aggregating indexes instead. | Use primary indexes for efficient filtering and aggregating indexes for precomputed aggregations. |
| Denormalization | Often normalized to reduce redundancy and optimize joins. | Encourages denormalization to reduce joins and improve query performance. | Store frequently joined reference columns in fact tables to minimize joins. |
| Handling JSON | Supports the `SUPER` data type for nested JSON storage, queried with `JSON_EXTRACT_PATH_TEXT`. | Stores JSON as `TEXT` and uses `JSON_VALUE` and `JSON_POINTER_EXTRACT_TEXT` for querying. | Store JSON data as `TEXT` and use JSON functions to extract values on demand. |
| Data types | Supports a broad range of native types including `SUPER`. | Uses similar types, but JSON is stored as `TEXT`. | Convert `SUPER` columns to `TEXT` and adjust data types to match Firebolt-supported types. |
Table design

| Best practice for table design | Impact | How to implement |
|---|---|---|
| Store frequently queried large datasets in fact tables. | Improves query speed and reduces scan times. | Create fact tables for large datasets used in analytical queries. |
| Use dimension tables for reference data. | Simplifies lookups and schema organization. | Define dimension tables for static or lookup datasets. |
| Define primary indexes on frequently filtered columns. | Enables efficient pruning and faster filtering. | Add primary indexes to columns commonly used in `WHERE` or `JOIN` conditions. |
For example, the following schema defines an `orders` fact table and a `customers` dimension table. A query filtering on `customer_id` then uses the primary index, scanning only relevant data.
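A minimal sketch, assuming simplified column sets (columns other than `customer_id` are illustrative):

```sql
-- Fact table for large transactional data, with a primary index on
-- commonly filtered columns.
CREATE FACT TABLE orders (
    order_id     BIGINT,
    customer_id  BIGINT,
    order_date   DATE,
    total_amount DECIMAL(12, 2)
) PRIMARY INDEX customer_id, order_date;

-- Dimension table for reference data.
CREATE DIMENSION TABLE customers (
    customer_id   BIGINT,
    customer_name TEXT
);

-- Filtering on customer_id prunes data through the primary index,
-- scanning only the relevant data.
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 1234;
```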
Indexing

Redshift uses `SORTKEY` and `DISTKEY` to improve data distribution and query performance. The `SORTKEY` controls the physical order of stored data, optimizing filtering and range-based queries. The `DISTKEY` distributes data across cluster nodes to balance workloads. These keys require careful, manual tuning and must be set during table creation. Changing them requires rebuilding tables and reloading data, making schema adjustments rigid and time-consuming.
Poor `DISTKEY` choices in Redshift can lead to data skew, where some nodes store more data than others, creating performance bottlenecks. Additionally, scaling Redshift clusters requires redistributing data based on the defined `DISTKEY`. This process can result in downtime and degraded performance while workloads rebalance, and often forces additional manual optimization.
Firebolt eliminates manual distribution keys and sort keys by using dynamic indexing to optimize query processing. Indexes in Firebolt can be added, modified, or replaced without reloading data. Primary indexes physically organize data to speed up filtering and scanning during query runtime. Aggregating indexes precompute aggregate results, allowing faster retrieval without dynamic calculations.
Firebolt’s indexing model works with tablet-based storage, reducing the need for manual tuning tasks such as `VACUUM` and `ANALYZE` in Redshift. The decoupled compute-storage architecture allows dynamic data access without requiring distribution keys or physical data redistribution when scaling. Indexes enable the engine to prune irrelevant data and improve query performance automatically.

Understanding how indexes and tablet-based storage replace Redshift’s `DISTKEY` and `SORTKEY` is essential for successful schema migration. Firebolt’s indexing approach simplifies schema design, reduces maintenance overhead, and delivers faster query performance with greater flexibility.
| Best practice for indexing | Impact | How to implement |
|---|---|---|
| Use a primary index instead of a `SORTKEY`. | Reduces scan times and improves query filtering. | Create primary indexes on columns used for filtering and range queries. |
| Replace `DISTKEY` with an aggregating index. | Speeds up aggregations without manual data distribution. | Create aggregating indexes for precomputed sums or counts on frequently grouped columns. |
The following Redshift example creates a `sales` table that distributes data by `customer_id` (`DISTKEY`) for optimized joins, sorts rows by `sale_date` (`SORTKEY`) for faster range queries, and defines a `customers` table with `customer_id` as the primary key for relational integrity. In Redshift, the primary key is informational only and not enforced at the database level.
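A sketch of the Redshift schema (column sets are illustrative):

```sql
CREATE TABLE sales (
    sale_id      BIGINT,
    customer_id  BIGINT,
    sale_date    DATE,
    total_amount NUMERIC(12, 2)
)
DISTKEY (customer_id)
SORTKEY (sale_date);

-- PRIMARY KEY is informational only in Redshift; it is not enforced.
CREATE TABLE customers (
    customer_id   BIGINT PRIMARY KEY,
    customer_name VARCHAR(255)
);
```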
In Firebolt, `DISTKEY(customer_id)` is replaced with an aggregating index on `(customer_id, SUM(total_amount))`. This aggregation runs faster than in Redshift because `SUM(total_amount)` is precomputed in the aggregating index. `SORTKEY(sale_date)` is replaced with a primary index on `sale_date`.
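A corresponding Firebolt sketch (column sets are illustrative):

```sql
CREATE FACT TABLE sales (
    sale_id      BIGINT,
    customer_id  BIGINT,
    sale_date    DATE,
    total_amount DECIMAL(12, 2)
) PRIMARY INDEX sale_date;

-- Precomputes per-customer totals, replacing DISTKEY(customer_id).
CREATE AGGREGATING INDEX sales_by_customer ON sales (
    customer_id,
    SUM(total_amount)
);
```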
In the next example, Redshift’s `DISTKEY` partitions data across nodes by `game_id`. Firebolt’s primary index on `GameID` and `PlayerID` allows the engine to quickly prune data during query processing, achieving similar optimization without manual distribution.
In Redshift, the following schema uses `DISTKEY` and `SORTKEY` for the `playstats` table.
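A sketch of the Redshift version (column names beyond `game_id` and `player_id` are illustrative):

```sql
CREATE TABLE playstats (
    game_id       BIGINT,
    player_id     BIGINT,
    current_score BIGINT,
    stat_time     TIMESTAMP
)
DISTKEY (game_id)
SORTKEY (game_id, player_id);
```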
In Firebolt, define a primary index on `GameID` and `PlayerID` so that queries filtering on these columns scan over only this index.
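A Firebolt sketch of the same table and a query that benefits from the index:

```sql
CREATE FACT TABLE playstats (
    GameID       BIGINT,
    PlayerID     BIGINT,
    CurrentScore BIGINT,
    StatTime     TIMESTAMP
) PRIMARY INDEX GameID, PlayerID;

-- Filtering on GameID (and optionally PlayerID) prunes irrelevant data.
SELECT PlayerID, MAX(CurrentScore) AS best_score
FROM playstats
WHERE GameID = 42
GROUP BY PlayerID;
```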
Denormalization

| Best practice for denormalization | Impact | How to implement |
|---|---|---|
| Denormalize frequently joined tables. | Reduces query complexity and improves performance. | Store commonly joined attributes directly in fact tables. |
| Avoid excessive joins. | Speeds up analytical queries and reduces resource usage. | Move frequently accessed reference data into the same table. |
| Balance redundancy and maintenance. | Prevents excessive data duplication while simplifying queries. | Only denormalize attributes used in frequent filters or aggregations. |
For example, a normalized Redshift schema joins the `orders` and `customers` tables at query time.
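A sketch of the normalized query (column names are illustrative):

```sql
-- Normalized: customer attributes require a join at query time.
SELECT o.order_id, o.total_amount, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```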
In Firebolt, you can instead store `customer_name` directly in the `orders` table.
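A denormalized Firebolt sketch (column sets are illustrative):

```sql
CREATE FACT TABLE orders (
    order_id      BIGINT,
    customer_id   BIGINT,
    customer_name TEXT,   -- denormalized from the customers table
    total_amount  DECIMAL(12, 2)
) PRIMARY INDEX customer_id;

-- No join needed at query time.
SELECT order_id, total_amount, customer_name
FROM orders;
```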
Because the `customer_name` column is stored directly in the `orders` table, the join is eliminated and query complexity is reduced.
Handling JSON

Firebolt stores JSON as `TEXT`, allowing dynamic field extraction without predefined schemas. Firebolt includes specialized JSON functions to extract, convert, and manipulate JSON data stored in `TEXT` columns, enabling flexible and efficient querying. Unlike Redshift’s `SUPER` type, which requires fixed structures and functions like `JSON_EXTRACT_PATH_TEXT`, Firebolt enables flexible querying with `JSON_VALUE`, `JSON_EXTRACT`, and `JSON_POINTER_EXTRACT_TEXT`. These functions retrieve only the necessary JSON fields, reducing scan time and improving performance.
Firebolt’s approach removes the need for manual schema updates, simplifies query logic, and supports dynamic key lookups without restructuring the data model. Queries are easier to write and maintain because Firebolt avoids the nested function calls required in Redshift. By replacing `SUPER` with `TEXT`, users gain a more efficient and flexible way to query JSON data.
| Best practice for handling JSON | Impact | How to implement |
|---|---|---|
| Store JSON data as `TEXT`. | Enables flexible querying without predefined schemas. | Export JSON columns from Redshift as `TEXT` and define them as `TEXT` in Firebolt. |
| Use `JSON_VALUE` for simple field extraction. | Simplifies queries and reduces scan time. | Extract single fields directly with `JSON_VALUE`. |
| Use `JSON_POINTER_EXTRACT_TEXT` for nested fields. | Allows access to deeply nested fields in JSON documents. | Use `JSON_POINTER_EXTRACT_TEXT` with JSON pointer expressions. |
| Use dynamic path construction with `CONCAT` when needed. | Supports querying dynamic keys without fixed paths. | Build JSON pointer paths using `CONCAT` for dynamic extraction. |
In Redshift, JSON data is stored using the `SUPER` type.
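A sketch (the `events` table and its columns are illustrative):

```sql
CREATE TABLE events (
    event_id   BIGINT,
    event_data SUPER
);

-- Redshift navigates SUPER values with PartiQL dot notation.
SELECT event_data.metadata.id AS user_id
FROM events;
```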
To migrate, convert `SUPER` columns to `TEXT` by unloading to Amazon S3 in JSON or CSV format. Then, define the column in Firebolt as `TEXT`.
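A minimal Firebolt sketch:

```sql
CREATE FACT TABLE events (
    event_id   BIGINT,
    event_data TEXT   -- raw JSON stored as text
) PRIMARY INDEX event_id;
```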
You can then query fields using `JSON_VALUE`.
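A hedged sketch, assuming `JSON_POINTER_EXTRACT` returns a JSON sub-document that `JSON_VALUE` then converts to `TEXT` (the `status` field is illustrative):

```sql
-- Extract a top-level field as TEXT, for example from
-- {"status": "active", "metadata": {"id": "123"}}.
SELECT JSON_VALUE(JSON_POINTER_EXTRACT(event_data, '/status')) AS status
FROM events;
```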
For dynamic keys, build JSON pointer paths with `CONCAT`.
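A sketch of dynamic path construction (the `attribute_name` column is hypothetical):

```sql
-- Build the JSON pointer at query time from another column's value.
SELECT JSON_POINTER_EXTRACT_TEXT(
           event_data,
           CONCAT('/metadata/', attribute_name)
       ) AS attribute_value
FROM events;
```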
Data types

Redshift supports `SUPER` for semi-structured data and fixed-length `CHAR` or `VARCHAR` columns, while Firebolt stores JSON as `TEXT` and uses `TEXT` for variable-length strings without length restrictions. Understanding these differences helps ensure consistent schema conversion and accurate query results when migrating.
| Best practice for data types | Impact | How to implement |
|---|---|---|
| Use `TEXT` instead of fixed-length `CHAR` or `VARCHAR`. | Avoids truncation issues and simplifies schema management. | Define string columns as `TEXT` without length constraints. |
| Convert `SUPER` columns to `TEXT`. | Enables flexible JSON handling. | Store JSON data as `TEXT` and query using JSON functions. |
| Review numeric precision and scale. | Prevents rounding or overflow errors. | Map `NUMERIC(p, s)` in Redshift to `DECIMAL(p, s)` in Firebolt. |
| Validate date and timestamp usage. | Ensures compatibility for temporal queries. | Use `DATE` and `TIMESTAMP` consistently, without time zones. |
| Redshift type | Firebolt equivalent | Notes |
|---|---|---|
| `SMALLINT` | `SMALLINT` | Identical. |
| `INTEGER` | `INT` or `INTEGER` | Firebolt uses `INT` for integer types. |
| `BIGINT` | `BIGINT` | Identical. |
| `REAL` | `FLOAT4` | Alias mapping for single-precision floats. |
| `DOUBLE PRECISION` | `DOUBLE` | Use `DOUBLE` for double-precision floats. |
| `NUMERIC(p, s)` | `DECIMAL(p, s)` | Equivalent; Firebolt uses `DECIMAL`. |
| `BOOLEAN` | `BOOLEAN` | Identical. |
| `CHAR(n)` or `VARCHAR(n)` | `TEXT` | Firebolt uses `TEXT` with no length restrictions. |
| `DATE` | `DATE` | Identical. |
| `TIMESTAMP` | `TIMESTAMP` | Firebolt supports `TIMESTAMP` without time zone. |
| `JSON` | `TEXT` | JSON is stored as `TEXT` and queried using JSON functions. |
| `ARRAY` | `ARRAY(TEXT)` | Firebolt arrays require explicit type definitions. |
In Redshift, the following code creates a `players` table with columns for `player_id`, `nickname`, registration date, and `score` using the specified data types.
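A sketch of the Redshift table (the registration-date column name is illustrative):

```sql
CREATE TABLE players (
    player_id         INTEGER,
    nickname          VARCHAR(50),
    registration_date DATE,
    score             NUMERIC(10, 2)
);
```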
In Firebolt, the following code creates a `players` dimension table with columns for `PlayerID`, `Nickname`, registration date, and `Score`, and defines a primary index on `PlayerID` to optimize filtering and lookups.
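The Firebolt equivalent, following the type mapping above:

```sql
CREATE DIMENSION TABLE players (
    PlayerID         INTEGER,
    Nickname         TEXT,            -- VARCHAR(50) becomes TEXT
    RegistrationDate DATE,
    Score            DECIMAL(10, 2)   -- NUMERIC maps to DECIMAL
) PRIMARY INDEX PlayerID;
```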
Steps to migrate from Redshift to Firebolt
- Export data from Redshift
- Load data into Firebolt
- Translate queries into Firebolt
- Validation, testing, and optimization
- Automate migration
Export data from Redshift

- Use Redshift’s `UNLOAD` command to export tables or query results to Amazon S3. The code example after this list unloads the `sales` table data in CSV format to the specified Amazon S3 bucket.
- Format data for Firebolt as follows:
  - Ensure that exported data is in CSV or Parquet format.
  - Include headers in CSV files for clarity during loading.
  - Use Parquet with Snappy compression for faster loading.
  - Compress CSV files with gzip compression to reduce storage and speed up transfer.
  - Use consistent delimiters and avoid special characters that may cause parsing errors.
  - Validate that exported data types match the expected Firebolt schema.
- Organize files in Amazon S3 for parallel loading:
  - Store exported files in Amazon S3 folders using clear prefixes, for example: `s3://your-redshift-data/sales/2024/01/`.
  - Use multiple small-to-medium sized files of 100 MB to 1 GB each to maximize parallel loading in Firebolt.
  - Follow consistent naming conventions including table names, export dates, or partitions.
  - Clean up any incomplete or partial exports to avoid loading invalid data.
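A sketch of the `UNLOAD` step (the bucket path and IAM role are placeholders):

```sql
UNLOAD ('SELECT * FROM sales')
TO 's3://your-redshift-data/sales/2024/01/sales_'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload-role'
FORMAT AS CSV
HEADER
GZIP
PARALLEL ON;
```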
Load data into Firebolt

The following code example uses `COPY FROM` to load data from Amazon S3 into the `tournaments` table in Firebolt, mapping specific columns from the file to the table’s fields and including metadata such as the source file name and timestamp.
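A hedged sketch, assuming illustrative column names; `$SOURCE_FILE_NAME` and `$SOURCE_FILE_TIMESTAMP` are metadata columns exposed during ingestion, and exact option names may vary by Firebolt version:

```sql
COPY INTO tournaments (
    tournament_id         $1,
    name                  $2,
    total_prize_dollars   $3,
    source_file_name      $SOURCE_FILE_NAME,
    source_file_timestamp $SOURCE_FILE_TIMESTAMP
)
FROM 's3://your-redshift-data/tournaments/'
WITH TYPE = CSV HEADER = TRUE;
```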
Translate queries into Firebolt

- Window functions – Learn how to translate window functions from Redshift to Firebolt and optimize them using Firebolt’s indexing features for improved performance.
- Aggregation queries – Discover how to handle aggregation queries, leveraging Firebolt’s aggregating indexes to precompute results and significantly speed up query performance.
- JSON data – Understand how to work with JSON in Firebolt using its specialized functions, providing more flexibility and efficiency compared to Redshift’s approach.
- Query plan insights – Gain insights into query plans and how Firebolt’s `EXPLAIN (ANALYZE)` command helps optimize queries through runtime metrics.
Translate window functions into Firebolt

Redshift and Firebolt both support standard window functions such as `ROW_NUMBER`, `RANK`, and `SUM` with the `OVER` clause, but Firebolt can enhance performance through indexing, particularly primary indexes, which minimize the data that needs to be processed.

The following example shows how to calculate a running total of player scores by game in both Redshift and Firebolt, using the `SUM` window function.

In Redshift, the following query calculates the running total of `currentscore` for each `playerid` within each `gameid`, ordered by `playerid`, using a window function.
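A sketch of the Redshift query; note that Redshift requires an explicit frame clause for aggregate window functions with `ORDER BY`:

```sql
SELECT
    gameid,
    playerid,
    currentscore,
    SUM(currentscore) OVER (
        PARTITION BY gameid
        ORDER BY playerid
        ROWS UNBOUNDED PRECEDING
    ) AS running_total
FROM playstats;
```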
In Firebolt, the following code creates the `PlayStats` table with columns for `gameid`, `playerid`, and `currentscore`. It defines a primary index on `gameid` and `playerid` to optimize queries such as the running total calculation by efficiently retrieving the relevant rows based on these columns.
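A Firebolt sketch of the table and the equivalent query:

```sql
CREATE FACT TABLE PlayStats (
    gameid       BIGINT,
    playerid     BIGINT,
    currentscore BIGINT
) PRIMARY INDEX gameid, playerid;

-- Same window function; the primary index narrows the data scanned.
SELECT
    gameid,
    playerid,
    currentscore,
    SUM(currentscore) OVER (
        PARTITION BY gameid
        ORDER BY playerid
    ) AS running_total
FROM PlayStats;
```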
Translate aggregation queries into Firebolt

Redshift and Firebolt both support standard aggregation functions such as `SUM`, `COUNT`, and `AVG`, along with the `GROUP BY` clause. Firebolt, however, enhances performance by precomputing and storing aggregated results through aggregating indexes, which reduce the need for runtime calculations.

The following example shows how to calculate the total playtime by game in both Redshift and Firebolt using the `SUM` aggregation function.
In Redshift, the following code calculates the total `currentplaytime` for each `gameid` in the `PlayStats` table by grouping the data by `gameid` and summing `currentplaytime` for each group.
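A minimal sketch:

```sql
SELECT gameid, SUM(currentplaytime) AS total_playtime
FROM playstats
GROUP BY gameid;
```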
In Firebolt, an aggregating index precomputes the sum of `currentplaytime` for each `gameid`, avoiding runtime aggregation.
The following code example creates an aggregating index on the `PlayStats` table, precomputing the sum of `currentplaytime` for each `gameid`, which optimizes query performance by storing the aggregated results for faster retrieval.
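A sketch of the index and the query it serves:

```sql
CREATE AGGREGATING INDEX playstats_playtime_agg ON PlayStats (
    gameid,
    SUM(currentplaytime)
);

-- This query can now be served from the precomputed index.
SELECT gameid, SUM(currentplaytime) AS total_playtime
FROM PlayStats
GROUP BY gameid;
```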
JSON data in Firebolt

Firebolt handles JSON using specialized functions that operate on `TEXT` columns. Unlike Redshift’s `SUPER` data type, which requires predefined structures, Firebolt allows users to work with raw JSON data directly, offering significant advantages in both performance and flexibility.
Firebolt’s JSON functions are divided into three main categories:
- Extract functions – Used to retrieve specific parts of a JSON document.
- Convert functions – Used to convert JSON data into SQL-compatible types like `TEXT` or `ARRAY`.
- Hybrid functions – Combine extraction and conversion in one step, especially useful for nested JSON elements.
Key extract functions include the following:

- JSON_EXTRACT – Extracts a part of the JSON data as raw JSON.
- JSON_POINTER_EXTRACT_VALUES – Extracts values from a JSON document using a pointer to the desired location.
Convert functions transform JSON data into SQL-compatible types such as `TEXT` or `ARRAY`. Key convert functions include the following:
- JSON_VALUE – Extracts a JSON value and converts it to a SQL-compatible format like TEXT.
- JSON_VALUE_ARRAY – Converts a JSON array into an array of SQL-compatible values.
- JSON_POINTER_EXTRACT_KEYS – Extracts keys from a JSON document.
For example, the following code extracts the `id` key from the nested `metadata` object in the `event_data` JSON column of the `events` table and returns it as `user_id`.
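A hedged sketch, assuming `JSON_POINTER_EXTRACT` returns the JSON sub-document that `JSON_VALUE` then converts to `TEXT`:

```sql
SELECT JSON_VALUE(JSON_POINTER_EXTRACT(event_data, '/metadata/id')) AS user_id
FROM events;
```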
Use `JSON_POINTER_EXTRACT_TEXT` to extract values from nested JSON objects using JSON pointer expressions. The following code example performs the same extraction, returning the value of the `id` key from the nested `metadata` object as `user_id`.
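A minimal sketch:

```sql
SELECT JSON_POINTER_EXTRACT_TEXT(event_data, '/metadata/id') AS user_id
FROM events;
```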
The following code example queries the `event_data` JSON column in the `events` table, accessing keys that contain the tilde and slash special characters by using escape sequences, and returns them as `key_with_tilde` and `key_with_slash`.
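A sketch using JSON pointer escape sequences (the key names are illustrative):

```sql
-- JSON pointer escapes: '~0' encodes '~' and '~1' encodes '/'.
SELECT
    JSON_POINTER_EXTRACT_TEXT(event_data, '/key~0name') AS key_with_tilde,
    JSON_POINTER_EXTRACT_TEXT(event_data, '/key~1name') AS key_with_slash
FROM events;
```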
Firebolt’s query plan insights

Redshift’s `EXPLAIN` command provides a static query plan, showing the sequence of operations, estimated costs, and resource allocation. This plan lacks runtime performance metrics and detailed insights into data movement across nodes.

In Redshift, the following code example analyzes a join query.
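A sketch of the join query (table names follow this section’s examples):

```sql
EXPLAIN
SELECT ps.playerid, SUM(ps.currentscore) AS total_score
FROM playstats ps
JOIN players p ON ps.playerid = p.playerid
GROUP BY ps.playerid;
```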
Firebolt’s `EXPLAIN` output covers both a logical and a physical plan:

- The logical plan outlines the structure of the query, including projections, filters, and joins.
- The physical plan reveals how these operations are distributed across nodes, with details on data shuffling and execution strategies.
Firebolt’s `EXPLAIN (ANALYZE)` command adds runtime metrics, such as CPU time, thread time, rows processed, and runtime for each operation, allowing for detailed performance analysis.

In Firebolt, the following code example analyzes the same query and provides runtime metrics.
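The same query with runtime metrics:

```sql
EXPLAIN (ANALYZE)
SELECT ps.playerid, SUM(ps.currentscore) AS total_score
FROM playstats ps
JOIN players p ON ps.playerid = p.playerid
GROUP BY ps.playerid;
```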
Validation, testing, and optimization

Validate data consistency

After migrating, compare row counts for the `playstats` table in both Redshift and Firebolt.
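For example (identical SQL runs on both systems):

```sql
-- Row counts should match exactly across both systems.
SELECT COUNT(*) FROM playstats;
```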
Then compare aggregate results using `GROUP BY` in both Redshift and Firebolt.
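A minimal sketch:

```sql
-- Per-group aggregates should match across both systems.
SELECT gameid, SUM(currentscore) AS total_score
FROM playstats
GROUP BY gameid
ORDER BY gameid;
```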
Test edge cases such as `NULL` values, large datasets, and complex joins to ensure that the migration doesn’t introduce errors in rare or complex scenarios.

The following code example shows complex joins from multiple tables in Redshift and Firebolt.
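A sketch (the `games` table and its columns are illustrative):

```sql
-- Run in both systems and compare results row for row.
SELECT g.gamename, p.nickname, SUM(ps.currentscore) AS total_score
FROM playstats ps
JOIN games g ON ps.gameid = g.gameid
JOIN players p ON ps.playerid = p.playerid
GROUP BY g.gamename, p.nickname
ORDER BY total_score DESC;
```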
The following code example verifies how `NULL` values are handled in both Redshift and Firebolt.
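A minimal sketch:

```sql
-- Counts of NULL values should match across both systems.
SELECT COUNT(*) AS null_scores
FROM playstats
WHERE currentscore IS NULL;
```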
The following code example uses `UNION ALL` to increase the size of the `playstats` table in Redshift and measures the runtime and resource usage.
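A sketch that doubles the dataset inline and times the aggregation:

```sql
-- Doubling the input with UNION ALL stresses scan and aggregation paths.
SELECT gameid, SUM(currentscore) AS total_score
FROM (
    SELECT * FROM playstats
    UNION ALL
    SELECT * FROM playstats
) AS doubled
GROUP BY gameid;
```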
Performance testing

In Redshift, use `EXPLAIN` and system views such as `STV_RECENTS` to evaluate query performance and resource consumption.

The following code example measures query runtime, resource usage, and queue time in Firebolt. Rerun it using different engine configurations.
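A hedged sketch, assuming a query history view such as `information_schema.engine_query_history` with `query_text`, `status`, `duration_us`, and `start_time` columns (names may differ by Firebolt version):

```sql
-- Inspect recent query runtimes after each test run.
SELECT query_text, status, duration_us, start_time
FROM information_schema.engine_query_history
ORDER BY start_time DESC
LIMIT 20;
```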
Engine optimization
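If performance testing reveals bottlenecks, revisit engine configurations: resize engines for heavy workloads and enable auto-stop for intermittent ones. A minimal sketch (option names may vary by Firebolt version):

```sql
-- Scale the analytics engine for a heavier validation workload.
ALTER ENGINE analytics_engine SET NODES = 8;

-- Keep ad-hoc costs low between test runs.
ALTER ENGINE ad_hoc_engine SET AUTO_STOP = 5;
```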

Automate migration

Automate data migration with Airflow

In Airflow, configure the `aws_access_key_id`, `aws_secret_access_key`, and `s3_redshift_data_bucket` variables for your migration tasks. For more information, see the Apache Airflow documentation on variables.
Use Airflow to run Redshift’s `UNLOAD` command on a scheduled basis to export data from Redshift tables to Amazon S3.
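A sketch of the SQL such a scheduled task might execute against Redshift, with the bucket templated from the Airflow variable above (the IAM role is a placeholder, and the operator wiring is omitted):

```sql
-- Run by a scheduled Airflow task that executes SQL against Redshift;
-- {{ var.value.s3_redshift_data_bucket }} is resolved by Airflow's
-- template engine at runtime.
UNLOAD ('SELECT * FROM playstats')
TO 's3://{{ var.value.s3_redshift_data_bucket }}/playstats/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload-role'
FORMAT AS PARQUET;
```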
Automate data migration with dbt

Use dbt to translate Redshift’s `SORTKEY` and `DISTKEY` into Firebolt’s primary and aggregating indexes. Create dbt models to define Firebolt-compatible schemas and apply them automatically.

The following code example uses dbt to create a Firebolt schema.
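A hedged dbt model sketch for a Firebolt adapter; the `table_type` and `primary_index` config keys and the source name are assumptions:

```sql
-- models/playstats.sql
{{ config(
    materialized='table',
    table_type='fact',
    primary_index=['gameid', 'playerid']
) }}

SELECT
    gameid,
    playerid,
    currentscore,
    currentplaytime
FROM {{ source('redshift_export', 'playstats') }}
```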
In dbt models, convert `SUPER` columns to `TEXT` for JSON handling.
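A hedged sketch (the source and column names are illustrative):

```sql
-- models/events.sql: cast the former SUPER column to TEXT during migration.
SELECT
    event_id,
    CAST(event_data AS TEXT) AS event_data
FROM {{ source('redshift_export', 'events') }}
```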