RECOMMEND DDL
CALL recommend_ddl can help you optimize schema configurations to enhance query performance through early data pruning. The statement recommends a primary index (PI) and a partition key (PartK) for the specified table, tailored to the given workload.
Syntax
CALL recommend_ddl(<table_name>, (<select_statement>))
Parameters
Parameter | Description |
---|---|
<table_name> | The name of the table for which primary indexes and partition keys should be recommended. |
<select_statement> | SELECT statement that returns the workload that the DDL recommendation is based on. The <select_statement> must return exactly one column of type TEXT. |
Example
The example below retrieves a schema recommendation using the CALL recommend_ddl statement for a table named lineitem, tailored to the workload in the query history of the past week.
CALL recommend_ddl(lineitem, (SELECT query_text FROM information_schema.engine_query_history WHERE start_time > NOW() - INTERVAL '1 week'))
The <select_statement> returns exactly one column of type TEXT containing the SQL statements that the CALL recommend_ddl command should analyze.
Returns:
recommended_partition_key | recommended_primary_index | average_pruning_improvement | analyzed_queries |
---|---|---|---|
DATE_TRUNC('month', l_orderdate) | l_shipmode, l_returnflag, l_shipinstruct | 0.42 | 393 |
The CALL recommend_ddl results indicate that the number of bytes scanned can be decreased by up to 42% by configuring PRIMARY INDEX l_shipmode, l_returnflag, l_shipinstruct and PARTITION BY DATE_TRUNC('month', l_orderdate). The statement analyzed 393 queries that scanned the lineitem table and applied filters to any of the lineitem columns.
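As a rough sketch of how the recommendation above might be applied, the statements below create a new table with the suggested clauses and copy the data across. The table name lineitem_tuned, the abbreviated column list, and the column types are assumptions for illustration only; a real lineitem table will have more columns and may use different types.

```sql
-- Hypothetical, abbreviated column list; adjust names and types to the real table.
CREATE TABLE lineitem_tuned (
    l_orderkey     BIGINT,
    l_orderdate    DATE,
    l_shipmode     TEXT,
    l_returnflag   TEXT,
    l_shipinstruct TEXT
)
-- Clauses taken from the recommendation returned above.
PRIMARY INDEX l_shipmode, l_returnflag, l_shipinstruct
PARTITION BY DATE_TRUNC('month', l_orderdate);

-- Copy the existing rows into the newly configured table.
INSERT INTO lineitem_tuned
SELECT l_orderkey, l_orderdate, l_shipmode, l_returnflag, l_shipinstruct
FROM lineitem;
```

Creating a second table rather than dropping the original keeps the old data available until the new configuration has been checked against the workload.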
Quick Setup
The following steps guide you to good query performance within the first few minutes of using Firebolt. First, create a table without any primary index or partition key configuration.
CREATE TABLE <table_name>(
...
);
Next, load the workload that you want to run on this table from S3 into Firebolt using COPY INTO.
COPY INTO workload_table FROM 's3://bucket/workload/' with ...
Now you can use the CALL recommend_ddl command to find primary index and partition key configurations.
CALL recommend_ddl(<table_name>, (select * from workload_table));
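If the workload is small enough to write by hand, the required shape (exactly one TEXT column holding SQL statements) can also be sketched without loading from S3. The example below is an illustration under assumptions: the column name query_text, the sample queries, and the l_quantity column are hypothetical, and lineitem stands in for <table_name>.

```sql
-- A workload table with a single TEXT column, as recommend_ddl requires.
CREATE TABLE workload_table (query_text TEXT);

-- Each row holds one SQL statement; inner single quotes are doubled.
INSERT INTO workload_table VALUES
    ('SELECT COUNT(*) FROM lineitem WHERE l_shipmode = ''AIR'''),
    ('SELECT SUM(l_quantity) FROM lineitem WHERE l_orderdate >= ''1995-01-01''');

-- Selecting the one TEXT column explicitly keeps the call valid
-- even if more columns are later added to workload_table.
CALL recommend_ddl(lineitem, (SELECT query_text FROM workload_table));
```

A handful of hand-written queries is unlikely to yield recommendations as reliable as a real query history, so treat the result as a starting point.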
Finally, recreate the table with the recommended primary index and partition key configurations and ingest the data into this table.
DROP TABLE <table_name>;
CREATE TABLE <table_name>(
...
)
PRIMARY INDEX ...
PARTITION BY ...;
Under The Hood
Primary index and partition key configurations are chosen to maximize pruning potential and thus reduce query runtime. Columns with selective filters and low cardinality are suggested as primary index and partition key columns. Recommendations can be run on empty tables as well as on tables with production data. The more queries executed on a populated table, the better the recommendations become. If the workload of a table changes over time, it can also be beneficial to run the CALL recommend_ddl command periodically to check for better table configurations.
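To get a feel for the low-cardinality criterion, one option is to count distinct values in the candidate columns by hand. The query below is only a manual sanity check on the lineitem example, not a description of how recommend_ddl computes its result; the column names are taken from the recommendation shown earlier.

```sql
-- Compare distinct-value counts of the recommended columns with the row count.
-- Small counts relative to total_rows indicate low cardinality.
SELECT
    COUNT(*)                                          AS total_rows,
    COUNT(DISTINCT l_shipmode)                        AS shipmode_values,
    COUNT(DISTINCT l_returnflag)                      AS returnflag_values,
    COUNT(DISTINCT l_shipinstruct)                    AS shipinstruct_values,
    COUNT(DISTINCT DATE_TRUNC('month', l_orderdate))  AS orderdate_months
FROM lineitem;
```

This check only makes sense on a populated table; on an empty table every count is zero.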