optimizer_mode
session parameter to 'user_guided'
.
In this mode, the optimizer will exclude all cost-based optimization rules.
This ensures that the relative order of joins and aggregations in the resulting execution plan will follow the sytnatic order defined in the SELECT
blocks of your SQL query.
As a more granular setting, you can also use the no_join_ordering
query hint.
Specifically, the following optimizations are disabled in user-guided mode:
- Join ordering
- Aggregate push-down and placement
- Subquery decorrelation
- Common sub-plan discovery
- Redundant join removal
- Transforming join conditions to match the best execution strategy
store_sales
) and two dimension tables (item
and date_dim
).
Both dimension tables have local predicates (item.i_manufact_id = 128
and date_dim.d_year
).
When the optimizer_mode
is 'automatic'
, the join order depends on the estimated number of rows in the two dimension tables:
since the Filter
on top item
produces a smaller intermediate result, the optimizer decides to join it with store_sales
first,
and then join this intermediate result with date_dim
.
However, if you believe that the better join order is to first join with date_dim
and then with item
, you can:
(1) set optimizer_mode = 'user_guided'
, and
(2) put the join date_dim on (...)
before join item on (...)
.
Understanding join ordering in user-guided mode
When usinguser_guided
mode, the syntactic order of joins in your SQL query directly determines how Firebolt executes them.
Understanding this behavior is crucial for optimal performance.
Build and probe side semantics
In user-guided mode, Firebolt implements joins using hash join algorithms where:- The right input of each join becomes the build side - Firebolt creates a hash table from this data.
- The left input of each join becomes the probe side - Firebolt scans this data and probes the hash table.
- Build a hash table from
small_table
(the left input). - Scan
large_table
(the right input) and probe the hash table for matches.
Ordering joins for optimal performance
For optimal performance, place smaller tables on the right side of joins. This minimizes memory usage and reduces the overhead of building hash tables, leading to faster query execution. Consider this example with a large fact table and smaller dimension tables:- Reduced memory usage - Smaller hash tables require less RAM.
- Faster hash table construction - Less time spent building the hash table.
- Improved probe performance - Smaller hash tables fit better in CPU caches, causing fewer cache misses when probing.
- More subresult reuse potential - Firebolt can reuse hash tables across queries. Smaller hash tables allow more subresults to be cached. Learn more in our subresult reuse documentation and our blog post Caching & Reuse of Subresults across Queries.