Syntax
To enable inferring column statistics from existing aggregating indexes, use the query-level settingautomated_column_statistics_read_user_created_indexes.
The default value is false, so this is an opt-in feature.
Alternatives
You can also let Firebolt manage the aggregating indexes for you. This has the advantage that you do not have to manually change the query settings and always profit when using a table that has the feature enabled. See Create tables with statistics.Example
Consider two tables with columns that are frequently used for grouping, filtering, or joining.SET automated_column_statistics_read_user_created_indexes = true;.
Then you can see the following effect on the query plan:
departments.dept_name, departments.location, employees.gender, employees.title, and employees.dept_id.
These statistics help with further estimation.
Here are some key observations:
- The filter
[6] [Filter] (employees.title = 'Senior Developer')is estimated to produce 6 rows because ACS provides an appropriate distinct count of 27 foremployees.title. This estimation can be far off if data is skewed, but it often improves significantly over default estimations. - The filter
[9] [Filter] (departments.location = 'New York')is estimated to produce only 3 rows because ACS provides an appropriate distinct count of 3 fordepartments.location. This estimation can be far off if data is skewed, but it often improves significantly over default estimations. - The join
[4] [Join] Mode: Inner [(employees.dept_id = departments.dept_id)]is estimated to produce 5 rows. The query planner is able to infer from the distinct count statistics ondepartments.dept_idthat this column is unique (and likely a primary key). Consequently, the join cannot produce more rows than are coming from the foreign key side. Because some rows fromdepartmentshave been filtered out, the planner cannot guarantee that every row from the foreign key side will find a join partner and be produced. It performs a conservative estimation that one row will not be produced and the other 5 rows will be produced. - The aggregation
[2] [Aggregate] GroupBy: [employees.gender, departments.dept_name] Aggregates: [count_0: count(*), avg2_0: avg2(employees.salary)]is estimated to produce 5 groups. While ACS suggests 2 distinct genders and 3 distinct department names, the product2 * 3 = 6is bounded by the expected row count of 5. Hence, the planner predicts 5 groups.