Background: What is partial aggregation?
Partial aggregation is an optimization to improve the performance ofgroup by
queries on multi-node engines.
When partial aggregation is enabled, each node performs aggregation on its local data before the results are shuffled and merged across nodes.
Therefore, it is only applicable if the aggregate function is both associative and commutative.
Examples of suitable aggregate functions are sum
and avg
, while median
is not.
Partial aggregation is only relevant for multi-node engines and for queries that include a
group by
clause with suitable aggregate functions.Advantages and disadvantages
Partial aggregation can reduce the amount of data that needs to be transferred between nodes during the shuffle phase. In turn, this can significantly improve query performance. Additionally, it helps to balance the workload across nodes, particularly in situations where the data is highly skewed. For example, if one group contains significantly more rows than others, partial aggregation enables several nodes to work on it in parallel. However, there are also disadvantages to using partial aggregation. When all groups are small and unlikely to be located on the same node, the partial aggregation iterates over the input data without actually reducing the number of rows to be shuffled. This scenario often occurs when there are many groups, each containing only a few rows. In this case, partial aggregation reduces the query performance.Controlling partial aggregation in Firebolt
By default, Firebolt’s query planner always applies partial aggregation whenever the aggregate function is associative and commutative. However, there are scenarios where you may want to overwrite this and disable partial aggregation manually. Theno_partial_agg
hint gives you explicit control over the planner’s behavior.
When you use this hint, partial aggregation is disabled for the query, and the planner will generate a plan that skips the local aggregation step.
In the example below, you can see that without the hint, the plan includes AggregateState
, Shuffle
, and AggregateMerge
operators.
With the no_partial_agg
hint, the plan only includes Shuffle
and Aggregate
.