Background: What is partial aggregation?

Partial aggregation is an optimization to improve the performance of group by queries on multi-node engines. When partial aggregation is enabled, each node performs aggregation on its local data before the results are shuffled and merged across nodes. Therefore, it is only applicable if the aggregate function is both associative and commutative. Examples of suitable aggregate functions are sum and avg, while median is not.
Partial aggregation is only relevant for multi-node engines and for queries that include a group by clause with suitable aggregate functions.

Advantages and disadvantages

Partial aggregation can reduce the amount of data that needs to be transferred between nodes during the shuffle phase. In turn, this can significantly improve query performance. Additionally, it helps to balance the workload across nodes, particularly in situations where the data is highly skewed. For example, if one group contains significantly more rows than others, partial aggregation enables several nodes to work on it in parallel. However, there are also disadvantages to using partial aggregation. When all groups are small and unlikely to be located on the same node, the partial aggregation iterates over the input data without actually reducing the number of rows to be shuffled. This scenario often occurs when there are many groups, each containing only a few rows. In this case, partial aggregation reduces the query performance.

Controlling partial aggregation in Firebolt

By default, Firebolt’s query planner always applies partial aggregation whenever the aggregate function is associative and commutative. However, there are scenarios where you may want to overwrite this and disable partial aggregation manually. The no_partial_agg hint gives you explicit control over the planner’s behavior. When you use this hint, partial aggregation is disabled for the query, and the planner will generate a plan that skips the local aggregation step. In the example below, you can see that without the hint, the plan includes AggregateState, Shuffle, and AggregateMerge operators. With the no_partial_agg hint, the plan only includes Shuffle and Aggregate.
explain(physical)
select
  AVG(quantity),
  location_id
from
  fact_sales
group by
  location_id;