> ## Documentation Index
> Fetch the complete documentation index at: https://docs.firebolt.io/llms.txt
> Use this file to discover all available pages before exploring further.

<AgentInstructions>

## Submitting Feedback

If you encounter incorrect, outdated, or confusing documentation on this page, submit feedback:

POST https://docs.firebolt.io/feedback

```json
{
  "path": "/performance-and-observability/query-planning/query-hints/no-partial-aggregation",
  "feedback": "Description of the issue"
}
```

Only submit feedback when you have something specific and actionable to report.

</AgentInstructions>

> You can disable partial aggregation using a query hint comment.

# No partial aggregation

## Background: What is partial aggregation?

Partial aggregation is an optimization to improve the performance of `group by` queries on multi-node engines.
When partial aggregation is enabled, each node performs aggregation on its local data before the results are shuffled and merged across nodes.
Therefore, it is only applicable if the aggregate function is both associative and commutative.
Examples of suitable aggregate functions are `sum` and `avg`, while `median` is not.

<Info>
  Partial aggregation is only relevant for multi-node engines and for queries that include a `group by` clause with suitable aggregate functions.
</Info>

## Advantages and disadvantages

Partial aggregation can reduce the amount of data that needs to be transferred between nodes during the shuffle phase.
In turn, this can significantly improve query performance.
Additionally, it helps to balance the workload across nodes, particularly in situations where the data is highly skewed.
For example, if one group contains significantly more rows than others, partial aggregation enables several nodes to work on it in parallel.

However, there are also disadvantages to using partial aggregation.
When all groups are small and unlikely to be located on the same node, the partial aggregation iterates over the input data without actually reducing the number of rows to be shuffled.
This scenario often occurs when there are many groups, each containing only a few rows.
In this case, partial aggregation reduces the query performance.

## Controlling partial aggregation in Firebolt

By default, Firebolt's query planner always applies partial aggregation whenever the aggregate function is associative and commutative.
However, there are scenarios where you may want to overwrite this and disable partial aggregation manually.

The `no_partial_agg` hint gives you explicit control over the planner's behavior.
When you use this hint, partial aggregation is disabled for the query, and the planner will generate a plan that skips the local aggregation step.
In the example below, you can see that without the hint, the plan includes `AggregateState`, `Shuffle`, and `AggregateMerge` operators.
With the `no_partial_agg` hint, the plan only includes `Shuffle` and `Aggregate`.

<CodeGroup>
  ```sql SQL source (with hint, no partial aggregation) {2} theme={"theme":{"light":"github-light","dark":"github-dark"}}
  explain(physical)
  select /*! no_partial_agg */
    AVG(quantity),
    location_id
  from
    fact_sales
  group by
    location_id;
  ```

  ```text Plan (with hint, no partial aggregation) {5, 7} theme={"theme":{"light":"github-light","dark":"github-dark"}}
  [0] [MaybeCache]
   \_[1] [Shuffle] Gather
     |   [Affinity]: many nodes
      \_[2] [Projection] avg_0, fact_sales.location_id
         \_[3] [Aggregate] GroupBy: [fact_sales.location_id] Aggregates: [avg_0: avg(fact_sales.quantity)]
           |   [Types]: avg_0: double precision null
            \_[4] [Shuffle] Hash by [fact_sales.location_id]
              |   [Affinity]: many nodes
               \_[5] [TableFuncScan] fact_sales.location_id: $0.location_id, fact_sales.quantity: $0.quantity
                 |   $0 = read_tablets(table_name => fact_sales, tablet)
                 |   [Types]: fact_sales.location_id: text null, fact_sales.quantity: integer null
                  \_[6] [TableFuncScan] tablet: $0.tablet
                        $0 = list_tablets(table_name => fact_sales)
                        [Types]: tablet: tablet not null

  ```

  ```sql SQL source (default, with partial aggregation) {2} theme={"theme":{"light":"github-light","dark":"github-dark"}}
  explain(physical)
  select
    AVG(quantity),
    location_id
  from
    fact_sales
  group by
    location_id;
  ```

  ```text Plan (default, with partial aggregation) {5, 7, 9} theme={"theme":{"light":"github-light","dark":"github-dark"}}
  [0] [MaybeCache]
   \_[1] [Shuffle] Gather
     |   [Affinity]: many nodes
      \_[2] [Projection] avg_1, fact_sales.location_id
         \_[3] [AggregateMerge] GroupBy: [fact_sales.location_id] Aggregates: [avg_1: avgmerge(avg_0)]
           |   [Types]: avg_1: double precision null
            \_[4] [Shuffle] Hash by [fact_sales.location_id]
              |   [Affinity]: many nodes
               \_[5] [AggregateState partial] GroupBy: [fact_sales.location_id] Aggregates: [avg_0: avg(fact_sales.quantity)]
                 |   [Types]: avg_0: aggregatefunction(avg2ornull, integer null) not null
                  \_[6] [TableFuncScan] fact_sales.location_id: $0.location_id, fact_sales.quantity: $0.quantity
                    |   $0 = read_tablets(table_name => fact_sales, tablet)
                    |   [Types]: fact_sales.location_id: text null, fact_sales.quantity: integer null
                     \_[7] [TableFuncScan] tablet: $0.tablet
                           $0 = list_tablets(table_name => fact_sales)
                           [Types]: tablet: tablet not null
  ```
</CodeGroup>

You can also specify the hint after the `group by` keyword.

```sql SQL source {8} theme={"theme":{"light":"github-light","dark":"github-dark"}}
explain(physical)
select
  AVG(quantity),
  location_id
from
  fact_sales
group by
  /*! no_partial_agg */
  location_id;
```
