> Learn how autoscaling works in Firebolt engines.

# Understanding Autoscaling

This technical guide explains how concurrency autoscaling works in Firebolt engines.
For an overview of autoscaling, see [Concurrency auto-scaling](/guides/operate-engines/working-with-engines-using-ddl#concurrency-auto-scaling).

## Basic operation

Firebolt engines scale for concurrency by changing the number of clusters that belong to a single engine.
You set the bounds with the `MIN_CLUSTERS` and `MAX_CLUSTERS` options in the [CREATE ENGINE](/reference-sql/commands/engines/create-engine)
and [ALTER ENGINE](/reference-sql/commands/engines/alter-engine) commands. When demand rises above the current
capacity, Firebolt adds one new cluster; when load drops, it removes one.
After each adjustment, the service waits for a short stabilization window before it evaluates the next scaling decision,
which prevents thrashing during bursty traffic.

When a stopped engine starts, it comes up with the minimum number of clusters (`MIN_CLUSTERS`).
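
The behavior described in this section can be sketched as a small Python model. This is an illustration only, not Firebolt's implementation: the names `EngineState`, `start_engine`, and `apply_decision` are hypothetical, the 60-second window is the current default mentioned in this guide, and the model assumes that no stabilization window is pending right after an engine starts.

```python
from dataclasses import dataclass


@dataclass
class EngineState:
    """Hypothetical model of an engine's cluster bounds and current size."""
    min_clusters: int
    max_clusters: int
    clusters: int
    last_change_s: float  # time of the last scale adjustment, in seconds


def start_engine(min_clusters: int, max_clusters: int) -> EngineState:
    # A stopped engine starts with the minimum number of clusters.
    # Assumption: no stabilization window is pending right after start.
    return EngineState(min_clusters, max_clusters, min_clusters, float("-inf"))


def apply_decision(state: EngineState, direction: int, now_s: float,
                   window_s: float = 60.0) -> int:
    """Apply one scale decision: +1 adds a cluster, -1 removes one, 0 holds."""
    if now_s - state.last_change_s < window_s:
        return state.clusters  # still inside the stabilization window
    # Move by at most one cluster and stay within the configured bounds.
    target = max(state.min_clusters,
                 min(state.max_clusters, state.clusters + direction))
    if target != state.clusters:
        state.clusters = target
        state.last_change_s = now_s  # a new stabilization window begins
    return state.clusters
```

In this model, an engine with bounds 1 and 3 needs at least two stabilization windows to grow from one cluster to three, because each decision changes the count by one.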

## Auto-scaling metrics

The autoscaler bases its decisions on three workload signals:

* **CPU utilization** – averaged across nodes in each cluster, then averaged across clusters
* **RAM utilization** – averaged in the same way
* **Query queue time** – the maximum time any waiting query in the engine has spent in the queue
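
As a rough illustration of the aggregation described in the list above, the following Python sketch computes a utilization value as a mean over nodes within each cluster followed by a mean over clusters, and the queue signal as a maximum. The function names and the input shapes are assumptions made for this example; the actual aggregation is internal to Firebolt.

```python
from statistics import fmean


def engine_utilization(per_cluster_node_values):
    """Average across nodes in each cluster, then average across clusters.

    `per_cluster_node_values` is a list of clusters, each a list of
    per-node utilization percentages (hypothetical input shape).
    """
    cluster_means = [fmean(nodes) for nodes in per_cluster_node_values]
    return fmean(cluster_means)


def max_queue_time(waiting_query_queue_times_s):
    """Longest time, in seconds, that any waiting query has spent queued."""
    # With no waiting queries, treat the queue time as zero (assumption).
    return max(waiting_query_queue_times_s, default=0.0)
```

For example, two clusters with node CPU readings of `[80, 100]` and `[40, 60]` have cluster averages of 90 and 50, so the engine-level value is 70. When every cluster has the same number of nodes, this two-step average equals the plain average over all nodes.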

Firebolt tunes the exact thresholds and the way metrics are aggregated, and both may change without notice. The current default thresholds are:

| Metric               | Threshold for adding a cluster (any metric) | Threshold for removing a cluster (all metrics) |
| -------------------- | ------------------------------------------- | ---------------------------------------------- |
| CPU utilization      | over 90%                                    | under 75%                                      |
| RAM utilization      | over 70%                                    | under 50%                                      |
| Max query queue time | over 3 seconds                              | under 1 second                                 |

The current default stabilization window **is 1 minute**.
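
The threshold table above can be read as a simple decision rule: scale up when any metric exceeds its upper threshold, scale down only when all metrics are below their lower thresholds, and otherwise hold. The Python sketch below encodes that reading with the current defaults. The function name, the metric keys, and the dictionary layout are hypothetical, and the real autoscaler may combine these signals differently.

```python
# Current default thresholds from the table above; Firebolt may change them.
ADD_THRESHOLDS = {"cpu_pct": 90.0, "ram_pct": 70.0, "queue_s": 3.0}
REMOVE_THRESHOLDS = {"cpu_pct": 75.0, "ram_pct": 50.0, "queue_s": 1.0}


def scale_direction(metrics):
    """Return +1 to add a cluster, -1 to remove one, 0 to hold.

    `metrics` maps "cpu_pct", "ram_pct", and "queue_s" to engine-level values.
    """
    # Any single metric over its upper threshold is enough to scale up.
    if any(metrics[name] > limit for name, limit in ADD_THRESHOLDS.items()):
        return 1
    # Every metric must be under its lower threshold to scale down.
    if all(metrics[name] < limit for name, limit in REMOVE_THRESHOLDS.items()):
        return -1
    # Values between the two thresholds leave the cluster count unchanged.
    return 0
```

The gap between the two sets of thresholds acts as a hysteresis band: an engine at 80% CPU, 40% RAM, and a 0.5-second maximum queue time neither adds nor removes a cluster.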

To monitor these engine metrics, see [Monitoring Engine Usage](/overview/engine-fundamentals#monitoring-engine-usage).

## Monitoring autoscaling

To check how many clusters an engine is currently using, along with its minimum and maximum limits,
query the [information\_schema.engines](/reference-sql/information-schema/engines) table.

To see how an engine's cluster count has changed over time, query the
[information\_schema.engine\_history](/reference-sql/information-schema/engine-history) table.

## Gotchas

This is **concurrency auto-scaling**: it creates extra clusters so that new queries can start promptly.
It does **not** re-plan or speed up queries that are already running.
Adding more clusters will not make a single large query finish faster.
If an individual query needs more processing power, consider scaling the engine **up or out** (larger `TYPE` or more `NODES`) instead.
For general sizing advice, see the [Sizing Engines guide](/guides/operate-engines/sizing-engines).

During scale-down, the engine drains the cluster gracefully: it waits for all queries running on the cluster that is shutting down to finish before removing it.
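
A graceful drain of this kind can be modeled in a few lines of Python. The class below is a hypothetical sketch, not Firebolt's implementation, and it assumes that a draining cluster stops accepting new queries, which this guide does not state explicitly.

```python
class DrainingCluster:
    """Hypothetical model of a cluster that is removed only once it is idle."""

    def __init__(self, running_query_ids):
        self.running = set(running_query_ids)
        self.draining = False

    def begin_drain(self):
        # Mark the cluster for removal; in-flight queries keep running.
        self.draining = True

    def accepts_new_queries(self):
        # Assumption: a draining cluster takes no new work.
        return not self.draining

    def finish(self, query_id):
        # Record that a running query has completed.
        self.running.discard(query_id)

    def can_remove(self):
        # Removal is safe only after the drain started and all queries ended.
        return self.draining and not self.running
```

In this model, a long-running query delays removal of its cluster until that query completes.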

During scale-up, an added cluster starts "cold" (that is, it has no data cached).
To avoid overloading it, Firebolt might route less traffic to the new cluster until it has warmed up.
We are working on a feature that will make a new cluster proactively fetch data so that its cache is in sync with the caches of the other clusters.
