Understanding Autoscaling
Learn how autoscaling works in Firebolt engines.
This is a technical guide to how concurrency autoscaling works in Firebolt engines. For an overview of autoscaling, see Concurrency auto-scaling.
Basic operation
Firebolt engines can scale for concurrency by changing the number of clusters that belong to a single engine.
You set the bounds with the MIN_CLUSTERS and MAX_CLUSTERS options in the CREATE ENGINE and ALTER ENGINE commands. When demand rises above the current capacity, Firebolt adds one new cluster; when load drops, it removes one. After each adjustment the service observes a short stabilization window before it evaluates the next scale decision, preventing thrashing during bursty traffic.
When a stopped engine is started, it begins with the minimum number of clusters.
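As an example, the statements below set the autoscaling bounds at creation time and widen them later. This is a minimal sketch: the engine name is hypothetical, and the exact option syntax is documented in the CREATE ENGINE and ALTER ENGINE references.

```sql
-- Hypothetical engine name; see the CREATE ENGINE reference for the full option syntax.
-- Allow the engine to scale between 1 and 3 clusters for concurrency.
CREATE ENGINE my_engine WITH
  MIN_CLUSTERS = 1
  MAX_CLUSTERS = 3;

-- Later, widen the autoscaling range without recreating the engine.
ALTER ENGINE my_engine SET
  MIN_CLUSTERS = 2
  MAX_CLUSTERS = 5;
```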
Auto-scaling metrics
The autoscaler bases its decisions on three workload signals:
- CPU utilization – averaged across nodes in each cluster, then averaged across clusters
- RAM utilization – averaged in the same way
- Query queue time – the maximum time any waiting query in the engine has spent in the queue
The exact thresholds and the way metrics are aggregated are tuned by Firebolt and can be changed without notice. Current default thresholds are:
| Metric | Thresholds for adding clusters (any) | Thresholds for removing clusters (all) |
|---|---|---|
| CPU utilization | over 90% | under 75% |
| RAM utilization | over 70% | under 50% |
| Max query queue time | over 3 seconds | under 1 second |
The current default stabilization window is 1 minute.
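To make the any/all semantics concrete, the following standalone SQL sketch encodes the default thresholds over hypothetical metric values. The literals are made up for illustration only; the real autoscaler evaluates these signals internally and its thresholds may change.

```sql
-- Illustrative only: the literals stand in for the averaged CPU and RAM
-- utilization and the maximum query queue time of an engine.
SELECT
  CASE
    WHEN cpu_pct > 90 OR ram_pct > 70 OR max_queue_seconds > 3
      THEN 'add a cluster (any add threshold exceeded)'
    WHEN cpu_pct < 75 AND ram_pct < 50 AND max_queue_seconds < 1
      THEN 'remove a cluster (all remove thresholds met)'
    ELSE 'hold (keep the current number of clusters)'
  END AS scaling_decision
FROM (
  SELECT 93 AS cpu_pct, 60 AS ram_pct, 0.5 AS max_queue_seconds
) AS sample_metrics;
```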
For monitoring engine stats, please refer to Monitoring Engine Usage.
Monitoring autoscaling
You can check how many clusters an engine is using at any moment, along with its minimum and maximum limits, in the information_schema.engines table.
You can see how an engine changed its number of clusters over time in the information_schema.engine_history table.
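Both tables can be queried directly; a minimal sketch (the exact column names are documented in the information_schema reference):

```sql
-- Current state: one row per engine, including the current number of clusters
-- and the configured minimum and maximum.
SELECT *
FROM information_schema.engines;

-- History: past engine changes, including changes to the number of clusters over time.
SELECT *
FROM information_schema.engine_history;
```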
Gotchas
This is concurrency auto-scaling: it creates extra clusters so that new queries can start promptly.
It does not re-plan or speed up queries that were already running.
Adding more clusters will not make a single large query finish faster.
If an individual query needs more processing power, consider scaling the engine up or out (larger TYPE or more NODES) instead.
For general sizing advice, see the Sizing Engines guide.
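For instance, a change along these lines adjusts per-cluster resources rather than the cluster count. This is a sketch with a hypothetical engine name; the exact syntax and the available TYPE values are in the ALTER ENGINE reference.

```sql
-- Hypothetical engine name; check the ALTER ENGINE reference for exact syntax.
-- Scale up: use a larger node type in each cluster.
ALTER ENGINE my_engine SET TYPE = 'L';

-- Scale out: use more nodes in each cluster.
ALTER ENGINE my_engine SET NODES = 4;
```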
During scale down, the engine waits for all queries running on a cluster that is shutting down to finish before removing it (a graceful drain).
During scale up, an added cluster starts “cold”, without any cached data. Because of this, Firebolt might route less traffic to the new cluster until it has warmed up, to avoid overloading it. We are working on a feature that will make a new cluster proactively fetch data so that its cache is in sync with the other clusters.