Learn how autoscaling works in Firebolt engines.
Autoscaling is configured with the `MIN_CLUSTERS` and `MAX_CLUSTERS` options in the `CREATE ENGINE` and `ALTER ENGINE` commands. When demand rises above the current
capacity, Firebolt adds one new cluster; when load drops, it removes one.
After each adjustment the service observes a short stabilization window before it evaluates the next scale decision,
preventing thrashing during bursty traffic.
When a stopped engine starts, it starts with the minimum number of clusters.
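
The scaling behavior described above (one cluster at a time, a pause after each change, and a cluster count that stays between the configured minimum and maximum) can be sketched in a few lines of Python. This is only an illustrative model, not Firebolt's implementation: the `EngineState` class, the `step` function, and the 60-second `window_s` value are all hypothetical, since the actual length of the stabilization window is not stated here.

```python
from dataclasses import dataclass


@dataclass
class EngineState:
    """Hypothetical model of an engine's autoscaling state."""
    min_clusters: int
    max_clusters: int
    clusters: int
    # Time of the last scaling action, in seconds; -inf means "never scaled".
    last_change: float = float("-inf")


def step(state: EngineState, decision: str, now: float, window_s: float = 60.0) -> int:
    """Apply at most one +1 or -1 cluster change and return the new count.

    decision is "up", "down", or "hold". window_s is an assumed
    stabilization window, chosen only for illustration.
    """
    # Inside the stabilization window: ignore the decision.
    if now - state.last_change < window_s:
        return state.clusters

    delta = {"up": 1, "down": -1}.get(decision, 0)
    # Clamp to the configured MIN_CLUSTERS / MAX_CLUSTERS bounds.
    target = max(state.min_clusters, min(state.max_clusters, state.clusters + delta))

    if target != state.clusters:
        state.clusters = target
        state.last_change = now
    return state.clusters


# A freshly started engine begins at the minimum number of clusters.
engine = EngineState(min_clusters=1, max_clusters=3, clusters=1)
step(engine, "up", now=0.0)    # scales to 2 clusters
step(engine, "up", now=10.0)   # ignored: still inside the window
```

Changing one cluster per step and then waiting trades reaction speed for stability: a short burst cannot trigger several additions in a row.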
| Metric | Thresholds for adding clusters (any) | Thresholds for removing clusters (all) |
|---|---|---|
| CPU utilization | over 90% | under 75% |
| RAM utilization | over 70% | under 50% |
| Max query queue time | over 3 seconds | under 1 second |
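
The "any" and "all" labels in the table matter: a single metric over its threshold is enough to add a cluster, but every metric must be under its threshold before one is removed. The following minimal sketch encodes that rule with the values from the table. The function name `scale_decision` and the metric keys are hypothetical, and the code illustrates the rule only; it is not how Firebolt evaluates metrics internally.

```python
# Thresholds copied from the table: (add_if_over, remove_if_under).
# The dictionary keys are illustrative names, not Firebolt metric names.
THRESHOLDS = {
    "cpu_pct": (90.0, 75.0),
    "ram_pct": (70.0, 50.0),
    "max_queue_s": (3.0, 1.0),
}


def scale_decision(metrics: dict) -> str:
    """Return "up", "down", or "hold" for one set of metric readings."""
    # Add a cluster if ANY metric exceeds its upper threshold.
    if any(metrics[name] > add for name, (add, _) in THRESHOLDS.items()):
        return "up"
    # Remove a cluster only if ALL metrics are below their lower thresholds.
    if all(metrics[name] < remove for name, (_, remove) in THRESHOLDS.items()):
        return "down"
    # Anything in between leaves the cluster count unchanged.
    return "hold"


# CPU is idle and the queue is empty, but RAM at 60% is not under 50%,
# so the engine keeps its current number of clusters.
print(scale_decision({"cpu_pct": 60, "ram_pct": 60, "max_queue_s": 0.5}))
```

The gap between the two thresholds of each metric (for example, 75% to 90% CPU) acts as a dead band in which no scaling happens.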
`TYPE` or more `NODES`) instead.
For general sizing advice, see the Sizing Engines guide.
During scale-down, the engine waits for all queries running on the cluster that is shutting down to finish before removing it (a graceful drain).
During scale-up, an added cluster starts “cold”: it has no data cached.
To avoid overloading the new cluster, Firebolt might route less traffic to it until it has warmed up.
We are working on a feature that will make a new cluster proactively fetch data to be in sync with the cache of the other clusters.