Auto-stop
Auto-stop is opt-in via spec.autoStop.enabled=true. When enabled, the Firebolt Operator owns spec.replicas and drives it between two fixed levels. It scales the engine up to activeReplicas when a UTC schedule window is open or a wake-up is requested, and scales it down to idleReplicas (default 0, which fully stops the engine) after the engine sits idle for idleTimeout. This is an activity-gated on/off toggle, not proportional autoscaling. The engine runs at either idleReplicas or activeReplicas, never a value in between, and query volume never changes the count. Observed query activity only keeps the engine warm by resetting the idle clock. It never adds replicas.
Auto-stop reuses the same Prometheus signal the drain check consumes: firebolt_running_queries + firebolt_suspended_queries, summed across all running pods of the active generation. Sharing the signal keeps "the engine is busy" in exactly one place. A pod that the drain check would refuse to evict is the same pod auto-stop counts as activity.
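
The summed signal can be sketched in Go as below. This is an illustration under stated assumptions, not the operator's code: PodSample, errScrape, and engineBusy are hypothetical names, and treating any failed pod scrape as "not provably quiet" is one reading of the rule that a broken probe must never look idle.

```go
package main

import "errors"

// errScrape is a hypothetical sentinel for a failed pod scrape.
var errScrape = errors.New("scrape failed")

// PodSample is a hypothetical per-pod scrape result. The field names are
// illustrative and are not taken from the operator source.
type PodSample struct {
	Running   float64 // value of firebolt_running_queries
	Suspended float64 // value of firebolt_suspended_queries
	Err       error   // non-nil when the scrape of this pod failed
}

// engineBusy sums both gauges across the pods of the active generation.
// It returns (busy, ok). ok is false when any scrape failed, so the caller
// can group that case with activity instead of treating it as quiet.
func engineBusy(pods []PodSample) (busy bool, ok bool) {
	total := 0.0
	for _, p := range pods {
		if p.Err != nil {
			return false, false
		}
		total += p.Running + p.Suspended
	}
	return total > 0, true
}
```

A suspended query on a single pod is enough to make the whole engine count as busy, which matches the drain check's view of the same pod.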
Decision precedence
computeAutoStopDecision is a pure function over (spec, status, observation, now). Precedence, top-down:
- Disabled: If spec.autoStop is unset or enabled=false, auto-stop emits no decision and spec.replicas is fully user-owned.
- Wake requested: metadata.annotations["firebolt.io/wake-requested"] carries an RFC 3339 timestamp younger than DefaultAutoStopWakeTTL (5 minutes). Replicas are scaled to activeReplicas. Reason WakeRequested. See Gateway wake-up protocol.
- Schedule active: now falls inside any window in spec.autoStop.schedule. Replicas are pinned at activeReplicas. Schedule wins over both idle and stopped paths so an "always-on during business hours" policy can wake a parked engine.
- Stopped: spec.replicas == 0, no fresh wake annotation, no schedule window active. No-op.
- Scrape failed or activity observed: Refresh status.lastActivityTime, do not scale. Scrape failures are grouped with activity intentionally because a broken probe must never look quiet enough to scale down.
- Quiet >= idleTimeout, replicas > idleReplicas: Patch spec.replicas = idleReplicas and stamp status.lastScaledAt.
- First quiet observation: status.lastActivityTime is anchored to now so a fresh engine gets one full idleTimeout of grace.
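
The precedence list above can be sketched as a single switch. Everything here is hypothetical scaffolding: Policy, Observation, Decision, decide, and the helpers are invented for illustration and do not mirror the real computeAutoStopDecision signature. Two assumptions are worth calling out: the sketch checks the "first quiet observation" case before the idle-timeout case (a zero lastActivityTime would otherwise look infinitely idle), and it maps that case to the Initializing reason token.

```go
package main

import "time"

// Policy, Observation, and Decision are hypothetical stand-ins for the
// operator's real types. Only the field meanings come from the text.
type Policy struct {
	Enabled        bool
	ActiveReplicas int32
	IdleReplicas   int32
	IdleTimeout    time.Duration
}

type Observation struct {
	WakeFresh      bool // wake annotation is younger than the TTL
	ScheduleActive bool // now falls inside a schedule window
	ScrapeFailed   bool // the Prometheus probe did not return a value
	Busy           bool // running + suspended queries > 0
}

type Decision struct {
	Reason        string
	Scale         bool  // true: patch spec.replicas to Replicas
	Replicas      int32 // target replica count when Scale is true
	TouchActivity bool  // true: set status.lastActivityTime to now
}

// decide walks the precedence list top-down and returns on the first match.
func decide(p *Policy, replicas int32, lastActivity time.Time, obs Observation, now time.Time) Decision {
	switch {
	case p == nil || !p.Enabled:
		return Decision{Reason: "Disabled"}
	case obs.WakeFresh:
		return Decision{Reason: "WakeRequested", Scale: true, Replicas: p.ActiveReplicas}
	case obs.ScheduleActive:
		return Decision{Reason: "ScheduleActive", Scale: true, Replicas: p.ActiveReplicas}
	case replicas == 0:
		return Decision{Reason: "Stopped"}
	case obs.ScrapeFailed:
		return Decision{Reason: "ScrapeFailed", TouchActivity: true}
	case obs.Busy:
		return Decision{Reason: "ActivityObserved", TouchActivity: true}
	case lastActivity.IsZero():
		// Assumption: first quiet observation, reported as Initializing.
		return Decision{Reason: "Initializing", TouchActivity: true}
	case now.Sub(lastActivity) >= p.IdleTimeout && replicas > p.IdleReplicas:
		return Decision{Reason: "Idle", Scale: true, Replicas: p.IdleReplicas}
	}
	// Quiet, but idleTimeout has not elapsed or the engine is already at
	// idleReplicas. The text names no reason token for this state.
	return Decision{}
}

// Helpers for trying the function out. All three are hypothetical.
var examplePolicy = &Policy{Enabled: true, ActiveReplicas: 2, IdleTimeout: 30 * time.Minute}

var noActivityYet time.Time // zero value: no observation recorded yet

func at(minutes int) time.Time {
	return time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC).Add(time.Duration(minutes) * time.Minute)
}
```

With examplePolicy, an engine at 2 replicas whose last activity was at minute 0 stays up when evaluated at minute 29 and is scaled to idleReplicas at minute 30, because the comparison is `>=`.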
Level-driven encoding
Scale events are encoded by patching spec.replicas via the standard r.Update. The FireboltEngine watch fires. The next reconcile takes the normal blue-green path through creating, switching, draining, and cleaning. Auto-stop runs only in terminal phases (stable/stopped) so it cannot fight a rollout in progress.
Because scale-down only fires when firebolt_running_queries + firebolt_suspended_queries == 0 was just observed, the subsequent drain check on the old generation completes immediately. There is no wasted grace period. A separate "skip drain because auto-stop vouched for it" path is unnecessary.
Configuration
| Field | Default | Description |
|---|---|---|
| spec.autoStop.enabled | false | Master toggle. |
| spec.autoStop.activeReplicas | (required) | Replica count when active. |
| spec.autoStop.idleReplicas | 0 | Floor. 0 fully stops the engine. |
| spec.autoStop.idleTimeout | 30m | Quiet window before scaling to idleReplicas. |
| spec.autoStop.pollInterval | 1m | Scrape cadence. |
| spec.autoStop.schedule[] | [] | UTC HH:MM-HH:MM always-on windows. Optional days filter (Mon..Sun). End < start crosses midnight. |
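
To make the schedule semantics concrete, here is a minimal Go sketch of a window check. Window, parseWindow, Contains, and mustUTC are hypothetical names, not the operator's API. Two behaviors are assumptions the table does not settle: the end time is treated as exclusive, and for a window that crosses midnight the days filter is applied to the day on which the window starts.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Window is a hypothetical in-memory form of one schedule entry.
type Window struct {
	Start, End int                   // minutes since 00:00 UTC
	Days       map[time.Weekday]bool // empty means every day
}

var dayNames = map[string]time.Weekday{
	"Mon": time.Monday, "Tue": time.Tuesday, "Wed": time.Wednesday,
	"Thu": time.Thursday, "Fri": time.Friday, "Sat": time.Saturday, "Sun": time.Sunday,
}

// parseWindow parses "HH:MM-HH:MM" plus optional day names (Mon..Sun).
func parseWindow(s string, days ...string) (Window, error) {
	from, to, ok := strings.Cut(s, "-")
	if !ok {
		return Window{}, fmt.Errorf("window %q: want HH:MM-HH:MM", s)
	}
	start, err := time.Parse("15:04", from)
	if err != nil {
		return Window{}, fmt.Errorf("window %q: %w", s, err)
	}
	end, err := time.Parse("15:04", to)
	if err != nil {
		return Window{}, fmt.Errorf("window %q: %w", s, err)
	}
	w := Window{
		Start: start.Hour()*60 + start.Minute(),
		End:   end.Hour()*60 + end.Minute(),
		Days:  map[time.Weekday]bool{},
	}
	for _, d := range days {
		wd, ok := dayNames[d]
		if !ok {
			return Window{}, fmt.Errorf("unknown day %q", d)
		}
		w.Days[wd] = true
	}
	return w, nil
}

func (w Window) onDay(d time.Weekday) bool { return len(w.Days) == 0 || w.Days[d] }

// Contains reports whether now (converted to UTC) falls inside the window.
func (w Window) Contains(now time.Time) bool {
	now = now.UTC()
	m := now.Hour()*60 + now.Minute()
	if w.End >= w.Start { // same-day window, end exclusive (assumption)
		return m >= w.Start && m < w.End && w.onDay(now.Weekday())
	}
	// End < Start: the window crosses midnight.
	if m >= w.Start {
		return w.onDay(now.Weekday())
	}
	if m < w.End {
		// After midnight: the window started yesterday (assumption).
		return w.onDay(now.AddDate(0, 0, -1).Weekday())
	}
	return false
}

// mustUTC is a small helper for examples; it panics on a bad timestamp.
func mustUTC(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}
```

Under these assumptions, a 22:00-06:00 window filtered to Mon is open on Monday at 23:30 UTC and still open on Tuesday at 05:59 UTC, but closed on Tuesday at 23:30 UTC.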
Status fields
| Field | Meaning |
|---|---|
| status.lastActivityTime | Most recent observation that recorded activity (or, for a fresh engine, the first quiet observation). Drives the idle clock. |
| status.lastScaledAt | Timestamp of the most recent auto-stop-driven spec.replicas mutation. Distinguishes auto-stop scale events from user edits. |
| status.autoStopReason | Token: Disabled / WakeRequested / ScheduleActive / Stopped / Initializing / ActivityObserved / ScrapeFailed / Idle. |
Known limitations
Auto-stop samples query activity once per pollInterval (default 1 minute), which leads to a few current edge cases:
- Brief queries can be missed. A query that runs for less than pollInterval may not be counted as activity, so it does not reset the idle timer. A steady stream of very short queries can fail to keep an engine awake.
- An engine can stop earlier than expected. An engine can scale down up to one pollInterval sooner than its configured idleTimeout.
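
Both edge cases follow from point-in-time sampling, which the small Go model below illustrates. It is a simplified sketch, not operator code: it assumes poll ticks at 0, pollInterval, 2 x pollInterval, and so on, a single query occupying [start, end) in seconds, and an idle clock that starts at the last tick that saw the query. The function names are hypothetical.

```go
package main

// lastObservedTick returns the latest poll tick that falls inside the
// query interval [startSec, endSec), or false if no tick does.
// Inputs are non-negative seconds; ticks occur at multiples of pollSec.
func lastObservedTick(startSec, endSec, pollSec int) (int, bool) {
	if endSec <= startSec {
		return 0, false
	}
	tick := ((endSec - 1) / pollSec) * pollSec // latest tick before the query ends
	if tick < startSec {
		return 0, false
	}
	return tick, true
}

// sampledBusy reports whether the query is ever seen by a poll.
func sampledBusy(startSec, endSec, pollSec int) bool {
	_, ok := lastObservedTick(startSec, endSec, pollSec)
	return ok
}

// quietBeforeStop returns how many seconds the engine was actually idle
// (measured from the real end of the query) when the scale-down tick fires.
func quietBeforeStop(startSec, endSec, pollSec, idleTimeoutSec int) (int, bool) {
	seen, ok := lastObservedTick(startSec, endSec, pollSec)
	if !ok {
		return 0, false
	}
	deadline := seen + idleTimeoutSec
	stopTick := ((deadline + pollSec - 1) / pollSec) * pollSec // first tick at or after the deadline
	return stopTick - endSec, true
}
```

With a 60-second poll, a query running from second 10 to second 30 is never observed. A query running from second 0 to second 119 is last seen at the 60-second tick, so with a 1800-second idleTimeout the engine stops at second 1860, after only 1741 seconds of real idleness.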
Gateway wake-up protocol
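A FireboltEngine that auto-stop has scaled to zero replicas needs a way to come back to life when a query arrives. The Firebolt Operator and the Envoy-based gateway exchange a single, level-driven signal for this: the firebolt.io/wake-requested annotation on the FireboltEngine, which the gateway stamps and the Firebolt Operator reads on reconcile.
Why an annotation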
- Level-driven: The annotation timestamp is part of the resource state. Any reconcile (Firebolt Operator restart, periodic poll, watch event) reads the same value and converges identically.
- Fire-and-forget: A wake-up does not require a synchronous response. The gateway buffers the triggering query locally and retries against engine DNS once Envoy's active health checks observe a ready upstream.
- Coalescing: 1000 simultaneous queries for the same stopped engine produce a handful of identical patches via K8s optimistic concurrency, not a thundering-herd RPC.
- Firebolt Operator-down tolerant: The K8s API still accepts the patch while the Firebolt Operator manager is restarting. The next reconciler instance picks it up.
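
As an illustration of what the gateway could send, the Go sketch below builds a JSON merge patch that stamps the annotation. The gateway side is not implemented yet, so this is speculative: the choice of a merge patch (content type application/merge-patch+json), the second-level timestamp precision, and the names wakePatch and mustRFC3339 are all assumptions.

```go
package main

import (
	"encoding/json"
	"time"
)

// wakeAnnotation is the annotation key named in this document.
const wakeAnnotation = "firebolt.io/wake-requested"

// wakePatch builds a JSON merge patch body that sets the wake annotation
// to now, formatted as an RFC 3339 UTC timestamp. It touches no other field.
func wakePatch(now time.Time) ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"annotations": map[string]string{
				wakeAnnotation: now.UTC().Format(time.RFC3339),
			},
		},
	}
	return json.Marshal(patch)
}

// mustRFC3339 is a small helper for examples; it panics on a bad timestamp.
func mustRFC3339(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}
```

Because the timestamp is truncated to whole seconds in this sketch, concurrent wake-ups issued within the same second produce byte-identical bodies, which fits the coalescing property described above.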
Wake annotation TTL
DefaultAutoStopWakeTTL (5 minutes) bounds how long an unrefreshed annotation continues to trigger scale-up. Long enough to cover engine cold-start (image pull, blue-green creating phase) and short enough that an abandoned wake does not pin an engine after the gateway has given up. The gateway is expected to keep stamping the annotation while it has buffered queries waiting.
The annotation is honored only when spec.autoStop.enabled=true. Without an auto-stop policy the Firebolt Operator has no activeReplicas to scale to, and respecting a wake from a non-policy actor would silently override the user's spec.replicas==0 intent.
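
The freshness rule can be sketched as a small predicate. This is a hedged illustration: wakeFresh, wakeTTL, and mustRFC3339 are hypothetical names, wakeTTL simply mirrors the 5-minute DefaultAutoStopWakeTTL stated above, and ignoring malformed timestamps is an assumption rather than documented behavior.

```go
package main

import "time"

// wakeTTL mirrors DefaultAutoStopWakeTTL (5 minutes) as described in the text.
const wakeTTL = 5 * time.Minute

// wakeFresh reports whether the wake annotation should trigger a scale-up.
// It is honored only when auto-stop is enabled and the timestamp is
// younger than wakeTTL.
func wakeFresh(annotations map[string]string, autoStopEnabled bool, now time.Time) bool {
	if !autoStopEnabled {
		return false // no policy, so spec.replicas stays user-owned
	}
	raw, ok := annotations["firebolt.io/wake-requested"]
	if !ok {
		return false
	}
	ts, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return false // assumption: a malformed value is ignored
	}
	// Note: a timestamp in the future also counts as fresh in this sketch.
	// A real implementation might bound the tolerated clock skew.
	return now.Sub(ts) < wakeTTL
}

// mustRFC3339 is a small helper for examples; it panics on a bad timestamp.
func mustRFC3339(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}
```

An annotation stamped 4 minutes ago is fresh. One stamped exactly 5 minutes ago is not, since the text requires the timestamp to be younger than the TTL.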
Gateway RBAC
Each FireboltInstance now provisions per-gateway RBAC alongside the gateway Deployment:
| Resource | Name | Purpose |
|---|---|---|
| ServiceAccount | <instance>-gateway | Identity attached to gateway pods. |
| Role | <instance>-gateway-wake | Grants get, list, patch on fireboltengines in the same namespace. |
| RoleBinding | <instance>-gateway-wake | Binds the SA to the Role. |
The gateway is expected to patch only metadata.annotations[firebolt.io/wake-requested]. Misuse beyond that is reviewed via Kubernetes audit logs, not enforced by RBAC.
Note: The gateway-side implementation (intercept routing for stopped engines, buffer the request, patch the annotation, retry against engine DNS) is not yet wired in. It would extend the Envoy Lua filter rendered by the Firebolt Operator in internal/controller/instance_gateway.go. This document defines the contract and the Firebolt Operator-side enforcement (RBAC, fresh-annotation handling). The Envoy-side hook is tracked as a follow-up.