Auto-stop and wake-up

Auto-stop

Auto-stop is opt-in via spec.autoStop.enabled=true. When enabled, the Firebolt Operator owns spec.replicas and drives it between two fixed levels. It scales the engine up to activeReplicas when a UTC schedule window is open or a wake-up is requested, and scales it down to idleReplicas (default 0, which fully stops the engine) after the engine sits idle for idleTimeout.

This is an activity-gated on/off toggle, not proportional autoscaling. The engine runs at either idleReplicas or activeReplicas, never a value in between, and query volume never changes the count. Observed query activity only keeps the engine warm by resetting the idle clock. It never adds replicas.

Auto-stop reuses the same Prometheus signal the drain check consumes: firebolt_running_queries + firebolt_suspended_queries, summed across all running pods of the active generation. Sharing the signal keeps “the engine is busy” in exactly one place. A pod that the drain check would refuse to evict is the same pod auto-stop counts as activity.
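
The activity signal described above can be sketched in Go as follows. This is a minimal illustration, not the operator's code: the podSample type and the activitySignal and engineBusy functions are hypothetical names, and the scrape itself is left out.

```go
package main

import "fmt"

// podSample is one scrape of the two gauges for a single running pod of the
// active generation. The type and its field names are hypothetical.
type podSample struct {
	Running   float64 // firebolt_running_queries
	Suspended float64 // firebolt_suspended_queries
}

// activitySignal sums running and suspended queries across all sampled pods.
func activitySignal(pods []podSample) float64 {
	total := 0.0
	for _, p := range pods {
		total += p.Running + p.Suspended
	}
	return total
}

// engineBusy reports whether the summed signal shows any in-flight work.
// A non-zero value only resets the idle clock; it never changes the
// replica count.
func engineBusy(pods []podSample) bool {
	return activitySignal(pods) > 0
}

func main() {
	quiet := []podSample{{0, 0}, {0, 0}}
	busy := []podSample{{0, 0}, {2, 1}}
	fmt.Println(engineBusy(quiet), engineBusy(busy)) // false true
	fmt.Println(activitySignal(busy))                // 3
}
```

A single busy pod is enough to count the whole engine as active, which mirrors the way the signal is summed rather than averaged.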

Decision precedence

computeAutoStopDecision is a pure function over (spec, status, observation, now). Precedence, top-down:

Disabled: If spec.autoStop is unset or enabled=false, auto-stop emits no decision and spec.replicas is fully user-owned.
Wake requested: metadata.annotations["firebolt.io/wake-requested"] carries an RFC 3339 timestamp younger than DefaultAutoStopWakeTTL (5 minutes). Replicas are scaled to activeReplicas, with reason WakeRequested. See Gateway wake-up protocol.
Schedule active: now falls inside any window in spec.autoStop.schedule. Replicas are pinned at activeReplicas. Schedule wins over both idle and stopped paths so an “always-on during business hours” policy can wake a parked engine.
Stopped: spec.replicas == 0, no fresh wake annotation, no schedule window active. No-op.
Scrape failed or activity observed: Refresh status.lastActivityTime, do not scale. Scrape failures are grouped with activity intentionally because a broken probe must never look quiet enough to scale down.
Quiet >= idleTimeout, replicas > idleReplicas: Patch spec.replicas = idleReplicas and stamp status.lastScaledAt.
First quiet observation: status.lastActivityTime is anchored to now so a fresh engine gets one full idleTimeout of grace.
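
The precedence list above can be sketched as a pure Go function. This is an illustration under stated assumptions, not the actual computeAutoStopDecision: the observation and decision types, the decide function, and the example helpers are hypothetical. The mapping of the Initializing and Idle reason tokens to the last two rows, the reason used when the engine is quiet but not yet timed out, and the choice to leave lastActivityTime untouched on wake and schedule are all guesses where the text is silent.

```go
package main

import (
	"fmt"
	"time"
)

// observation is a hypothetical, flattened view of (spec, status, observation).
type observation struct {
	Enabled        bool
	Replicas       int // current spec.replicas
	ActiveReplicas int
	IdleReplicas   int
	IdleTimeout    time.Duration
	WakeFresh      bool      // wake annotation is younger than the TTL
	ScheduleActive bool      // now falls inside a schedule window
	ScrapeFailed   bool      // the Prometheus scrape did not succeed
	ActiveQueries  float64   // running + suspended, summed over pods
	LastActivity   time.Time // zero value means "never recorded"
}

// decision is the hypothetical output: the desired replica count, a reason
// token, and whether status.lastActivityTime should be set to now.
type decision struct {
	Reason        string
	Replicas      int
	TouchActivity bool
}

// decide walks the precedence list top-down; the first matching case wins.
func decide(o observation, now time.Time) decision {
	switch {
	case !o.Enabled:
		return decision{"Disabled", o.Replicas, false}
	case o.WakeFresh:
		return decision{"WakeRequested", o.ActiveReplicas, false}
	case o.ScheduleActive:
		return decision{"ScheduleActive", o.ActiveReplicas, false}
	case o.Replicas == 0:
		return decision{"Stopped", 0, false}
	case o.ScrapeFailed:
		// A broken probe is treated like activity so it can never scale down.
		return decision{"ScrapeFailed", o.Replicas, true}
	case o.ActiveQueries > 0:
		return decision{"ActivityObserved", o.Replicas, true}
	case !o.LastActivity.IsZero() &&
		now.Sub(o.LastActivity) >= o.IdleTimeout && o.Replicas > o.IdleReplicas:
		return decision{"Idle", o.IdleReplicas, false}
	case o.LastActivity.IsZero():
		// First quiet observation: anchor the idle clock to now.
		return decision{"Initializing", o.Replicas, true}
	default:
		// Quiet but not yet timed out (assumed reason token).
		return decision{"Idle", o.Replicas, false}
	}
}

// refNow and minutesAgo give the examples a fixed clock.
var refNow = time.Date(2024, 5, 1, 12, 0, 0, 0, time.UTC)

func minutesAgo(m int) time.Time {
	return refNow.Add(-time.Duration(m) * time.Minute)
}

// examplePolicy is a fresh engine at 2 replicas with a 30-minute idle timeout.
func examplePolicy() observation {
	return observation{Enabled: true, Replicas: 2, ActiveReplicas: 2,
		IdleReplicas: 0, IdleTimeout: 30 * time.Minute}
}

func main() {
	o := examplePolicy()
	fmt.Println(decide(o, refNow).Reason) // Initializing

	o.LastActivity = minutesAgo(45)
	d := decide(o, refNow)
	fmt.Println(d.Reason, d.Replicas) // Idle 0

	o.ScrapeFailed = true
	fmt.Println(decide(o, refNow).Reason) // ScrapeFailed
}
```

Keeping the function free of I/O means every row of the list can be exercised with plain values, as the main function does.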

Level-driven encoding

Scale events are encoded by patching spec.replicas via the standard r.Update. The FireboltEngine watch then fires, and the next reconcile takes the normal blue-green path through creating, switching, draining, and cleaning. Auto-stop runs only in terminal phases (stable/stopped) so it cannot fight a rollout in progress. Because scale-down only fires when firebolt_running_queries + firebolt_suspended_queries == 0 was just observed, the subsequent drain check on the old generation completes immediately. No grace period is wasted, and a separate “skip drain because auto-stop vouched for it” path is unnecessary.

Configuration

Field	Default	Description
`spec.autoStop.enabled`	`false`	Master toggle.
`spec.autoStop.activeReplicas`	(required)	Replica count when active.
`spec.autoStop.idleReplicas`	`0`	Floor. `0` fully stops the engine.
`spec.autoStop.idleTimeout`	`30m`	Quiet window before scaling to `idleReplicas`.
`spec.autoStop.pollInterval`	`1m`	Scrape cadence.
`spec.autoStop.schedule[]`	`[]`	UTC `HH:MM`-`HH:MM` always-on windows. Optional `days` filter (`Mon`..`Sun`). End < start crosses midnight.
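
The schedule window rule, including the case where the end is earlier than the start, can be sketched in Go as follows. This is an assumption-laden illustration: the inWindow, minuteOfDay, and utcAt names are hypothetical, the window is treated as half-open (start inclusive, end exclusive), and the optional days filter is left out because the text does not say which day a midnight-crossing window belongs to.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// minuteOfDay parses "HH:MM" into minutes since 00:00.
func minuteOfDay(s string) (int, error) {
	t, err := time.Parse("15:04", s)
	if err != nil {
		return 0, fmt.Errorf("bad time %q: %w", s, err)
	}
	return t.Hour()*60 + t.Minute(), nil
}

// inWindow reports whether now, converted to UTC, falls inside a
// "HH:MM-HH:MM" window. When end < start the window wraps past midnight.
// A window whose start equals its end is treated as empty (assumption).
func inWindow(window string, now time.Time) (bool, error) {
	startStr, endStr, ok := strings.Cut(window, "-")
	if !ok {
		return false, fmt.Errorf("window %q: want HH:MM-HH:MM", window)
	}
	start, err := minuteOfDay(startStr)
	if err != nil {
		return false, err
	}
	end, err := minuteOfDay(endStr)
	if err != nil {
		return false, err
	}
	utc := now.UTC()
	cur := utc.Hour()*60 + utc.Minute()
	if start <= end {
		return cur >= start && cur < end, nil
	}
	// Wrapping window: active late in the day or early the next day.
	return cur >= start || cur < end, nil
}

// utcAt builds a fixed UTC instant for the examples.
func utcAt(hour, minute int) time.Time {
	return time.Date(2024, 5, 1, hour, minute, 0, 0, time.UTC)
}

func main() {
	day, _ := inWindow("09:00-17:00", utcAt(12, 30))
	late, _ := inWindow("22:00-06:00", utcAt(23, 15))
	early, _ := inWindow("22:00-06:00", utcAt(2, 0))
	noon, _ := inWindow("22:00-06:00", utcAt(12, 0))
	fmt.Println(day, late, early, noon) // true true true false
}
```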

Status fields

Field	Meaning
`status.lastActivityTime`	Most recent observation that recorded activity (or, for a fresh engine, the first quiet observation). Drives the idle clock.
`status.lastScaledAt`	Timestamp of the most recent auto-stop-driven `spec.replicas` mutation. Distinguishes auto-stop scale events from user edits.
`status.autoStopReason`	Token: `Disabled` / `WakeRequested` / `ScheduleActive` / `Stopped` / `Initializing` / `ActivityObserved` / `ScrapeFailed` / `Idle`.

Known limitations

Auto-stop samples query activity once per pollInterval (default 1 minute), which leads to a few current edge cases:

Brief queries can be missed. A query that runs for less than pollInterval may not be counted as activity, so it does not reset the idle timer. A steady stream of very short queries can fail to keep an engine awake.
An engine can stop earlier than expected. An engine can scale down up to one pollInterval sooner than its configured idleTimeout.
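
The first limitation is a plain sampling effect, which the following Go sketch illustrates. It assumes, purely for illustration, that polls land at exact multiples of pollInterval counted in seconds from zero; the sampled function is a hypothetical helper, not part of the operator.

```go
package main

import "fmt"

// sampled reports whether a query occupying [startSec, endSec) is seen by a
// poller that samples at 0, pollSec, 2*pollSec, ... seconds.
// It expects pollSec > 0 and startSec >= 0.
func sampled(pollSec, startSec, endSec int) bool {
	// First poll instant at or after the query start.
	firstPoll := (startSec + pollSec - 1) / pollSec * pollSec
	return firstPoll < endSec
}

func main() {
	fmt.Println(sampled(60, 10, 40))  // false: starts and ends between polls
	fmt.Println(sampled(60, 50, 70))  // true: still running at the 60s poll
	fmt.Println(sampled(60, 61, 119)) // false: 58s long, yet never sampled
}
```

Any query at least one pollInterval long always spans a poll instant, so only shorter queries can slip through.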

Gateway wake-up protocol

A FireboltEngine that auto-stop has scaled to zero replicas needs a way to come back to life when a query arrives. The Firebolt Operator and the Envoy-based gateway exchange a single, level-driven signal for this:

                       patch annotation
                ┌─────────────────────────────┐
   ┌──────────┐ │   metadata.annotations      │  ┌────────────┐
   │ Gateway  ├─►   firebolt.io/wake-requested├─►│ Engine CR  │
   │ (Envoy)  │ │   = "<RFC 3339 timestamp>"  │  └─────┬──────┘
   └──────────┘ └─────────────────────────────┘        │ Watch fires
                                                       ▼
                                               ┌────────────┐
                                               │ Auto-stop  │
                                               │ runs:      │
                                               │ wake fresh │
                                               │ → Active   │
                                               └─────┬──────┘
                                                     │ patch spec.replicas
                                                     ▼
                                               ┌────────────┐
                                               │ Blue-green │
                                               │ creating   │
                                               └────────────┘

Why an annotation

Level-driven: The annotation timestamp is part of the resource state. Any reconcile (Firebolt Operator restart, periodic poll, watch event) reads the same value and converges identically.
Fire-and-forget: A wake-up does not require a synchronous response. The gateway buffers the triggering query locally and retries against engine DNS once Envoy’s active health checks observe a ready upstream.
Coalescing: 1000 simultaneous queries for the same stopped engine produce a handful of identical patches via Kubernetes optimistic concurrency, not a thundering-herd RPC.
Firebolt Operator-down tolerant: The Kubernetes API still accepts the patch while the Firebolt Operator manager is restarting. The next reconciler instance picks it up.

Wake annotation TTL

DefaultAutoStopWakeTTL (5 minutes) bounds how long an unrefreshed annotation continues to trigger scale-up. It is long enough to cover engine cold-start (image pull, blue-green creating phase) and short enough that an abandoned wake does not pin an engine after the gateway has given up. The gateway is expected to keep stamping the annotation while it has buffered queries waiting. The annotation is honored only when spec.autoStop.enabled=true. Without an auto-stop policy the Firebolt Operator has no activeReplicas to scale to, and respecting a wake from a non-policy actor would silently override the user’s spec.replicas==0 intent.
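
A freshness check along these lines can be sketched in Go. The wakeFresh and at functions and the wakeTTL constant are hypothetical names that mirror the 5-minute DefaultAutoStopWakeTTL. Treating an unparsable annotation as absent is an assumption, and the sketch does not address timestamps that lie in the future.

```go
package main

import (
	"fmt"
	"time"
)

// wakeTTL mirrors the 5-minute DefaultAutoStopWakeTTL described in the text.
const wakeTTL = 5 * time.Minute

// wakeFresh reports whether the annotation value is an RFC 3339 timestamp
// younger than wakeTTL at the given instant. A value that does not parse is
// treated as if the annotation were absent (assumption).
func wakeFresh(annotation string, now time.Time) bool {
	requested, err := time.Parse(time.RFC3339, annotation)
	if err != nil {
		return false
	}
	return now.Sub(requested) < wakeTTL
}

// at parses an RFC 3339 literal for the examples and panics on a bad literal.
func at(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	stamp := "2024-05-01T10:00:00Z"
	fmt.Println(wakeFresh(stamp, at("2024-05-01T10:04:59Z")))        // true
	fmt.Println(wakeFresh(stamp, at("2024-05-01T10:05:00Z")))        // false
	fmt.Println(wakeFresh("not-a-time", at("2024-05-01T10:00:00Z"))) // false
}
```

Because the check depends only on the stored timestamp and the current time, any reconcile that reads the same annotation reaches the same answer.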

Gateway RBAC

Each FireboltInstance now provisions per-gateway RBAC alongside the gateway Deployment:

Resource	Name	Purpose
`ServiceAccount`	`<instance>-gateway`	Identity attached to gateway pods.
`Role`	`<instance>-gateway-wake`	Grants `get`, `list`, `patch` on `fireboltengines` in the same namespace.
`RoleBinding`	`<instance>-gateway-wake`	Binds the SA to the Role.

RBAC cannot restrict patch to an individual field such as a single annotation, so the gateway holds patch on the whole CR. The wake protocol constrains the gateway to a JSON merge patch (custom resources do not support strategic merge patch) that only touches metadata.annotations[firebolt.io/wake-requested]. Misuse beyond that is reviewed via Kubernetes audit logs, not enforced by RBAC.
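
A patch body that touches only the wake annotation could look like the output of the Go sketch below. It is illustrative only: the gateway side is described as an Envoy Lua filter, so the wakePatch and exampleInstant functions are hypothetical and simply show the JSON shape a merge-style patch would carry.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// wakeAnnotation is the annotation key named in the wake protocol.
const wakeAnnotation = "firebolt.io/wake-requested"

// wakePatch builds a patch body that sets only the wake annotation to the
// given instant, formatted as an RFC 3339 timestamp in UTC.
func wakePatch(now time.Time) ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"annotations": map[string]string{
				wakeAnnotation: now.UTC().Format(time.RFC3339),
			},
		},
	}
	return json.Marshal(patch)
}

// exampleInstant is a fixed time used by the example.
func exampleInstant() time.Time {
	return time.Date(2024, 5, 1, 10, 0, 0, 0, time.UTC)
}

func main() {
	body, err := wakePatch(exampleInstant())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// {"metadata":{"annotations":{"firebolt.io/wake-requested":"2024-05-01T10:00:00Z"}}}
}
```

Every other field is absent from the body, so a merge leaves the rest of the CR, including spec.replicas, untouched.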

Note: The gateway-side implementation (intercept routing for stopped engines, buffer the request, patch the annotation, retry against engine DNS) is not yet wired in. It would extend the Envoy Lua filter rendered by the Firebolt Operator in internal/controller/instance_gateway.go. This document defines the contract and the Firebolt Operator-side enforcement (RBAC, fresh-annotation handling). The Envoy-side hook is tracked as a follow-up.
