# Auto-stop and wake-up

> Operator-driven FireboltEngine auto-stop and the gateway wake-up annotation protocol.

## Auto-stop

Auto-stop is opt-in via `spec.autoStop.enabled=true`. When enabled, the Firebolt Operator owns `spec.replicas` and drives it between two fixed levels. It scales the engine up to `activeReplicas` when a UTC `schedule` window is open or a wake-up is requested, and scales it down to `idleReplicas` (default 0, which fully stops the engine) after the engine sits idle for `idleTimeout`. This is an activity-gated on/off toggle, not proportional autoscaling. The engine runs at either `idleReplicas` or `activeReplicas`, never a value in between, and query volume never changes the count. Observed query activity only keeps the engine warm by resetting the idle clock. It never adds replicas.

Auto-stop reuses the **same Prometheus signal** the drain check consumes: `firebolt_running_queries + firebolt_suspended_queries`, summed across all running pods of the active generation. Sharing the signal keeps the definition of "the engine is busy" in exactly one place. Any pod the drain check would refuse to evict is, by construction, a pod whose queries auto-stop counts as activity.
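
The sketch below illustrates the shared signal. It sums the two gauges over running pods of the active generation. The `podSample` type and the `activeQueries` function are hypothetical. This document does not show the operator's real scrape and aggregation code.

```go
package main

import "fmt"

// podSample is a hypothetical view of one pod's scraped metrics.
type podSample struct {
	Generation       int64
	Running          bool    // pod is in the Running phase
	RunningQueries   float64 // firebolt_running_queries
	SuspendedQueries float64 // firebolt_suspended_queries
}

// activeQueries sums running + suspended queries across running pods of the
// active generation. A non-zero total means "busy" for both the drain check
// and auto-stop.
func activeQueries(samples []podSample, activeGen int64) float64 {
	total := 0.0
	for _, s := range samples {
		if s.Generation != activeGen || !s.Running {
			continue // other generations and non-running pods are ignored
		}
		total += s.RunningQueries + s.SuspendedQueries
	}
	return total
}

func main() {
	samples := []podSample{
		{Generation: 2, Running: true, RunningQueries: 1},
		{Generation: 2, Running: true, SuspendedQueries: 2},
		{Generation: 1, Running: true, RunningQueries: 5},  // old generation
		{Generation: 2, Running: false, RunningQueries: 4}, // not running
	}
	total := activeQueries(samples, 2)
	fmt.Println(total, total > 0) // 3 true
}
```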

### Decision precedence

`computeAutoStopDecision` is a pure function over `(spec, status, observation, now)`. Its rules are listed in precedence order, highest first:

1. **Disabled**: If `spec.autoStop` is unset or `enabled=false`, auto-stop emits no decision and `spec.replicas` is fully user-owned.
2. **Wake requested**: `metadata.annotations["firebolt.io/wake-requested"]` carries an RFC 3339 timestamp younger than `DefaultAutoStopWakeTTL` (5 minutes). Replicas are scaled to `activeReplicas`. Reason `WakeRequested`. See [Gateway wake-up protocol](#gateway-wake-up-protocol).
3. **Schedule active**: `now` falls inside any window in `spec.autoStop.schedule`. Replicas are pinned at `activeReplicas`. Schedule wins over both idle and stopped paths so an "always-on during business hours" policy can wake a parked engine.
4. **Stopped**: `spec.replicas == 0`, no fresh wake annotation, no schedule window active. No-op.
5. **Scrape failed or activity observed**: Refresh `status.lastActivityTime`, do not scale. Scrape failures are grouped with activity intentionally because a broken probe must never look quiet enough to scale down.
6. **Quiet >= idleTimeout, replicas > idleReplicas**: Patch `spec.replicas = idleReplicas` and stamp `status.lastScaledAt`.
7. **First quiet observation**: If `status.lastActivityTime` is unset, it is anchored to `now`, so a fresh engine gets one full `idleTimeout` of grace before it can be scaled down.

### Level-driven encoding

Scale events are encoded by writing a new `spec.replicas` value through the standard `r.Update` call. That write fires the `FireboltEngine` watch. The next reconcile then takes the normal blue-green path through `creating`, `switching`, `draining`, and `cleaning`. Auto-stop runs only in **terminal phases** (`stable`/`stopped`), so it cannot fight a rollout in progress.

Scale-down fires only right after auto-stop has observed `firebolt_running_queries + firebolt_suspended_queries == 0`. The drain check that follows on the old generation therefore completes immediately, and no grace period is wasted. A separate "skip drain because auto-stop vouched for it" path is unnecessary.
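
The sketch below shows the terminal-phase guard and the level-driven write. The names `engine` and `applyAutoStop` are hypothetical. The function only mutates the in-memory object and reports whether it changed. Persisting the object, which the operator does with `r.Update`, is what fires the watch.

```go
package main

import "fmt"

// Phase names come from the text; the Phase type itself is hypothetical.
type Phase string

const (
	Creating  Phase = "creating"
	Switching Phase = "switching"
	Draining  Phase = "draining"
	Cleaning  Phase = "cleaning"
	Stable    Phase = "stable"
	Stopped   Phase = "stopped"
)

// engine is a hypothetical stand-in for a FireboltEngine object.
type engine struct {
	Phase    Phase
	Replicas int32 // spec.replicas
}

// applyAutoStop writes the decided replica count into spec.replicas, but only
// in terminal phases, so auto-stop never fights an in-flight rollout. It
// returns true when the caller should persist the object.
func applyAutoStop(e *engine, target *int32) bool {
	if e.Phase != Stable && e.Phase != Stopped {
		return false // rollout in progress: leave spec.replicas alone
	}
	if target == nil || *target == e.Replicas {
		return false // level already matches: nothing to write
	}
	e.Replicas = *target
	return true
}

func main() {
	zero := int32(0)
	e := &engine{Phase: Stable, Replicas: 3}
	fmt.Println(applyAutoStop(e, &zero), e.Replicas) // true 0
	fmt.Println(applyAutoStop(e, &zero), e.Replicas) // false 0 (idempotent)
	busy := &engine{Phase: Draining, Replicas: 3}
	fmt.Println(applyAutoStop(busy, &zero), busy.Replicas) // false 3
}
```

Because the write is a target level rather than a "scale by N" delta, re-running the same decision on the next reconcile is a no-op.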

### Configuration

| Field                          | Default    | Description                                                                                                  |
| ------------------------------ | ---------- | ------------------------------------------------------------------------------------------------------------ |
| `spec.autoStop.enabled`        | `false`    | Master toggle.                                                                                               |
| `spec.autoStop.activeReplicas` | (required) | Replica count when active.                                                                                   |
| `spec.autoStop.idleReplicas`   | `0`        | Floor. `0` fully stops the engine.                                                                           |
| `spec.autoStop.idleTimeout`    | `30m`      | Quiet window before scaling to `idleReplicas`.                                                               |
| `spec.autoStop.pollInterval`   | `1m`       | Scrape cadence.                                                                                              |
| `spec.autoStop.schedule[]`     | `[]`       | UTC `HH:MM`-`HH:MM` always-on windows. Optional `days` filter (`Mon`..`Sun`). End \< start crosses midnight. |

### Status fields

| Field                     | Meaning                                                                                                                             |
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `status.lastActivityTime` | Most recent observation that recorded activity (or, for a fresh engine, the first quiet observation). Drives the idle clock.        |
| `status.lastScaledAt`     | Timestamp of the most recent auto-stop-driven `spec.replicas` mutation. Distinguishes auto-stop scale events from user edits.       |
| `status.autoStopReason`   | Token: `Disabled` / `WakeRequested` / `ScheduleActive` / `Stopped` / `Initializing` / `ActivityObserved` / `ScrapeFailed` / `Idle`. |

### Known limitations

Auto-stop samples query activity once per `pollInterval` (default 1 minute), which produces the following known edge cases:

* **Brief queries can be missed.** A query that starts and finishes between two consecutive scrapes is never observed, so it does not reset the idle clock. A steady stream of such short queries can therefore fail to keep an engine awake.
* **An engine can stop earlier than expected.** The idle clock starts at the last scrape that observed activity, not at the moment that activity ended. An engine can therefore scale down up to one `pollInterval` sooner than `idleTimeout` after its last query actually finished.

## Gateway wake-up protocol

A FireboltEngine that auto-stop has scaled to zero replicas needs a way to come back to life when a query arrives. The Firebolt Operator and the Envoy-based gateway exchange a single, level-driven signal for this:

```text
                       patch annotation
                ┌─────────────────────────────┐
   ┌──────────┐ │   metadata.annotations      │  ┌────────────┐
   │ Gateway  ├─►   firebolt.io/wake-requested├─►│ Engine CR  │
   │ (Envoy)  │ │   = "<RFC 3339 timestamp>"  │  └─────┬──────┘
   └──────────┘ └─────────────────────────────┘        │ Watch fires
                                                       ▼
                                               ┌────────────┐
                                               │ Auto-stop  │
                                               │ runs:      │
                                               │ wake fresh │
                                               │ → Active   │
                                               └─────┬──────┘
                                                     │ patch spec.replicas
                                                     ▼
                                               ┌────────────┐
                                               │ Blue-green │
                                               │ creating   │
                                               └────────────┘
```

### Why an annotation

* **Level-driven**: The annotation timestamp is part of the resource state. Any reconcile (Firebolt Operator restart, periodic poll, watch event) reads the same value and converges identically.
* **Fire-and-forget**: A wake-up does not require a synchronous response. The gateway is expected to buffer the triggering query locally and retry against engine DNS once Envoy's active health checks observe a ready upstream.
* **Coalescing**: 1000 simultaneous queries for the same stopped engine produce a handful of identical patches via K8s optimistic concurrency, not a thundering-herd RPC.
* **Firebolt Operator-down tolerant**: The Kubernetes API server still accepts the patch while the Firebolt Operator manager is restarting, and the next reconciler instance picks it up.

### Wake annotation TTL

`DefaultAutoStopWakeTTL` (5 minutes) bounds how long an unrefreshed annotation continues to trigger scale-up. The TTL is long enough to cover engine cold start (image pull, blue-green `creating` phase), yet short enough that an abandoned wake does not pin an engine after the gateway has given up. The gateway is expected to keep re-stamping the annotation while it has buffered queries waiting.

The annotation is honored only when `spec.autoStop.enabled=true`. Without an auto-stop policy the Firebolt Operator has no `activeReplicas` to scale to, and respecting a wake from a non-policy actor would silently override the user's `spec.replicas==0` intent.

### Gateway RBAC

Each FireboltInstance provisions per-gateway RBAC alongside the gateway Deployment:

| Resource         | Name                      | Purpose                                                                   |
| ---------------- | ------------------------- | ------------------------------------------------------------------------- |
| `ServiceAccount` | `<instance>-gateway`      | Identity attached to gateway pods.                                        |
| `Role`           | `<instance>-gateway-wake` | Grants `get`, `list`, `patch` on `fireboltengines` in the same namespace. |
| `RoleBinding`    | `<instance>-gateway-wake` | Binds the SA to the Role.                                                 |

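Kubernetes RBAC can scope a verb to a subresource such as `status`, but annotations are not a subresource, and RBAC cannot restrict `patch` to individual fields. The gateway therefore holds `patch` on the whole CR. The wake protocol constrains the gateway to a JSON merge patch that touches only `metadata.annotations[firebolt.io/wake-requested]`; custom resources do not support strategic-merge patches. Misuse beyond that is caught by reviewing Kubernetes audit logs. RBAC does not prevent it.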

> **Note:** The gateway-side implementation (intercept routing for stopped engines, buffer the request, patch the annotation, retry against engine DNS) is not yet wired in. It would extend the Envoy Lua filter rendered by the Firebolt Operator in `internal/controller/instance_gateway.go`. This document defines the contract and the Firebolt Operator-side enforcement (RBAC, fresh-annotation handling). The Envoy-side hook is tracked as a follow-up.
