Engine reconciler architecture
The engine reconciler is split into three layers, with instance resolution as a hard prerequisite.
Layer responsibilities
| Layer | File | I/O | Testability |
|---|---|---|---|
| Instance gate | engine_controller.go | Yes (reads FireboltInstance) | Requires envtest |
| Read | engine_state.go | Yes (K8s API reads) | Requires envtest |
| Compute | engine_reconcile.go | None | Pure unit tests |
| Write | engine_apply.go | Yes (K8s API writes) | Requires envtest |
The instance gate applies only to phases that need instance.multi_engine.metadata_endpoint and instance.id: stable, stopped, and creating. stopped is included because if a ConfigMap is missing at zero replicas, the reconciler re-materializes it in place using live instance info. This is the same recovery path as stable. Phases that operate on existing resources (switching, draining, cleaning) skip the gate and proceed normally, ensuring that a transient instance issue does not stall an in-flight rollout. When the gate blocks, it sets the InstanceReady=False condition on the engine status and requeues. The condition update is part of the single updateStatus call. There is no separate status write for conditions.
The compute layer is the core of the Firebolt Operator. It is a pure function with no side effects, making it easy to test exhaustively without a running cluster.
State machine
The engine lifecycle is a six-phase state machine stored in .status.phase. Two of the six (stable and stopped) are terminal. The others are transition phases. The terminal phase is chosen by spec.replicas: non-zero resolves to stable, zero resolves to stopped. Every transition phase funnels through a single terminalPhase(spec) helper so the distinction is made in exactly one place.
Both terminal phases share the computeStable code path. From the state machine's perspective stopped is just stable with spec.replicas == 0 and a different surfaced name. The Ready condition distinguishes them: stable with ready pods is Ready=True, Reason=EngineReady. stopped is always Ready=False, Reason=Stopped. See Top-level Ready condition.
Phase descriptions
| Phase | What happens | Next phase |
|---|---|---|
| stable | Terminal phase when spec.replicas > 0. All resources match spec. No work to do. Requeues after 30s for drift detection. On spec change, writes only the status intent (Phase=creating, bumped currentGeneration) and requeues. No resources are created in this pass. | creating (on spec change) |
| stopped | Terminal phase when spec.replicas == 0. Structurally identical to stable: the active generation still exists as a zero-replica StatefulSet + headless Service + ConfigMap, but it is surfaced as a distinct phase. Spec-change detection and missing-resource re-materialization work identically to stable. | creating (on spec change) |
| creating | New-generation StatefulSet, headless Service, and ConfigMap are ensured. Waits for all pods to become ready. A zero-replica StatefulSet is trivially "ready" (0/0), so scale-to-zero transitions through this phase without blocking. If the spec changes while creating, the in-progress generation is abandoned (its resources are deleted), currentGeneration is bumped, and a fresh generation is created on the next reconcile. This avoids patching a live STS whose pods have already read a stale config. | switching (all pods ready) |
| switching | Updates the cluster Service selector to point to the new generation. | draining (if old generation exists), stable (initial deploy, replicas > 0), or stopped (initial deploy, replicas == 0) |
| draining | Waits for old-generation pods to finish serving queries. Skipped entirely when drainCheckEnabled: false or rollout: recreate. | cleaning (drain complete) |
| cleaning | Deletes old-generation StatefulSet, headless Service, and ConfigMap. Clears drainingGeneration. | stable (replicas > 0) or stopped (replicas == 0) |
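The phase table above can be read as a pure transition function. The following is a minimal sketch of that reading, not the operator's actual compute layer: the `Observed` struct, its field names, and `nextPhase` are hypothetical, and `terminalPhase` takes a replica count instead of the full spec. It also assumes cleaning stays in place until the old resources are gone.

```go
package main

import "fmt"

// Phase mirrors the values stored in .status.phase.
type Phase string

const (
	Stable    Phase = "stable"
	Stopped   Phase = "stopped"
	Creating  Phase = "creating"
	Switching Phase = "switching"
	Draining  Phase = "draining"
	Cleaning  Phase = "cleaning"
)

// Observed is a hypothetical snapshot produced by the read layer.
type Observed struct {
	Replicas         int32
	SpecChanged      bool
	AllPodsReady     bool
	HasOldGeneration bool
	DrainComplete    bool
	OldResourcesGone bool
}

// terminalPhase resolves zero replicas to stopped and anything else to stable.
func terminalPhase(replicas int32) Phase {
	if replicas == 0 {
		return Stopped
	}
	return Stable
}

// nextPhase has no side effects: it only maps (phase, observation) to a phase.
func nextPhase(cur Phase, o Observed) Phase {
	switch cur {
	case Stable, Stopped:
		if o.SpecChanged {
			return Creating
		}
	case Creating:
		if o.AllPodsReady {
			return Switching
		}
	case Switching:
		if o.HasOldGeneration {
			return Draining
		}
		return terminalPhase(o.Replicas)
	case Draining:
		// A spec change here is ignored until the transition completes.
		if o.DrainComplete {
			return Cleaning
		}
	case Cleaning:
		if o.OldResourcesGone {
			return terminalPhase(o.Replicas)
		}
	}
	return cur
}

func main() {
	fmt.Println(nextPhase(Stable, Observed{Replicas: 2, SpecChanged: true}))
	fmt.Println(nextPhase(Switching, Observed{Replicas: 0}))
}
```

Because the function takes plain values and returns a plain value, every row of the table can be covered by an ordinary table-driven unit test without a cluster.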
Key invariant
A spec change during draining or cleaning does not create a new generation. The current transition must complete before a new one begins. This prevents unbounded resource accumulation.
Top-level Ready condition
setReadyCondition derives status.conditions[type=Ready] from the post-reconcile phase and pod state. Its precedence is:
- InstanceNotReady: The referenced FireboltInstance is not healthy. Wins over everything else because nothing downstream works without it.
- Stopped: Phase == stopped. Ready=False, Reason=Stopped, Message="Engine is stopped (spec.replicas is 0)". Explicitly distinguished from Rolling so GitOps tooling can tell an intentionally parked engine apart from one mid-transition.
- Rolling: Phase is any non-terminal phase (creating/switching/draining/cleaning). Ready=False, Reason=Rolling.
- PodsNotReady: Phase is stable but the active-generation pods have not all reported Ready yet. Ready=False, Reason=PodsNotReady.
- EngineReady: Default. Ready=True. The engine is serving traffic on its active generation.
Stopped is the only Ready=False reason that is not a transient rollout or instance-dependency failure. GitOps tools that key off Ready=True should treat a stopped engine as deliberately not-converged-to-serving rather than retrying it indefinitely.
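The precedence order can be expressed as a single ordered switch in which the first matching case wins. This is an illustrative sketch only: the `ReadyInputs` and `Condition` types and the `readyCondition` function are hypothetical stand-ins, not the real setReadyCondition signature.

```go
package main

import "fmt"

// ReadyInputs is a hypothetical bundle of the facts the precedence list uses.
type ReadyInputs struct {
	InstanceHealthy bool
	Phase           string
	PodsReady       int32
	PodsDesired     int32
}

// Condition is a trimmed-down stand-in for a status condition.
type Condition struct {
	Status string
	Reason string
}

// readyCondition evaluates the reasons top to bottom; the first match wins.
func readyCondition(in ReadyInputs) Condition {
	switch {
	case !in.InstanceHealthy:
		return Condition{"False", "InstanceNotReady"}
	case in.Phase == "stopped":
		return Condition{"False", "Stopped"}
	case in.Phase != "stable":
		return Condition{"False", "Rolling"}
	case in.PodsReady < in.PodsDesired:
		return Condition{"False", "PodsNotReady"}
	default:
		return Condition{"True", "EngineReady"}
	}
}

func main() {
	fmt.Println(readyCondition(ReadyInputs{InstanceHealthy: true, Phase: "stopped"}))
	fmt.Println(readyCondition(ReadyInputs{InstanceHealthy: true, Phase: "stable", PodsReady: 2, PodsDesired: 2}))
}
```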
StatefulSet event propagation
A FireboltEngine can get stuck in creating or in stable with PodsNotReady when its generation StatefulSet exists but the StatefulSet controller cannot create the desired pods. Common causes include a missing ServiceAccount, exceeded ResourceQuota, PodSecurity or admission rejection, RBAC denial, unbindable PVC, and similar issues. The Firebolt Operator owns the StatefulSet, but Kubernetes records the actionable error as a Warning event, typically FailedCreate, on the StatefulSet object. Without this propagation, you would have to run kubectl describe sts <name> to triage.
To surface this on the FireboltEngine itself, after computing the Ready condition the reconciler queries the apiserver for Warning events on the current-generation StatefulSet whenever:
- CurrentSTS != nil: There is an STS to look up events for.
- CurrentPodTotal < spec.replicas: Pods are missing rather than just unready.
- Ready.Reason ∈ {Rolling, PodsNotReady}: The existing reason is a generic "stuck" reason that we are allowed to refine. InstanceNotReady, DrainCheckFailing, Stopped, and EngineReady are higher-precedence diagnostics or healthy states and are not overridden.
When a matching Warning event is found, the Ready condition is refined with the event's Reason, such as FailedCreate, and a message of the form StatefulSet <name>: <event message> (x<count>). The lookup uses the Clientset, not the controller-runtime cache, with field selector involvedObject.uid=<UID>,type=Warning. Events are high-volume cluster-wide, and a watch would inflate the controller's cache for a signal we consult only on already-stuck engines. Fetch failures are logged and swallowed. The diagnostic is best-effort and must never poison the main reconcile path. Once pods come up the trigger gate stops firing and the next reconcile restores EngineReady.
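The trigger gate, the field selector, and the message format are all plain string and integer logic, so they can be sketched without a cluster. The function names below are hypothetical, and the sketch assumes all three trigger conditions must hold together. It does not perform the actual event lookup.

```go
package main

import "fmt"

// shouldLookupEvents reports whether the trigger conditions hold.
// Assumption: all three conditions are required at once.
func shouldLookupEvents(hasCurrentSTS bool, currentPodTotal, specReplicas int32, readyReason string) bool {
	if !hasCurrentSTS || currentPodTotal >= specReplicas {
		return false
	}
	// Only the generic "stuck" reasons may be refined.
	return readyReason == "Rolling" || readyReason == "PodsNotReady"
}

// eventFieldSelector builds the selector described in the text.
func eventFieldSelector(uid string) string {
	return fmt.Sprintf("involvedObject.uid=%s,type=Warning", uid)
}

// eventMessage formats the refined condition message.
func eventMessage(stsName, eventMsg string, count int32) string {
	return fmt.Sprintf("StatefulSet %s: %s (x%d)", stsName, eventMsg, count)
}

func main() {
	if shouldLookupEvents(true, 0, 3, "PodsNotReady") {
		fmt.Println(eventFieldSelector("1234-abcd"))
		fmt.Println(eventMessage("my-engine-g3", "create Pod failed", 4))
	}
}
```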
Generation model
Each spec change (while in stable or stopped) increments status.currentGeneration. Resources for each generation are named with a -g<N> suffix, for example my-engine-g{N} for the StatefulSet of an engine named my-engine.
stsMatchesSpec is the central drift detector. It compares the live StatefulSet against the resolved engine spec field-by-field. Any mismatch returns false and the reconciler bumps currentGeneration. Two annotations on the StatefulSet act as content hashes for inputs that don't have a clean direct comparison:
| Annotation | Source | What a change means |
|---|---|---|
| firebolt.io/custom-engine-config-hash | spec.customEngineConfig after the protected-paths strip | The engine ConfigMap content changed. Roll a new generation. |
| firebolt.io/engine-class-hash | Resolved FireboltEngineClass.spec.template (or absent when spec.engineClassRef is nil) | Either the referenced class was edited in place, the engine flipped to a different class, or engineClassRef was cleared. Any of those rolls a new generation. |
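A content-hash annotation of this kind can be sketched as a digest over a deterministic serialization. The source does not say which hash or encoding the operator uses, so the SHA-256 over JSON below and the `contentHash` name are assumptions chosen for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// contentHash returns a hex digest of the config. encoding/json sorts map
// keys, so equal content produces the same digest regardless of insertion
// order. Assumption: SHA-256 over JSON; the real scheme may differ.
func contentHash(config map[string]interface{}) (string, error) {
	raw, err := json.Marshal(config)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	live, _ := contentHash(map[string]interface{}{"cache_size": 10, "mode": "fast"})
	desired, _ := contentHash(map[string]interface{}{"mode": "fast", "cache_size": 20})
	// A differing hash is what would signal "roll a new generation".
	fmt.Println("hashes match:", live == desired)
}
```

Comparing two short strings keeps the drift check cheap even when the underlying config is large or deeply nested.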
Error handling
The Firebolt Operator follows strict error propagation rules to ensure failures are always visible. No swallowed errors. Every error from an I/O operation is either:
- Returned to the caller (causing a retry via requeue), or
- Logged and aggregated when multiple independent cleanup operations must all be attempted (e.g. reconcileDelete).
| Category | Policy |
|---|---|
| Status update failures | Always propagated. A failed status write returns an error so the next reconcile retries with fresh state. |
| Resource list/delete during cleanup | Errors are logged, collected, and aggregated. The finalizer is only removed when all cleanup operations succeed. This prevents premature garbage collection when the API server is unhealthy. |
| Pod readiness and drain checks | Errors from checkPodsReady are propagated rather than defaulting to "not ready". Errors from checkDrainComplete, such as a transient metrics-scrape failure, are logged and treated as "not drained yet". Drain is already a bounded-retry loop at the caller, so re-polling is cheaper and less noisy than blowing up the whole reconcile on a flaky scrape. |
| JSON marshalling | Config values passed to json.MarshalIndent are always well-typed maps. The error path is unreachable and guarded with a panic to catch programming bugs immediately. |
| Terminal errors | Unrecoverable conditions set the instance phase to Failed and surface the error, rather than entering an infinite retry loop. |
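The cleanup policy (attempt every step, aggregate the failures, remove the finalizer only when nothing failed) can be sketched with the standard library's errors.Join, available since Go 1.20. The `runCleanup` helper and the sentinel error are hypothetical, and the source does not state which aggregation mechanism reconcileDelete uses.

```go
package main

import (
	"errors"
	"fmt"
)

// errAPIUnavailable is a hypothetical sentinel used only for this sketch.
var errAPIUnavailable = errors.New("api server unavailable")

// runCleanup attempts every step even if an earlier one fails and returns
// the aggregated error, or nil when all steps succeeded.
func runCleanup(steps []func() error) error {
	var errs []error
	for _, step := range steps {
		if err := step(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	steps := []func() error{
		func() error { return nil },               // e.g. delete StatefulSets
		func() error { return errAPIUnavailable }, // e.g. delete Services
		func() error { return nil },               // e.g. delete ConfigMaps
	}
	if err := runCleanup(steps); err != nil {
		fmt.Println("keeping finalizer:", err)
		return
	}
	fmt.Println("all cleanup succeeded, finalizer can be removed")
}
```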
Status update strategy
Status updates use r.Status().Update() with a single retry on conflict. If a resource version conflict occurs because a concurrent spec update changed the object, the Firebolt Operator re-fetches the latest object, applies the new status, and retries once. This avoids unnecessary reconcile-loop failures from optimistic concurrency.
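The retry-once behavior can be illustrated with an in-memory stand-in for the API server. Everything here is hypothetical (the `store` type, `errConflict`, and `updateStatusWithRetry`); in a real controller the conflict check would typically be apierrors.IsConflict from k8s.io/apimachinery rather than a sentinel error.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for a Kubernetes resource-version conflict.
var errConflict = errors.New("resource version conflict")

// Engine is a trimmed-down object with optimistic-concurrency metadata.
type Engine struct {
	ResourceVersion int
	Phase           string
}

// store is a hypothetical in-memory API server.
type store struct{ current Engine }

func (s *store) Get() Engine { return s.current }

// UpdateStatus rejects writes based on an outdated resource version.
func (s *store) UpdateStatus(e Engine) error {
	if e.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	e.ResourceVersion++
	s.current = e
	return nil
}

// updateStatusWithRetry writes the status and, on conflict only, re-fetches
// the latest object, reapplies the status, and retries exactly once.
func updateStatusWithRetry(s *store, obj Engine, phase string) error {
	obj.Phase = phase
	err := s.UpdateStatus(obj)
	if !errors.Is(err, errConflict) {
		return err // success or a non-conflict failure
	}
	fresh := s.Get()
	fresh.Phase = phase
	return s.UpdateStatus(fresh)
}

func main() {
	s := &store{current: Engine{ResourceVersion: 2, Phase: "stable"}}
	stale := Engine{ResourceVersion: 1, Phase: "stable"}
	fmt.Println(updateStatusWithRetry(s, stale, "creating"), s.current.Phase)
}
```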
Crash recovery
The Firebolt Operator is crash-safe at every phase boundary. If the process terminates:
- During stable → creating transition: The stable phase writes only the status intent (Phase=creating, bumped currentGeneration) in one pass, then requeues. Resources are not created until the status update is persisted. If the Firebolt Operator crashes before the status write, no resources were created and the next reconcile retries from stable. If it crashes after, the next reconcile enters creating and creates the resources normally.
- During creating: The next reconcile sees an existing StatefulSet with not-ready pods and waits. All ensure calls are idempotent, so partial resource creation is safe. If the spec changed and the Firebolt Operator crashed after deleting the old generation's resources but before bumping currentGeneration, the next reconcile finds no resources for the current generation and recreates them fresh, converging to the correct state.
- During switching: the next reconcile checks the service selector and either updates it or proceeds.
- During draining: the next reconcile re-runs the drain check.
- During cleaning: the next reconcile re-deletes any remaining old resources (delete is idempotent).
- During stable: no work needed.
Pod template merge
Per-engine pod-template overrides live under spec.template. When
spec.engineClassRef is set, the Firebolt Operator composes the rendered
StatefulSet pod template from three layers (top wins on conflict for scalar
and struct fields):
- Firebolt Operator defaults (terminationGracePeriodSeconds=60, hardened runAsUser/fsGroup, firebolt.io/engine and firebolt.io/generation labels, operator-owned volumes).
- FireboltEngineClass.spec.template (shared by every engine that references the class).
- FireboltEngine.spec.template on this engine.
List fields (tolerations, initContainers, sidecars, env, envFrom, volumeMounts, imagePullSecrets, volumes) concatenate class-then-engine.
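The class-then-engine concatenation for list fields can be sketched with a small generic helper. The function name is hypothetical, and the sketch covers only the list rule, not the scalar or struct precedence. It copies into a fresh slice so the shared class template is never mutated by a per-engine merge.

```go
package main

import "fmt"

// concatClassThenEngine returns class entries followed by engine entries in
// a newly allocated slice, leaving both inputs untouched.
func concatClassThenEngine[T any](class, engine []T) []T {
	out := make([]T, 0, len(class)+len(engine))
	out = append(out, class...)
	return append(out, engine...)
}

func main() {
	classTolerations := []string{"dedicated=engines"}
	engineTolerations := []string{"gpu=true"}
	fmt.Println(concatClassThenEngine(classTolerations, engineTolerations))
}
```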
The validating webhook applies the same allowlist to class and engine templates.
Rejected paths are enumerated in the
FireboltEngine CRD reference
and FireboltEngineClass CRD reference.
For merge rules the reconciler uses, see
FireboltEngineClass design.
Changes to spec.template or the resolved class content trigger a new blue-green generation.
Firebolt Operator-managed resources
Do not modify these resources manually. For an engine named my-engine:
| Resource | Name pattern | Purpose |
|---|---|---|
| Engine Service | my-engine-service | Headless Service exposing the current generation's pod IPs. |
| StatefulSet | my-engine-g{N} | Pods for generation N |
| Headless Service | my-engine-g{N}-hl | Pod DNS for generation N |
| Config ConfigMap | my-engine-g{N}-config | Engine config for generation N |
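The name patterns in the table are simple enough to capture in a few helpers. The function names are hypothetical; only the resulting strings come from the table above.

```go
package main

import "fmt"

// engineServiceName returns the name of the engine-level Service.
func engineServiceName(engine string) string {
	return engine + "-service"
}

// statefulSetName returns the StatefulSet name for generation gen.
func statefulSetName(engine string, gen int64) string {
	return fmt.Sprintf("%s-g%d", engine, gen)
}

// headlessServiceName returns the per-generation headless Service name.
func headlessServiceName(engine string, gen int64) string {
	return statefulSetName(engine, gen) + "-hl"
}

// configMapName returns the per-generation ConfigMap name.
func configMapName(engine string, gen int64) string {
	return statefulSetName(engine, gen) + "-config"
}

func main() {
	fmt.Println(engineServiceName("my-engine"))
	fmt.Println(statefulSetName("my-engine", 3))
	fmt.Println(headlessServiceName("my-engine", 3))
	fmt.Println(configMapName("my-engine", 3))
}
```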
Admission resource bounds
The validating webhook can be configured with per-dimension maxima for the engine container's resources.requests and resources.limits. When
configured, any FireboltEngine create or update whose engine container
cpu, memory, or ephemeral-storage value exceeds the matching maximum is
rejected at admission with a
spec.template.spec.containers[engine].resources.{requests,limits}.{name}
field error.
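The bound check itself amounts to a per-dimension comparison that yields one field error for each violation. The sketch below is a simplification under stated assumptions: quantities are plain integers rather than Kubernetes resource.Quantity values, and the `Quantities` type, `checkBounds` function, and error wording are hypothetical.

```go
package main

import "fmt"

// boundedResources fixes the iteration order so errors are deterministic.
var boundedResources = []string{"cpu", "memory", "ephemeral-storage"}

// Quantities maps a resource name to an integer amount (hypothetical
// simplification of Kubernetes quantities).
type Quantities map[string]int64

// checkBounds returns one field error per value that exceeds its maximum.
// kind is "requests" or "limits". A resource with no configured maximum is
// unbounded, which is how the opt-in default behaves.
func checkBounds(kind string, got, max Quantities) []string {
	var errs []string
	for _, name := range boundedResources {
		limit, bounded := max[name]
		value, set := got[name]
		if bounded && set && value > limit {
			errs = append(errs, fmt.Sprintf(
				"spec.template.spec.containers[engine].resources.%s.%s: %d exceeds maximum %d",
				kind, name, value, limit))
		}
	}
	return errs
}

func main() {
	max := Quantities{"cpu": 8000}
	requests := Quantities{"cpu": 16000, "memory": 1 << 30}
	for _, e := range checkBounds("requests", requests, max) {
		fmt.Println(e)
	}
}
```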
Bounds are opt-in (defaults are empty). Configure via Helm when the webhook is enabled:
Resource ownership
All per-engine resources have:
- An ownerReference pointing to the FireboltEngine CR (for garbage collection on CR deletion).
- A firebolt.io/engine label (for listing/filtering).
- A firebolt.io/generation label (for generation-based selection).
- A finalizer on the CR itself to ensure cleanup runs before the CR is removed.