
Gateway query routing

The Envoy gateway proxy acts as the entry point for client queries and is the only entry point on which the Firebolt Operator promises zero downtime across engine lifecycle events. It uses a Lua filter to pick the target engine from the X-Firebolt-Engine request header and a dynamic forward proxy (DFP) to resolve the per-engine headless Service at request time.

Configuration

The gateway ConfigMap ({instance}-gateway-config) is a pure function of the FireboltInstance. It does not depend on the set of engines. The Firebolt Operator does not regenerate it on engine create/delete/scale/blue-green events, so those events never trigger a gateway rollout. The configuration contains:
  • A Lua HTTP filter that validates X-Firebolt-Engine as a single RFC 1123 DNS label (lowercase alphanumerics and hyphens, ≤63 chars, no leading or trailing hyphen, no dots) and rewrites :authority to {engine}-service.{namespace}.svc.cluster.local:3473.
  • A dynamic forward proxy in sub-cluster mode (sub_clusters_config, not dns_cache_config). Each authority synthesizes a full STRICT_DNS sub-cluster on first use, so all A-records of the headless engine Service become individual upstream hosts with normal load-balancing, outlier-detection and previous_hosts retry semantics. DNS-cache mode would have collapsed the headless Service back into a single sticky IP per cluster and made previous_hosts a no-op.
  • max_requests_per_connection: 1 on the DFP cluster: every query gets a fresh TCP connect and therefore a fresh DNS lookup. This collapses the stale-IP window after a selector flip to a single TCP handshake instead of the STRICT_DNS refresh interval.
  • Active HTTP health checks on /health/ready every 1s with healthy_threshold: 1 / unhealthy_threshold: 1. The engine flips this endpoint to 503 immediately on SIGTERM, so Envoy ejects a draining pod from the load-balanced set within one probe interval, independently of DNS.
  • A route-level retry policy that retries on transport-level failures (connect-failure, refused-stream, reset) and on responses carrying the X-Firebolt-Drained header (retriable-headers). The header is set only by the engine’s pre-work shutdown fence (see Graceful pod shutdown), so that one specific shape of 503 is provably side-effect free and safe to retry. Bare 5xx is not retried, because once an engine has executed a request it may have applied side effects and a retry could duplicate them. num_retries: 50 combined with the previous_hosts retry-host predicate means each successive attempt lands on a pod we have not tried yet, until either the sub-cluster’s host set is exhausted or the client deadline expires.
  • per_connection_buffer_limit_bytes hard-coded to 2 MiB on both the listener and the DFP cluster (kept in lockstep). This is the request-replay budget: a retry, including the X-Firebolt-Drained one, can only be issued when the full request body fits in this buffer. Requests larger than the limit are dispatched without buffering and any 503 they receive propagates to the client unretried. The value is intentionally not surfaced on the CR. See the rationale comment on gatewayPerConnectionBufferLimitBytes in instance_gateway.go. It sits at the center of two Firebolt Operator-owned invariants: retry coverage and gateway memory budget. A per-instance override would invite settings that silently break either. Memory budget per gateway pod scales as concurrent_requests * (1 + retry_factor) * 2 MiB. Size gateway.replicas and gateway.resources.limits.memory to that envelope. The standalone Helm chart (firebolt-instance-helm) renders the same 2 MiB value through its own gateway-configmap so the two deployment paths behave identically. See Gateway sizing for the operational guidance.
  • Per-engine circuit breakers on the DFP cluster (circuit_breakers.thresholds[priority=DEFAULT]): max_connections=1024, max_pending_requests=1024, max_requests=1024, max_retries=256. Because each authority materializes its own STRICT_DNS sub-cluster, these thresholds apply per engine, per gateway pod. A misbehaving engine cannot consume more than its share of connection-pool slots, pending-request queue, in-flight stream budget, or retries, and therefore cannot starve sibling engines sharing the same gateway. The constants live in instance_gateway.go next to gatewayPerConnectionBufferLimitBytes. See Gateway sizing for the sizing rationale and the operational signal (synthetic 503 with response flag UO/UAEX/URX) that indicates a tripped breaker.
  • An admin listener on 127.0.0.1:9901 used by the gateway pod’s own preStop hook (POST /healthcheck/fail) to fail the gateway’s readiness before the kubelet sends SIGTERM, so service load-balancers stop sending it new requests before its filters tear down.
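The header validation and :authority rewrite described in the first bullet can be illustrated outside Envoy. The following is a minimal Python sketch of the same rule, not the Lua filter itself: the function name engine_authority and the choice to return None for an invalid header are assumptions made for illustration, and the status code the real filter returns on rejection is not stated here.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics and hyphens, 1 to 63 characters,
# no leading or trailing hyphen, no dots.
_LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")

ENGINE_PORT = 3473  # engine port taken from the rewritten authority above


def engine_authority(engine_header, namespace):
    """Return the rewritten :authority, or None if the header is invalid.

    Hypothetical helper: mirrors the validation rule only, not the filter's
    actual error handling.
    """
    if engine_header is None or _LABEL_RE.fullmatch(engine_header) is None:
        return None
    return f"{engine_header}-service.{namespace}.svc.cluster.local:{ENGINE_PORT}"


if __name__ == "__main__":
    print(engine_authority("my-engine", "analytics"))
    print(engine_authority("bad.engine", "analytics"))
```

Restricting the header to a single label matters because the value is interpolated into a hostname: a dot or an over-long value would otherwise let a client steer the dynamic forward proxy toward an arbitrary DNS name.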
Because the per-engine routing Service is headless (ClusterIP: None), the :authority hostname resolves directly to the set of ready pod IPs. kube-proxy is not in the data path, so there is no terminating-endpoint race where a SYN would be DNAT’d to a pod whose listener has already closed.
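The memory envelope from the per_connection_buffer_limit_bytes bullet, concurrent_requests * (1 + retry_factor) * 2 MiB, can be worked through numerically. The sketch below is only arithmetic on that formula. The function name is hypothetical, and retry_factor is read here as the average number of additional buffered attempts per request, which is an assumption because the term is not defined in this section. The input values are illustrative, not recommendations.

```python
import math

BUFFER_LIMIT_BYTES = 2 * 1024 * 1024  # the hard-coded 2 MiB replay buffer


def gateway_buffer_envelope_bytes(concurrent_requests, retry_factor):
    """Estimate request-buffer memory per gateway pod, in bytes.

    Hypothetical helper: applies
    concurrent_requests * (1 + retry_factor) * 2 MiB and rounds up.
    """
    return math.ceil(concurrent_requests * (1 + retry_factor) * BUFFER_LIMIT_BYTES)


if __name__ == "__main__":
    # Illustrative inputs: 500 concurrent requests, half of them retried once.
    envelope = gateway_buffer_envelope_bytes(500, 0.5)
    print(f"{envelope / (1024 * 1024):.0f} MiB")
```

The result covers request buffering only. Envoy's baseline footprint and connection overhead come on top of it when choosing gateway.resources.limits.memory.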

Traffic path

Client (X-Firebolt-Engine: my-engine) → Gateway Service (:80) → Envoy (:8080) → headless {engine}-service (pod IPs) → Engine Pod
The gateway Deployment uses a rolling update strategy with maxSurge: 25% and maxUnavailable: 0, ensuring zero downtime during gateway upgrades.

Graceful pod shutdown

When a blue-green cutover or scale-down deletes the old-generation StatefulSet, the kubelet sends SIGTERM to its pods while client queries may still be in flight at the gateway. Zero downtime is preserved end-to-end by a chain of independent mechanisms. No Firebolt Operator-side gate on EndpointSlice routability is required. The mechanisms, in the order they fire after SIGTERM:
  1. Engine /health/ready flips to 503. The kubelet readiness probe sees this on its next scrape and marks the pod NotReady. The K8s endpoint controller removes the pod from the cluster Service’s EndpointSlices. CoreDNS stops returning that pod IP for the headless Service.
  2. Envoy active health check ejects the host. Envoy probes /health/ready directly on each pod IP every 1s. With unhealthy_threshold: 1, one failed probe is enough to remove the host from the load-balanced set. This is independent of DNS, so it does not wait on EndpointSlice / CoreDNS propagation.
  3. max_requests_per_connection: 1 + per-request DNS. Any query that arrives at the gateway after the host is excluded from DNS opens a fresh TCP connection (no keep-alive reuse), does a fresh DNS lookup, and never sees the dying pod.
  4. Engine pre-work shutdown fence. A query that slipped onto an open connection or a stale-DNS host before the mechanisms above took the pod out of rotation reaches the engine’s HTTP handler, which fast-fails with 503 Service Unavailable, Connection: close, and the X-Firebolt-Drained header before any executor or Storage Manager work runs. The connection drops out of any pool, and the request is provably side-effect free.
  5. Gateway retries the 503 on a different host. The retriable-headers rule matches X-Firebolt-Drained: present. Combined with the previous_hosts retry-host predicate, the retry never picks the same draining pod. The client sees a single 200 response, provided the request body fits within per_connection_buffer_limit_bytes (hard-coded at 2 MiB). Bodies that exceed the buffer are dispatched without buffering and the 503 is propagated to the client unretried. Workloads that send single requests above 2 MiB are out of scope for the zero-downtime contract.
In-flight queries that the engine accepted before SIGTERM continue to run, bounded by shutdown_wait_unfinished (= terminationGracePeriodSeconds - 5s). The fence only fences new requests. It does not interrupt work already in progress.
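The interaction between steps 4 and 5 (drained 503, previous_hosts, and the 2 MiB replay budget) can be modelled in a few lines. This is a simplified Python sketch of the decision logic as described above, not Envoy's implementation: route_with_retries and the send callback are hypothetical names, host selection is reduced to "first untried host", and the loop stops when every host has been tried.

```python
BUFFER_LIMIT_BYTES = 2 * 1024 * 1024  # hard-coded replay budget
NUM_RETRIES = 50                      # num_retries from the route retry policy
DRAINED_HEADER = "x-firebolt-drained"


def route_with_retries(hosts, send, body):
    """Model the gateway's retry decision for a single request.

    hosts: pod IPs in the engine's sub-cluster.
    send:  hypothetical callback, send(host, body) -> (status, headers),
           raising ConnectionError on a transport-level failure.
    Returns (final_status, hosts_tried).
    """
    # A retry is only possible when the whole body fits in the buffer.
    replayable = len(body) <= BUFFER_LIMIT_BYTES
    tried = []
    status = None
    for _ in range(1 + NUM_RETRIES):
        # previous_hosts predicate: never pick a host already attempted.
        candidates = [h for h in hosts if h not in tried]
        if not candidates:
            break
        host = candidates[0]
        tried.append(host)
        try:
            status, headers = send(host, body)
            # Only the fence's 503 is retriable; bare 5xx is returned as-is.
            retriable = DRAINED_HEADER in headers
        except ConnectionError:
            status, retriable = 503, True
        if not (retriable and replayable):
            return status, tried
    return status, tried


if __name__ == "__main__":
    def send(host, body):
        # 10.0.0.1 is draining; 10.0.0.2 is healthy.
        if host == "10.0.0.1":
            return 503, {DRAINED_HEADER: "true"}
        return 200, {}

    print(route_with_retries(["10.0.0.1", "10.0.0.2"], send, b"SELECT 1"))
```

With a body larger than 2 MiB the same call returns the 503 from the first host, which is the case the zero-downtime contract excludes.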