X-Firebolt-Drained shutdown path.
The Firebolt Operator pins Envoy’s per_connection_buffer_limit_bytes to 2 MiB on both the listener and the dynamic-forward-proxy cluster. The value is intentionally not exposed on the CR. See the comment on gatewayPerConnectionBufferLimitBytes in internal/controller/instance_gateway.go. This value is part of the Firebolt Operator’s zero-downtime and memory-budget contract. A per-instance override could silently break retry coverage or the gateway memory limit.
This fixed value has two operational consequences:
- Memory budget. Peak buffering per gateway pod is roughly `expected_concurrent_requests x (1 + retry_factor) x 2 MiB`, where `retry_factor` is the fraction of in-flight requests you expect to be retried during a cutover. This is typically small and bounded by the active health-check interval and the size of the engine fleet behind a single authority. When expected concurrency grows, raise `spec.gateway.replicas` and `spec.gateway.template.spec.containers[name=="envoy"].resources.limits.memory` together. OOMKills here translate directly into client-visible failures because the gateway is the only zero-downtime entry point.
- Requests larger than 2 MiB are not retried. Envoy can only replay a request whose body fits in the per-connection buffer. Anything bigger is dispatched without buffering. Any 503 it gets, including a retry-safe `X-Firebolt-Drained` 503 from the engine’s pre-work shutdown fence, propagates to the client unretried. If your workloads send single requests above this threshold, such as multi-MiB `COPY` ingest or large multi-statement batches, split them client-side or accept that those requests can fail during a cutover.
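As a rough illustration of the two consequences above, the following Go sketch evaluates the memory-budget estimate and checks whether a request body is small enough to be replayed. The identifiers `perConnectionBufferLimitBytes`, `PeakBufferBytes`, and `FitsRetryBuffer` are hypothetical names chosen for this example and are not taken from the operator's code. Treating a body of exactly 2 MiB as retryable is also an assumption.

```go
package main

import "fmt"

// perConnectionBufferLimitBytes mirrors the fixed 2 MiB value described above.
// The name is illustrative, not the operator's identifier.
const perConnectionBufferLimitBytes = 2 * 1024 * 1024

// PeakBufferBytes estimates peak buffering for one gateway pod as
// expected_concurrent_requests x (1 + retry_factor) x 2 MiB.
func PeakBufferBytes(expectedConcurrentRequests int, retryFactor float64) int64 {
	perRequest := float64(perConnectionBufferLimitBytes) * (1 + retryFactor)
	return int64(float64(expectedConcurrentRequests) * perRequest)
}

// FitsRetryBuffer reports whether a request body fits in the per-connection
// buffer, which is the precondition for Envoy replaying it on a retry.
// Assumption: a body of exactly 2 MiB still fits.
func FitsRetryBuffer(bodyBytes int64) bool {
	return bodyBytes <= perConnectionBufferLimitBytes
}

func main() {
	// 512 concurrent requests with a quarter of them retried during a cutover.
	peak := PeakBufferBytes(512, 0.25)
	fmt.Printf("estimated peak buffering: %d MiB\n", peak/(1024*1024))

	fmt.Println("1 MiB body retryable:", FitsRetryBuffer(1<<20))
	fmt.Println("8 MiB body retryable:", FitsRetryBuffer(8<<20))
}
```

The estimate covers request buffering only. The Envoy memory limit also has to leave headroom for the proxy's own working set, which this sketch does not model.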
Per-engine circuit breakers
The Firebolt Operator also stamps Envoy circuit-breaker thresholds on the dynamic-forward-proxy cluster. Because each unique engine authority materializes its own `STRICT_DNS` sub-cluster, these thresholds apply per engine, per gateway pod. A runaway engine cannot consume more than its share of connection-pool slots, pending-request queue, in-flight stream budget, or retries. This prevents it from starving sibling engines on the same gateway pod.
Defaults (matched in internal/controller/instance_gateway.go against constants of the same names, and asserted by TestBuildEnvoyConfigYAMLCircuitBreakers):
| Field | Value | What it caps |
|---|---|---|
| `max_connections` | 1024 | Concurrent upstream TCP connections to one engine through one gateway pod. With `max_requests_per_connection: 1` this is also the concurrent in-flight query cap per engine. |
| `max_pending_requests` | 1024 | Queue depth before Envoy returns a synthetic 503 with response flag `UO` (upstream overflow). |
| `max_requests` | 1024 | HTTP/2 active streams per engine sub-cluster. Held in lockstep with `max_connections` because `max_requests_per_connection: 1` collapses the two dimensions. |
| `max_retries` | 256 | Cluster-wide simultaneous retry budget. This is higher than Envoy’s default of 3 because the route’s `num_retries` is 50 and a cutover can keep many requests retrying at once. |
Like `per_connection_buffer_limit_bytes`, these thresholds are not exposed on the CR. A per-instance override would either be a no-op when limits are set high, or break the per-engine isolation contract when limits are set low enough to throttle steady-state traffic on one engine while leaving the gateway’s global memory budget unchanged. If you expect per-engine concurrency above `max_connections`, raise `spec.gateway.replicas` so the total gateway capacity grows in proportion. Do not change these per-pod caps.
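The replica guidance above can be sketched as a small sizing helper. This is a minimal example under two assumptions: that traffic to one engine is spread evenly across gateway pods, and that `max_connections` is the binding cap. The names `maxConnections` and `ReplicasForEngineConcurrency` are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// maxConnections is the per-engine, per-gateway-pod cap from the defaults
// table. The Go identifier is hypothetical.
const maxConnections = 1024

// ReplicasForEngineConcurrency returns the smallest gateway replica count
// whose combined per-engine connection cap covers the expected concurrency.
// Assumes requests are spread evenly across gateway pods.
func ReplicasForEngineConcurrency(expectedConcurrency int) int {
	if expectedConcurrency <= 0 {
		return 1
	}
	// Integer ceiling division.
	return (expectedConcurrency + maxConnections - 1) / maxConnections
}

func main() {
	for _, c := range []int{800, 1024, 1025, 3000} {
		fmt.Printf("concurrency %d -> at least %d replica(s)\n",
			c, ReplicasForEngineConcurrency(c))
	}
}
```

In practice load is rarely perfectly even, so the result is a lower bound on replicas, not a recommendation.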
Operationally, a request rejected by a tripped circuit breaker shows up as a synthetic Envoy 503 with response flag UO, UAEX, or URX in the access log. Sustained 503s with these flags against one engine indicate that the engine is saturating the per-pod cap. Investigate the engine before raising the gateway’s replica count.
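When triaging access logs, a filter along the following lines can separate these rejections from other 503s. It is a sketch only. It assumes the access-log format includes the status code and the `%RESPONSE_FLAGS%` value, and that multiple flags are comma-separated. The flag set is copied from the paragraph above, and `IsBreakerRejection` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// breakerFlags holds the response flags that the text above associates with
// a tripped circuit breaker.
var breakerFlags = map[string]bool{"UO": true, "UAEX": true, "URX": true}

// IsBreakerRejection reports whether an access-log entry looks like a
// synthetic 503 from a tripped circuit breaker.
// Assumption: responseFlags holds comma-separated short flag codes.
func IsBreakerRejection(status int, responseFlags string) bool {
	if status != 503 {
		return false
	}
	for _, flag := range strings.Split(responseFlags, ",") {
		if breakerFlags[strings.TrimSpace(flag)] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(IsBreakerRejection(503, "UO"))     // breaker flag on a 503
	fmt.Println(IsBreakerRejection(503, "UF,URX")) // one of several flags matches
	fmt.Println(IsBreakerRejection(503, "-"))      // 503 without a breaker flag
	fmt.Println(IsBreakerRejection(200, "UO"))     // not a 503
}
```

Grouping the matches by upstream authority would show whether the rejections concentrate on one engine, which is the signal described above.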