The Envoy gateway is the only zero-downtime entry point. Size its replica count and memory limit for both steady-state traffic and the retry amplification introduced by the X-Firebolt-Drained shutdown path.

The Firebolt Operator pins Envoy’s per_connection_buffer_limit_bytes to 2 MiB on both the listener and the dynamic-forward-proxy cluster. The value is intentionally not exposed on the CR; see the comment on gatewayPerConnectionBufferLimitBytes in internal/controller/instance_gateway.go. It is part of the Firebolt Operator’s zero-downtime and memory-budget contract, and a per-instance override could silently break retry coverage or the gateway memory limit. This fixed value has two operational consequences:
  • Memory budget. Peak buffering per gateway pod is roughly expected_concurrent_requests × (1 + retry_factor) × 2 MiB, where retry_factor is the fraction of in-flight requests you expect to be retried during a cutover. retry_factor is typically small and is bounded by the active health-check interval and the size of the engine fleet behind a single authority. When expected concurrency grows, raise spec.gateway.replicas and spec.gateway.template.spec.containers[name=="envoy"].resources.limits.memory together. OOMKills here translate directly into client-visible failures because the gateway is the only zero-downtime entry point.
  • Requests larger than 2 MiB are not retried. Envoy can only replay a request whose body fits in the per-connection buffer. A larger request is dispatched without buffering, so any 503 it receives, including a retry-safe X-Firebolt-Drained 503 from the engine’s pre-work shutdown fence, propagates to the client unretried. If your workloads send single requests above this threshold, such as multi-MiB COPY ingest or large multi-statement batches, split them client-side or accept that those requests can fail during a cutover.
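
The two consequences above can be turned into a quick back-of-the-envelope check. The following is a minimal sketch, not part of the Firebolt Operator: the function names and the example inputs are hypothetical, and it estimates only request buffering, not the full memory footprint of a gateway pod. Treating a body of exactly 2 MiB as replayable is an assumption about the boundary.

```python
import math

# Fixed by the operator on the listener and the dynamic-forward-proxy cluster.
PER_CONNECTION_BUFFER_LIMIT_BYTES = 2 * 1024 * 1024  # 2 MiB


def peak_buffer_bytes(expected_concurrent_requests: int, retry_factor: float) -> int:
    """Rough peak buffering per gateway pod, following the formula above.

    retry_factor is the fraction of in-flight requests expected to be
    retried during a cutover (hypothetical input, chosen by the operator
    of the cluster).
    """
    if expected_concurrent_requests < 0 or retry_factor < 0:
        raise ValueError("inputs must be non-negative")
    estimate = (
        expected_concurrent_requests
        * (1 + retry_factor)
        * PER_CONNECTION_BUFFER_LIMIT_BYTES
    )
    return math.ceil(estimate)


def is_replayable(body_bytes: int) -> bool:
    """True if the request body fits in the per-connection buffer.

    Assumption: a body of exactly 2 MiB is counted as fitting.
    """
    return 0 <= body_bytes <= PER_CONNECTION_BUFFER_LIMIT_BYTES


if __name__ == "__main__":
    # Hypothetical workload: 400 concurrent requests, 25% retried in a cutover.
    mib = peak_buffer_bytes(400, 0.25) / (1024 * 1024)
    print(f"estimated peak buffering: {mib:.0f} MiB per gateway pod")
    print("5 MiB COPY body replayable:", is_replayable(5 * 1024 * 1024))
```

Compare the estimate against the envoy container's memory limit, leaving headroom for everything the formula does not cover. A client could use a check like is_replayable to decide when to split a large batch before sending it.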

Per-engine circuit breakers

The Firebolt Operator also stamps Envoy circuit-breaker thresholds on the dynamic-forward-proxy cluster. Because each unique engine authority materializes its own STRICT_DNS sub-cluster, these thresholds apply per engine, per gateway pod. A runaway engine cannot consume more than its share of connection-pool slots, pending-request queue, in-flight stream budget, or retries. This prevents it from starving sibling engines on the same gateway pod. Defaults (matched in internal/controller/instance_gateway.go against constants of the same names, and asserted by TestBuildEnvoyConfigYAMLCircuitBreakers):
| Field | Value | What it caps |
| --- | --- | --- |
| max_connections | 1024 | Concurrent upstream TCP connections to one engine through one gateway pod. With max_requests_per_connection: 1 this is also the concurrent in-flight query cap per engine. |
| max_pending_requests | 1024 | Queue depth before Envoy returns a synthetic 503 with response flag UO (upstream overflow). |
| max_requests | 1024 | HTTP/2 active streams per engine sub-cluster. Held in lockstep with max_connections because max_requests_per_connection: 1 collapses the two dimensions. |
| max_retries | 256 | Cluster-wide simultaneous retry budget. This is higher than Envoy’s default of 3 because the route’s num_retries is 50 and a cutover can keep many requests retrying at once. |
These values are hard-coded for the same reason as per_connection_buffer_limit_bytes. A per-instance override would either be a no-op (when the limits are set high) or break the per-engine isolation contract (when the limits are set low enough to throttle steady-state traffic on one engine while leaving the gateway’s global memory budget unchanged). If you expect per-engine concurrency above max_connections, raise spec.gateway.replicas so that total gateway capacity grows in proportion. Do not change these per-pod caps.

Operationally, a request rejected by a tripped circuit breaker shows up as a synthetic Envoy 503 with response flag UO, UAEX, or URX in the access log. Sustained 503s with these flags against one engine indicate that the engine is saturating the per-pod cap. Investigate the engine before raising the gateway’s replica count.
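
Because the caps apply per engine, per gateway pod, the replica count needed for a given per-engine concurrency can be estimated with simple arithmetic. This is a rough sketch under one loud assumption: traffic to the engine is spread evenly across gateway pods. The function name and the example concurrency values are hypothetical.

```python
# Per-engine, per-gateway-pod cap stamped by the operator.
MAX_CONNECTIONS_PER_ENGINE_PER_POD = 1024


def min_gateway_replicas(per_engine_concurrency: int) -> int:
    """Smallest replica count whose combined per-engine cap covers the load.

    Assumes requests for the engine are balanced evenly across gateway
    pods; skewed load balancing would need more replicas than this.
    """
    if per_engine_concurrency < 0:
        raise ValueError("per_engine_concurrency must be non-negative")
    cap = MAX_CONNECTIONS_PER_ENGINE_PER_POD
    # Integer ceiling division, with a floor of one replica.
    return max(1, (per_engine_concurrency + cap - 1) // cap)


if __name__ == "__main__":
    for concurrency in (800, 1024, 1025, 3000):
        print(concurrency, "->", min_gateway_replicas(concurrency), "replica(s)")
```

The result is a lower bound for spec.gateway.replicas from the circuit-breaker caps alone. The memory budget and the health of the engine itself still need to be checked separately.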