Prerequisites
- Kubernetes 1.28 or later. The CRDs use CEL transition rules for field immutability.
- Helm installed on the machine or automation that deploys the Firebolt Operator.
- Access to the OCI Helm chart registry that hosts the Firebolt Operator charts.
- Engine nodes that provide a locked-memory (memlock) limit of at least 8 GiB. See Engine node requirements.
Engine node requirements
When an engine starts, it locks memory for io_uring, so it needs a memlock limit of at least 8 GiB. Each engine pod takes that limit from containerd, the container runtime on its node. If the limit is too low, the engine crashes at startup with an error.
The limit on a node is either infinity or a value in bytes. A limit of infinity, or of at least 8 GiB (8589934592 bytes), is enough, and such a node needs no further change. Some node images already provide a high enough limit, while others default to around 8 MiB, which is far too low.
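The threshold can be expressed as a small shell check. This is a minimal sketch: the function name memlock_is_sufficient is hypothetical and not part of the Firebolt Operator. It accepts either a bare value or a LimitMEMLOCK=<value> line.

```shell
# Succeeds when a memlock limit is "infinity" or at least 8 GiB.
# Hypothetical helper for illustration only.
memlock_is_sufficient() {
  value="${1#LimitMEMLOCK=}"   # accept "LimitMEMLOCK=<value>" or a bare value
  min_bytes=8589934592         # 8 GiB in bytes
  case "$value" in
    infinity) return 0 ;;
    ''|*[!0-9]*) return 1 ;;   # empty, or not a plain byte count
  esac
  [ "$value" -ge "$min_bytes" ]
}
```

For example, memlock_is_sufficient 8388608 (8 MiB) fails, while memlock_is_sufficient infinity succeeds.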
If the limit is too low, raise memlock on the container runtime (containerd) with a systemd drop-in:
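A minimal sketch of such a drop-in, written as a shell function. It assumes the runtime runs as the systemd unit containerd.service. The file name 10-memlock.conf and the function name are arbitrary choices for this example.

```shell
# Writes a systemd drop-in that raises LimitMEMLOCK for containerd.
# $1 optionally overrides the drop-in directory (useful for a dry run).
write_memlock_dropin() {
  dropin_dir="${1:-/etc/systemd/system/containerd.service.d}"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/10-memlock.conf" <<'EOF'
[Service]
LimitMEMLOCK=infinity
EOF
}

# On a node, as root:
#   write_memlock_dropin
```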
infinity is simplest. A bounded value such as LimitMEMLOCK=8G also works, as long as it is at least 8 GiB. Reload systemd, restart containerd, and confirm the limit:
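These three steps can be sketched as one shell function. It assumes root access on the node and that the unit is named containerd. The function name is hypothetical.

```shell
# Reloads systemd, restarts containerd, and prints the effective limit.
apply_and_confirm_memlock() {
  systemctl daemon-reload || return 1
  systemctl restart containerd || return 1
  # Prints a single line of the form LimitMEMLOCK=<value>.
  systemctl show containerd --property=LimitMEMLOCK
}
```

With the drop-in in place, the last command should report LimitMEMLOCK=infinity or a byte count of at least 8589934592.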
Amazon EKS
If you are using the Amazon EKS AMI, the memlock limit is already infinity, so you don’t need to change anything on your engine nodes.
If you are using a different OS, you need to verify the setting. One way to configure it on AWS is via an EC2 user-data script.
Google GKE
GKE node system configuration does not expose memlock, so apply the drop-in with a privileged DaemonSet that writes it to each node and restarts containerd. The DaemonSet reapplies it to new nodes as they join, which covers upgrades and autoscaling:
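A minimal sketch of such a DaemonSet, written as a shell function that prints the manifest so you can pipe it to kubectl apply -f -. The resource names, the kube-system namespace, the busybox image, and the drop-in file name are assumptions made for illustration. Review them against your node image before use.

```shell
# Prints a DaemonSet manifest that applies the memlock drop-in on every node.
# Sketch only: names, namespace, image, and file name are assumptions.
render_memlock_daemonset() {
  cat <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: containerd-memlock
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: containerd-memlock
  template:
    metadata:
      labels:
        app: containerd-memlock
    spec:
      hostPID: true            # lets nsenter target the host's PID 1
      tolerations:
        - operator: Exists     # also schedule onto tainted nodes
      containers:
        - name: apply
          image: busybox:1.36
          securityContext:
            privileged: true
          command: ["/bin/sh", "-c"]
          args:
            - |
              set -eu
              f=/etc/systemd/system/containerd.service.d/10-memlock.conf
              # Run a command in the host's namespaces.
              host() { nsenter -t 1 -m -u -i -n -- "$@"; }
              # Check first: skip the restart if the drop-in already exists.
              if ! host grep -qx 'LimitMEMLOCK=infinity' "$f" 2>/dev/null; then
                host mkdir -p "$(dirname "$f")"
                host sh -c "printf '[Service]\nLimitMEMLOCK=infinity\n' > $f"
                host systemctl daemon-reload
                host systemctl restart containerd
              fi
              # Stay running so the DaemonSet pod is not restarted in a loop.
              while true; do sleep 3600; done
EOF
}

# Apply with:
#   render_memlock_daemonset | kubectl apply -f -
```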
Restarting containerd also restarts this pod, and the check lets the replacement pod skip a second restart once the limit is already in place.
GKE Autopilot blocks privileged DaemonSets, so you cannot use this approach there directly. Either request a privileged-workload allowlist (--autopilot-privileged-admission) or run your engine node pool on GKE Standard.
Microsoft Azure (AKS)
AKS does not expose LimitMEMLOCK through its supported node configuration, so apply the same privileged DaemonSet shown for Google GKE, or bake the drop-in into a custom node image. Either way, it reapplies automatically to nodes added by node image upgrades and cluster autoscaling, so you do not have to touch nodes by hand.
Install the CRDs
The Firebolt Operator comes with three CustomResourceDefinitions:
- FireboltInstance
- FireboltEngine
- FireboltEngineClass
You can install them from a separate CRD chart or from the Firebolt Operator chart’s crds/ directory. If you would like to learn more about the
pros and cons of each option, please refer to the Helm chart best practices for CRDs.
Option 1 (recommended): Separate CRD chart
Install the CRD chart before installing the Firebolt Operator chart. This is the recommended option because it gives you full control over the CRD lifecycle.
Option 2: Firebolt Operator chart crds/ directory
The Firebolt Operator chart bundles CRDs in its crds/ directory. Helm gives this
directory special handling: on helm install, CRDs in crds/ are installed
before the chart templates are rendered. If a CRD already exists, Helm skips it
with a warning. You can opt out of this bundled CRD installation with
--skip-crds.
Helm does not upgrade or delete CRDs from a chart’s crds/ directory (this is
the trade-off the Helm CRD best practices caution against for ongoing use).
Prefer Option 1 for any deployment that will need to upgrade the operator over time.
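Following that recommendation, installing the separate CRD chart could look like the sketch below. The chart's OCI reference is passed as an argument. The release name firebolt-operator-crds and the namespace firebolt-system are hypothetical placeholders, so substitute the values for your registry and cluster.

```shell
# Installs or upgrades the separate CRD chart (Option 1).
# $1: OCI reference of the CRD chart in your registry.
install_firebolt_crds() {
  crd_chart_ref="$1"
  # Release name and namespace are placeholders for this sketch.
  helm upgrade --install firebolt-operator-crds "$crd_chart_ref" \
    --namespace firebolt-system --create-namespace
}
```

Running the same function again after a new chart version is published is one way to keep CRD upgrades an explicit, separate step.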
Install the Firebolt Operator
Install the Firebolt Operator controller. Omit --skip-crds if you chose the bundled crds/ directory option above.
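A sketch of the install command with --skip-crds, which suits the separate CRD chart from Option 1. Leave the flag out if you rely on the bundled crds/ directory. The chart reference is an argument, and the release name and namespace are hypothetical placeholders.

```shell
# Installs or upgrades the Firebolt Operator chart without its bundled CRDs.
# $1: OCI reference of the operator chart in your registry.
install_firebolt_operator() {
  operator_chart_ref="$1"
  # Release name and namespace are placeholders for this sketch.
  helm upgrade --install firebolt-operator "$operator_chart_ref" \
    --namespace firebolt-system --create-namespace \
    --skip-crds
}
```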
Multi-tenant install: scope the Firebolt Operator to specific namespaces
By default the Firebolt Operator watches every namespace, and the chart renders a ClusterRole plus ClusterRoleBinding so it can read and
write the resources it manages anywhere in the cluster. If you only
want the Firebolt Operator to act in a fixed set of namespaces, list
them under watchNamespaces:
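One way to set the list from the command line is Helm's --set list syntax. The sketch below assumes watchNamespaces is a top-level list value in the chart. The function name, release name, and namespace are hypothetical.

```shell
# Scopes the operator to a comma-separated list of namespaces.
# $1: OCI reference of the operator chart.
# $2: namespaces, for example tenant-a,tenant-b
scope_firebolt_operator() {
  operator_chart_ref="$1"
  namespaces="$2"
  # Helm reads {a,b} as a list; the quotes stop shell brace expansion.
  helm upgrade --install firebolt-operator "$operator_chart_ref" \
    --namespace firebolt-system \
    --set "watchNamespaces={$namespaces}"
}
```

Calling scope_firebolt_operator with tenant-a,tenant-b passes --set watchNamespaces={tenant-a,tenant-b} to Helm. Keeping the list in a values file works equally well and is easier to review.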
The chart then renders a Role plus RoleBinding in each listed
namespace (carrying the same rule set the ClusterRole would have)
and starts the manager with --namespaces=tenant-a,tenant-b so its
cache only spans those namespaces. The Firebolt Operator’s blast
radius is bounded to that list; CRs created in other namespaces are
silently ignored.
To onboard a new tenant namespace, add it to watchNamespaces and
re-run helm upgrade. The new Role and RoleBinding land in the
new namespace and the manager restart picks up the extended flag.
If you also need the apiserver pod-proxy permission (because you set
FireboltInstance.spec.metricScrapeMode=ApiserverProxy on any
instance), enable the matching opt-in:
The opt-in adds a ClusterRole (or per-namespace Role when
watchNamespaces is set) that grants only pods/proxy: get. The
default metricScrapeMode=PodIP does not need this permission.
Firebolt Operator flags
The Firebolt Operator supports these runtime flags. The binary default is what the manager uses when you run it directly. The Helm chart default is what the firebolt-operator chart passes with its default values.yaml.
| Flag | Binary default | Helm chart default | Description |
|---|---|---|---|
| --version | false | Not set | Print the version and exit. |
| --namespaces | "" | Derived from watchNamespaces (omitted when empty) | Comma-separated list of namespaces to watch. Empty watches every namespace (cluster-wide install, requires the chart’s ClusterRole). A non-empty list confines the manager cache to those namespaces and pairs with per-namespace Role and RoleBinding from the chart. |
| --metrics-bind-address | 0 | :8443 | Address for the metrics endpoint. Use 0 to disable metrics. |
| --metrics-secure | true | true | Serve metrics over HTTPS with Kubernetes authentication and authorization. |
| --metrics-cert-path | "" | Not set | Directory that contains the metrics server certificate. |
| --metrics-cert-name | tls.crt | Not set | Metrics server certificate file name. |
| --metrics-cert-key | tls.key | Not set | Metrics server key file name. |
| --health-probe-bind-address | :8081 | :8081 | Address for health probes. |
| --leader-elect | false | true | Enable leader election for HA deployments. |
| --enable-webhooks | true | false | Enable the admission webhook server. |
| --webhook-cert-path | "" | Not set | Directory that contains the webhook certificate. The chart sets /tmp/k8s-webhook-server/serving-certs when webhook.enabled=true. |
| --webhook-cert-name | tls.crt | Not set | Webhook certificate file name. |
| --webhook-cert-key | tls.key | Not set | Webhook key file name. |
| --enable-http2 | false | Not set | Enable HTTP/2 for the metrics and webhook servers. |
| --engine-max-cpu | "" | Not set | Maximum allowed engine-container CPU request and limit (resolved from FireboltEngine.spec.template.spec.containers[engine].resources or the referenced FireboltEngineClass’s container resources). Empty disables the bound. |
| --engine-max-memory | "" | Not set | Maximum allowed engine-container memory request and limit (same resolution as --engine-max-cpu). Empty disables the bound. |
| --engine-max-ephemeral-storage | "" | Not set | Maximum allowed engine-container ephemeral-storage request and limit (same resolution as --engine-max-cpu). Empty disables the bound. |
| --zap-devel | false | Not set | Enable controller-runtime development logging defaults. |
| --zap-encoder | json | json | Log encoding. Valid values are json and console. |
| --zap-log-level | info | info | Minimum log level. Valid values include debug, info, error, and panic. |
| --zap-stacktrace-level | error | error | Level at and above which stack traces are captured. |
| --zap-time-encoding | rfc3339 | Not set | Timestamp encoding for zap logs. |
Next step
After installation, follow the quickstart to create your first FireboltEngine.