customEngineConfig.storage points at object storage.
With Google Cloud Storage as the backing store, durability does not depend on the per-pod data volumes mounted to each engine. Even a complete loss of those volumes does not cause data loss, because the authoritative copy of managed table data lives in the bucket.
You configure object storage on the engine through customEngineConfig.storage, which the chart passes through unchanged into the engine’s config.yaml. The type, api_scheme, and bucket_name keys match the Firebolt Core configuration schema, and the chart does not validate them. The engine reads Google Cloud credentials from the pod’s Google identity, which you provide with Workload Identity Federation for GKE.
The chart passes customEngineConfig.storage through unchanged and does not validate the type. The gcs backend requires an engine image that supports it. An unsupported type is written verbatim into the engine config.yaml, so the engine fails at startup rather than at install time.
Prerequisites
Before you begin, ensure that you have the following installed and configured:
- A Kubernetes cluster running on Google Kubernetes Engine with Workload Identity Federation enabled.
- kubectl configured to access your cluster.
- helm v3 installed on your local machine.
- gcloud configured for your project.
- A Google Cloud project with permissions to create buckets and IAM service accounts.
- An engine image that supports the gcs storage backend.
Use Google Cloud Storage
The following examples use a bucket named firebolt-managed in the project my-project, but you can choose any name you like.
Create a bucket
Create a Google Cloud Storage bucket with uniform bucket-level access and public access prevention.
Grant the engine a Google identity
Create a Google service account, grant it object access on the bucket, and allow the engine’s Kubernetes ServiceAccount to impersonate it.
Point the chart at the bucket
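On the Kubernetes side, the impersonation binding is expressed as an annotation on the engine’s ServiceAccount. The manifest below is a minimal sketch, assuming a ServiceAccount named firebolt-engine in a namespace named firebolt and a Google service account named firebolt-engine in my-project. All three names are hypothetical. If the chart creates the ServiceAccount for you, the same annotation would go wherever the chart exposes ServiceAccount annotations.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: firebolt-engine        # hypothetical name
  namespace: firebolt          # hypothetical namespace
  annotations:
    # GKE annotation that links this ServiceAccount to the Google
    # service account it impersonates (hypothetical account name).
    iam.gke.io/gcp-service-account: firebolt-engine@my-project.iam.gserviceaccount.com
```

The annotation takes effect only when the Google service account also grants roles/iam.workloadIdentityUser to this Kubernetes ServiceAccount.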
Run the engine pods under the annotated ServiceAccount and set the storage block to the Google Cloud Storage bucket. The default scheme for gcs is gs://.
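A minimal values sketch for the storage part of this step, using the bucket name from the examples above. Only the storage keys named in this guide appear. The setting that selects the ServiceAccount for the engine pods is left out because its key name depends on the chart’s values schema.

```yaml
# values.yaml (sketch)
customEngineConfig:
  storage:
    type: gcs
    bucket_name: firebolt-managed
    # api_scheme is omitted on the assumption that the gs:// default applies.
```

Because the chart does not validate this block, a typo in type or bucket_name surfaces only when the engine starts.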
Confirm that object storage works
Create a table, insert a row, and list the bucket to confirm the engine wrote data through to Google Cloud Storage.
Restrict external access with an intermediary service account
The bucket you set under customEngineConfig.storage holds the engine’s managed tablet data, and the engine reaches it with the engine pod’s own Google identity. Queries that read from or write to external locations, such as external tables that point at a different bucket, follow a separate credential path.
By default, external access also uses the engine pod’s own Google identity. That identity belongs to this chart release, so it is not a convenient identity for the owner of an external bucket to reference when they grant access.
An intermediary service account gives external access a stable identity instead. When you set one, the engine impersonates the intermediary service account for external access rather than using its own pod identity. Because the service account is stable and known ahead of time, you can share it with third parties and reference it in bucket IAM policies, including on Google Cloud projects outside your own organization. Access to the object storage bucket always uses the engine pod’s own identity, so the intermediary service account applies only to external locations.
Create the intermediary Google service account, grant the engine’s identity roles/iam.serviceAccountTokenCreator on it, and grant the intermediary the permissions it needs to reach the external data.
Set its ID under customEngineConfig.storage.gcp.intermediary_service_account_id:
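As a sketch, the setting sits in a gcp block nested under the existing storage block. The service account name below is hypothetical, and writing the ID as an email address is an assumption; use whatever identifier format your engine version expects.

```yaml
# values.yaml (sketch)
customEngineConfig:
  storage:
    type: gcs                       # the gcp block is valid only with type gcs
    bucket_name: firebolt-managed
    gcp:
      # Hypothetical value; the exact ID format is an assumption.
      intermediary_service_account_id: external-access@my-project.iam.gserviceaccount.com
```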
The chart passes the storage.gcp block through unchanged. The block is valid only when type is gcs.
Storage scope
customEngineConfig is global to the release. Multiple engines under the same engines: list share the same customEngineConfig.storage block, and therefore the same bucket. To run engines against different buckets, install the chart twice in separate releases, each with its own customEngineConfig.storage.
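A sketch of the two-release layout, shown as two values files in one block and separated by a YAML document marker. The file names and bucket names are hypothetical. Each file would be passed to its own chart installation.

```yaml
# values-a.yaml, used by the first release (hypothetical file name)
customEngineConfig:
  storage:
    type: gcs
    bucket_name: firebolt-managed-a   # hypothetical bucket
---
# values-b.yaml, used by the second release (hypothetical file name)
customEngineConfig:
  storage:
    type: gcs
    bucket_name: firebolt-managed-b   # hypothetical bucket
```

Every engine listed in one release writes to that release’s bucket, so the split between buckets follows the split between releases.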