This page configures Amazon S3 or an S3-compatible endpoint as engine object storage. Every engine needs object storage for managed table data. The chart does not support local-filesystem storage for engines, so an engine pod never becomes Ready until customEngineConfig.storage points at object storage.

With object storage as the backing store, durability does not depend on the per-pod data volumes mounted to each engine. Even a complete loss of those volumes does not cause data loss, because the authoritative copy of managed table data lives in the object store.

You configure object storage on the engine through customEngineConfig.storage, which the chart passes through unchanged into the engine’s config.yaml. The type, api_scheme, and bucket_name keys match the Firebolt Core configuration schema, and the chart does not validate them. The engine reads AWS credentials from the pod’s workload identity, which you configure through AWS IRSA or AWS Pod Identity.

Prerequisites

Before you begin, make sure you have the following:
  • A Kubernetes cluster running on Amazon EKS.
  • kubectl configured to access your cluster.
  • helm v3 installed on your local machine.
  • An AWS account with permissions to create S3 buckets, IAM roles, and IAM policies.
  • An engine image that supports the s3 storage backend.

Use Amazon S3

The following examples use an S3 bucket named firebolt-managed. S3 bucket names generally must be globally unique, so you will likely need to substitute a name of your own.

Create the bucket

Create an S3 bucket, block all public access, and turn on default server-side encryption:
# Region and bucket name used by the AWS CLI calls below.
export AWS_DEFAULT_REGION=us-east-1
export BUCKET_NAME=firebolt-managed

# Create the bucket in the configured region.
aws s3api create-bucket --bucket "${BUCKET_NAME}"

# Block all forms of public access at the bucket level.
aws s3api put-public-access-block \
  --bucket "${BUCKET_NAME}" \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Enable default SSE-S3 encryption on every object written.
aws s3api put-bucket-encryption \
  --bucket "${BUCKET_NAME}" \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
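The create-bucket call above works as written for us-east-1. For other regions, the S3 API expects a LocationConstraint that matches the region, while us-east-1 rejects one. The following is a minimal sketch of a helper that produces the extra argument. The function name location_args is hypothetical and not part of the chart or the AWS CLI.

```shell
# Print the extra create-bucket argument for a region.
# Prints nothing for us-east-1, which must be created without a LocationConstraint.
location_args() {
  if [ "$1" != "us-east-1" ]; then
    printf '%s %s\n' "--create-bucket-configuration" "LocationConstraint=$1"
  fi
}

# Show what would be appended for two regions.
echo "us-east-1: [$(location_args us-east-1)]"
echo "eu-west-1: [$(location_args eu-west-1)]"

# Example use (left commented out so the sketch runs without AWS access).
# The substitution is intentionally unquoted so it splits into two arguments:
# aws s3api create-bucket --bucket "${BUCKET_NAME}" \
#   --region "${AWS_DEFAULT_REGION}" $(location_args "${AWS_DEFAULT_REGION}")
```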

Create an IAM role

The role needs ListBucket on the bucket and GetObject* and PutObject* on its contents:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "StorageBuckets",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::firebolt-managed"]
    },
    {
      "Sid": "ObjectAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject*", "s3:PutObject*"],
      "Resource": ["arn:aws:s3:::firebolt-managed/*"]
    }
  ]
}
Bind the role to a ServiceAccount in the release namespace through the IRSA annotation:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: firebolt-engine
  namespace: firebolt
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/<engine-s3-role>
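The IRSA annotation takes effect only if the role's trust policy lets this ServiceAccount assume the role through the cluster's OIDC provider. The following sketch writes such a trust policy to a file, assuming the firebolt namespace and firebolt-engine ServiceAccount shown above. The account ID, the OIDC provider ID, and the file name are placeholders to replace with your own values.

```shell
# Placeholder values. Replace them with your account ID and your cluster's OIDC provider.
ACCOUNT_ID="111122223333"
OIDC_PROVIDER="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
NAMESPACE="firebolt"
SERVICE_ACCOUNT="firebolt-engine"

# Write a trust policy that allows only this ServiceAccount to assume the role.
cat > engine-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:aud": "sts.amazonaws.com",
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"
        }
      }
    }
  ]
}
EOF

# Create the role and attach the S3 policy (commented out so the sketch runs offline).
# engine-s3-policy.json is assumed to hold the permissions policy shown earlier.
# aws iam create-role --role-name <engine-s3-role> \
#   --assume-role-policy-document file://engine-trust-policy.json
# aws iam put-role-policy --role-name <engine-s3-role> \
#   --policy-name engine-s3-access --policy-document file://engine-s3-policy.json
```

If you use AWS Pod Identity instead of IRSA, the trust policy and the binding step differ, so treat this sketch as specific to the IRSA path.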

Point the chart at both

Run the engine pods under the annotated ServiceAccount and set the storage block to the S3 bucket:
# my-values.yaml
engineSpec:
  serviceAccount: firebolt-engine

customEngineConfig:
  storage:
    type: s3
    api_scheme: "s3://"
    bucket_name: firebolt-managed
Create the ServiceAccount, then install the chart with the matching values:
# Create the IRSA-annotated ServiceAccount in the release namespace.
kubectl apply -f engine-serviceaccount.yaml

# Install the chart against the bucket and the ServiceAccount.
helm install firebolt ./helm \
  --namespace firebolt --create-namespace \
  -f my-values.yaml
The EKS identity webhook usually injects the region automatically. To set the AWS region explicitly, add region to the storage block:
customEngineConfig:
  storage:
    type: s3
    api_scheme: "s3://"
    bucket_name: firebolt-managed
    region: us-east-1

Confirm that object storage works

Create a table, insert a row, and list the bucket to confirm the engine wrote data through to S3:
# Forward the gateway Service to localhost:8080 in the background.
kubectl -n firebolt port-forward svc/firebolt-gateway 8080:80 &

# Create a table on the engine.
curl -s http://localhost:8080/ -H "X-Firebolt-Engine: default" \
  -H "Content-Type: text/plain" --data "create table t (val int)"

# Insert one row, which forces the engine to write a tablet.
curl -s http://localhost:8080/ -H "X-Firebolt-Engine: default" \
  -H "Content-Type: text/plain" --data "insert into t values (1)"

# List the top-level contents of the bucket.
aws s3 ls s3://firebolt-managed/
New prefixes appear under the bucket as the engine writes data.

Use an S3-compatible endpoint

For any S3-compatible endpoint reachable from the engine pods, such as self-hosted MinIO, Ceph RGW, or an in-cluster S3 emulator, use type: minio. In this mode, the engine signs requests with firebolt as both the access key and the secret key, so the endpoint must accept those credentials.
customEngineConfig:
  storage:
    type: minio
    api_scheme: "s3://"
    bucket_name: firebolt-managed
    minio:
      endpoint: http://minio.minio.svc.cluster.local:9000
The endpoint value must be a URL that the engine pod can resolve and reach. Create the bucket out of band before the engine starts.
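One way to create the bucket out of band is the AWS CLI with --endpoint-url. The following sketch assumes the endpoint and bucket name from the values above and an endpoint that accepts the firebolt credentials. The create_minio_bucket function and the DRY_RUN switch are hypothetical helpers. By default the function only prints the command, because the in-cluster address is normally reachable only from inside the cluster or through a port-forward.

```shell
# Assumed values, taken from the storage block above.
MINIO_ENDPOINT="http://minio.minio.svc.cluster.local:9000"
BUCKET_NAME="firebolt-managed"

# Print the command by default. Set DRY_RUN=0 to run it against a reachable endpoint.
create_minio_bucket() {
  set -- aws --endpoint-url "${MINIO_ENDPOINT}" s3api create-bucket --bucket "${BUCKET_NAME}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    # Use the same access key and secret key that the engine signs requests with.
    AWS_ACCESS_KEY_ID=firebolt AWS_SECRET_ACCESS_KEY=firebolt \
      AWS_DEFAULT_REGION=us-east-1 "$@"
  fi
}

create_minio_bucket
```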

Restrict external access with an intermediary role

The bucket you set under customEngineConfig.storage holds the engine’s managed tablet data, and the engine reaches it with the engine pod’s own AWS identity. Queries that read from or write to external locations, such as external tables or COPY statements that point at a different bucket, follow a separate credential path. By default, external access also uses the engine pod’s own AWS identity. That identity belongs to this chart release, so it is not a convenient identity for the owner of an external bucket to reference when they grant access.

An intermediary role gives external access a stable identity instead. When you set one, the engine assumes the intermediary role first, and then assumes the external role from there, rather than using its own pod identity. Because the intermediary role ARN is stable and known ahead of time, you can share it with third parties and reference it in S3 bucket policies and IAM role trust policies, including those in AWS accounts outside your own organization. Access to the object storage bucket always uses the engine pod’s own identity, so the intermediary role applies only to external locations.

Create the intermediary IAM role and grant the engine’s identity permission to assume it. The intermediary role’s trust policy must allow the engine ServiceAccount identity to assume it, and the role needs only sts:AssumeRole on the external roles it is allowed to reach.

Set the intermediary role ARN under customEngineConfig.storage.aws.intermediary_access_role:
customEngineConfig:
  storage:
    type: s3
    api_scheme: "s3://"
    bucket_name: firebolt-managed
    aws:
      intermediary_access_role: arn:aws:iam::<account-id>:role/firebolt-intermediary
The chart passes the storage.aws block through unchanged. The block is valid only when type is s3.
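The two policies for the intermediary role can be sketched as follows, assuming the engine runs under the IRSA role created earlier. The account IDs, the engine role name, the external role ARN, and the file names are all placeholders. Depending on your account setup, the engine role may also need an identity policy that allows sts:AssumeRole on the intermediary role.

```shell
# Placeholder values. Replace them with your own account ID and role names.
ACCOUNT_ID="111122223333"
ENGINE_ROLE="engine-s3-role"
EXTERNAL_ROLE_ARN="arn:aws:iam::444455556666:role/external-data-reader"

# Trust policy: lets the engine's role assume the intermediary role.
cat > intermediary-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${ACCOUNT_ID}:role/${ENGINE_ROLE}" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Permissions policy: the intermediary role may only assume the listed external roles.
cat > intermediary-permissions.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": ["${EXTERNAL_ROLE_ARN}"]
    }
  ]
}
EOF

# Create the role and attach the policy (commented out so the sketch runs offline):
# aws iam create-role --role-name firebolt-intermediary \
#   --assume-role-policy-document file://intermediary-trust-policy.json
# aws iam put-role-policy --role-name firebolt-intermediary \
#   --policy-name assume-external-roles \
#   --policy-document file://intermediary-permissions.json
```

The owner of each external role then lists the intermediary role ARN as a principal in that role's trust policy.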

Storage scope

customEngineConfig is global to the release. Multiple engines under the same engines: list share the same customEngineConfig.storage block, and therefore the same bucket. To run engines against different buckets, install the chart twice in separate releases, each with its own customEngineConfig.storage.
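As an illustration of the two-release approach, the following sketch writes one values file per release, each pointing at its own bucket. The release names, namespaces, bucket names, and file names are hypothetical, and each release still needs its own ServiceAccount and IAM access as described earlier.

```shell
# Write one values file per release, each with its own bucket.
for name in analytics reporting; do
  cat > "values-${name}.yaml" <<EOF
customEngineConfig:
  storage:
    type: s3
    api_scheme: "s3://"
    bucket_name: firebolt-managed-${name}
EOF
done

# Install each release separately (commented out so the sketch runs without a cluster).
# Separate namespaces are an assumption here, not a documented chart requirement.
# helm install firebolt-analytics ./helm \
#   --namespace firebolt-analytics --create-namespace -f values-analytics.yaml
# helm install firebolt-reporting ./helm \
#   --namespace firebolt-reporting --create-namespace -f values-reporting.yaml
```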