> Amazon S3 and S3-compatible object storage for engine managed table data, with AWS workload identity and intermediary roles for external access.

# Amazon S3

This page configures Amazon S3 or an S3-compatible endpoint as engine object storage.

Every engine needs object storage for managed table data. The chart does not support local-filesystem storage for engines, so an engine pod does not become Ready until `customEngineConfig.storage` points at object storage.

With object storage as the backing store, durability does not depend on the per-pod data volumes mounted to each engine. Even a complete loss of those volumes does not cause data loss, because the authoritative copy of managed table data lives in the object store.

You configure object storage on the engine through `customEngineConfig.storage`, which the chart passes through unchanged into the engine's `config.yaml`. The `type`, `api_scheme`, and `bucket_name` keys match the Firebolt Core configuration schema, and the chart does not validate them. The engine reads AWS credentials from the pod's workload identity, which you configure through [AWS IRSA](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) or [AWS Pod Identity](https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html).

## Prerequisites

Before you begin, make sure that you have the following:

* A Kubernetes cluster running on Amazon EKS.
* `kubectl` configured to access your cluster.
* `helm` v3 installed on your local machine.
* An AWS account with permissions to create S3 buckets, IAM roles, and IAM policies.
* An engine image that supports the `s3` storage backend.

## Use Amazon S3

The following examples use an S3 bucket named `firebolt-managed`. S3 bucket names are globally unique, so substitute a name of your own throughout.

### Create the bucket

Create an S3 bucket, block all public access, and turn on default server-side encryption:

```bash theme={"theme":{"light":"css-variables","dark":"css-variables"}}
# Region and bucket name used by the AWS CLI calls below.
export AWS_DEFAULT_REGION=us-east-1
export BUCKET_NAME=firebolt-managed

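# Create the bucket in the configured region. For any region other than
# us-east-1, also pass:
#   --create-bucket-configuration "LocationConstraint=${AWS_DEFAULT_REGION}"
aws s3api create-bucket --bucket "${BUCKET_NAME}"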

# Block all forms of public access at the bucket level.
aws s3api put-public-access-block \
  --bucket "${BUCKET_NAME}" \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Enable default SSE-S3 encryption on every object written.
aws s3api put-bucket-encryption \
  --bucket "${BUCKET_NAME}" \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```

### Create an IAM role

Create an IAM role for the engine and attach the following permissions policy. The role needs `s3:ListBucket` on the bucket, and `s3:GetObject*` and `s3:PutObject*` on its contents:

```json theme={"theme":{"light":"css-variables","dark":"css-variables"}}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "StorageBuckets",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::firebolt-managed"]
    },
    {
      "Sid": "ObjectAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject*", "s3:PutObject*"],
      "Resource": ["arn:aws:s3:::firebolt-managed/*"]
    }
  ]
}
```
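The policy above covers what the role is allowed to do. For IRSA, the role also needs a trust policy that lets the engine ServiceAccount assume it. The following is a minimal sketch, assuming the cluster already has an IAM OIDC provider registered. The function name `render_irsa_trust_policy`, the account ID `111122223333`, and the OIDC provider value are hypothetical placeholders; the namespace `firebolt` and the ServiceAccount name `firebolt-engine` match the examples on this page.

```bash
# Print an IRSA trust policy for one ServiceAccount.
# Arguments: <account-id> <oidc-provider> <namespace> <serviceaccount>
# <oidc-provider> is the cluster's OIDC issuer URL without the https:// prefix.
render_irsa_trust_policy() {
  local account_id="$1" oidc_provider="$2" namespace="$3" service_account="$4"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${oidc_provider}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${oidc_provider}:sub": "system:serviceaccount:${namespace}:${service_account}",
          "${oidc_provider}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
}

# Placeholder values (assumptions): replace the account ID and OIDC provider
# with the values for your own account and cluster.
render_irsa_trust_policy \
  "111122223333" \
  "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE" \
  "firebolt" "firebolt-engine" > trust-policy.json
```

You could then pass the file to `aws iam create-role` with `--assume-role-policy-document file://trust-policy.json`, and attach the permissions policy above with `aws iam put-role-policy`.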

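Bind the role to a ServiceAccount in the release namespace through the IRSA annotation. Replace the placeholders with your account ID and role name, and save the manifest as `engine-serviceaccount.yaml`: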

```yaml theme={"theme":{"light":"css-variables","dark":"css-variables"}}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: firebolt-engine
  namespace: firebolt
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/<engine-s3-role>
```

### Point the chart at both

Run the engine pods under the annotated ServiceAccount and set the storage block to the S3 bucket:

```yaml theme={"theme":{"light":"css-variables","dark":"css-variables"}}
# my-values.yaml
engineSpec:
  serviceAccount: firebolt-engine

customEngineConfig:
  storage:
    type: s3
    api_scheme: "s3://"
    bucket_name: firebolt-managed
```

Create the ServiceAccount, then install the chart with the matching values:

```bash theme={"theme":{"light":"css-variables","dark":"css-variables"}}
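# Make sure the release namespace exists, because the ServiceAccount
# manifest targets it. This is a no-op if the namespace is already there.
kubectl create namespace firebolt --dry-run=client -o yaml | kubectl apply -f -

# Create the IRSA-annotated ServiceAccount in the release namespace.
kubectl apply -f engine-serviceaccount.yaml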

# Install the chart against the bucket and the ServiceAccount.
helm install firebolt ./helm \
  --namespace firebolt --create-namespace \
  -f my-values.yaml
```

The EKS identity webhook usually injects the region automatically. To set the AWS region explicitly instead, add `region` to the storage block:

```yaml theme={"theme":{"light":"css-variables","dark":"css-variables"}}
customEngineConfig:
  storage:
    type: s3
    api_scheme: "s3://"
    bucket_name: firebolt-managed
    region: us-east-1
```

### Confirm that object storage works

Create a table, insert a row, and list the bucket to confirm the engine wrote data through to S3:

```bash theme={"theme":{"light":"css-variables","dark":"css-variables"}}
# Forward the gateway Service to localhost:8080 in the background,
# then give the tunnel a moment to come up before sending requests.
kubectl -n firebolt port-forward svc/firebolt-gateway 8080:80 &
sleep 2

# Create a table on the engine.
curl -s http://localhost:8080/ -H "X-Firebolt-Engine: default" \
  -H "Content-Type: text/plain" --data "create table t (val int)"

# Insert one row, which forces the engine to write a tablet.
curl -s http://localhost:8080/ -H "X-Firebolt-Engine: default" \
  -H "Content-Type: text/plain" --data "insert into t values (1)"

# List the bucket. New object-storage prefixes appear as the engine writes data.
aws s3 ls s3://firebolt-managed/
```

New prefixes appear under the bucket as the engine writes data.
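The requests above share the same endpoint and headers, so a small wrapper can make follow-up checks shorter. This is a minimal sketch, assuming the port-forward from the previous block is still running. The function name `fb_query` and the `FIREBOLT_URL` variable are hypothetical and not part of the chart.

```bash
# Send one SQL statement to an engine through the forwarded gateway.
# Arguments: <engine-name> <sql>
# FIREBOLT_URL (hypothetical) overrides the default local port-forward address.
fb_query() {
  local engine="$1" sql="$2"
  curl -sS "${FIREBOLT_URL:-http://localhost:8080}/" \
    -H "X-Firebolt-Engine: ${engine}" \
    -H "Content-Type: text/plain" \
    --data "${sql}"
}
```

For example, `fb_query default "select * from t"` reads back the row inserted above.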

## Use an S3-compatible endpoint

For any S3-compatible endpoint reachable from the engine pods, such as self-hosted MinIO, Ceph RGW, or an in-cluster S3 emulator, use `type: minio`. In this mode the engine signs requests with a fixed credential pair, access key `firebolt` and secret key `firebolt`, so the endpoint must accept those credentials.

```yaml theme={"theme":{"light":"css-variables","dark":"css-variables"}}
customEngineConfig:
  storage:
    type: minio
    api_scheme: "s3://"
    bucket_name: firebolt-managed
    minio:
      endpoint: http://minio.minio.svc.cluster.local:9000
```

`endpoint` must be a URL the engine pod can resolve and reach. Create the bucket out of band before the engine starts.
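One way to create the bucket out of band is with the AWS CLI and its `--endpoint-url` option. The following is a minimal sketch under a few assumptions: the `firebolt` credential pair is allowed to create buckets on the endpoint, the endpoint accepts `us-east-1` as the signing region, and the endpoint is reachable from where you run the command. The function name `create_compat_bucket` is hypothetical.

```bash
# Create a bucket on an S3-compatible endpoint with the AWS CLI.
# Arguments: <endpoint-url> <bucket-name>
create_compat_bucket() {
  local endpoint="$1" bucket="$2"
  # Use the same access key and secret key the engine signs with in minio
  # mode. The region is an assumption; many S3-compatible servers accept it.
  AWS_ACCESS_KEY_ID=firebolt \
  AWS_SECRET_ACCESS_KEY=firebolt \
  AWS_DEFAULT_REGION=us-east-1 \
    aws --endpoint-url "${endpoint}" s3api create-bucket --bucket "${bucket}"
}
```

From outside the cluster, you might first port-forward the storage Service, for example `kubectl -n minio port-forward svc/minio 9000:9000` if the Service name and namespace match the endpoint above, and then run `create_compat_bucket http://localhost:9000 firebolt-managed`.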

## Restrict external access with an intermediary role

The bucket you set under `customEngineConfig.storage` holds the engine's managed tablet data, and the engine reaches it with the engine pod's own AWS identity. Queries that read from or write to external locations, such as external tables or `COPY` statements that point at a different bucket, follow a separate credential path.

By default, external access also uses the engine pod's own AWS identity. That identity belongs to this chart release, so it is not a convenient identity for the owner of an external bucket to reference when they grant access.

An intermediary role gives external access a stable identity instead. When you set one, the engine assumes the intermediary role first, and then assumes the external role from there, rather than using its own pod identity. Because the intermediary role ARN is stable and known ahead of time, you can share it with third parties and reference it in S3 bucket policies and IAM role trust policies, including those in AWS accounts outside your own organization. Access to the object storage bucket always uses the engine pod's own identity, so the intermediary role applies only to external locations.

Create the intermediary IAM role. Its trust policy must allow the engine's identity, that is, the IAM role bound to the engine ServiceAccount, to assume it. Its permissions policy needs only `sts:AssumeRole` on the external roles it is allowed to reach.
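The two policies for the intermediary role can be sketched as follows. This is a minimal illustration, assuming the engine's identity is the IAM role bound to the engine ServiceAccount. The function names, account IDs, and role names are hypothetical placeholders.

```bash
# Print a trust policy that lets one IAM role assume the intermediary role.
# Argument: <engine-role-arn>
render_intermediary_trust_policy() {
  local engine_role_arn="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "${engine_role_arn}" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Print a permissions policy that only allows assuming one external role.
# Argument: <external-role-arn>
render_intermediary_permissions() {
  local external_role_arn="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "${external_role_arn}"
    }
  ]
}
EOF
}

# Placeholder ARNs (assumptions): replace with your engine role and the
# external role that the bucket owner gave you.
render_intermediary_trust_policy \
  "arn:aws:iam::111122223333:role/engine-s3-role" > intermediary-trust.json
render_intermediary_permissions \
  "arn:aws:iam::444455556666:role/external-data-role" > intermediary-permissions.json
```

The first file is meant for `aws iam create-role --assume-role-policy-document`, and the second for `aws iam put-role-policy --policy-document`. The owner of the external role still has to trust the intermediary role ARN on their side.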

Set the intermediary role ARN under `customEngineConfig.storage.aws.intermediary_access_role`:

```yaml theme={"theme":{"light":"css-variables","dark":"css-variables"}}
customEngineConfig:
  storage:
    type: s3
    api_scheme: "s3://"
    bucket_name: firebolt-managed
    aws:
      intermediary_access_role: arn:aws:iam::<account-id>:role/firebolt-intermediary
```

The chart passes the `storage.aws` block through unchanged. The block is valid only when `type` is `s3`.

## Storage scope

`customEngineConfig` is global to the release. Multiple engines under the same `engines:` list share the same `customEngineConfig.storage` block, and therefore the same bucket. To run engines against different buckets, install the chart twice in separate releases, each with its own `customEngineConfig.storage`.
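Running two releases against different buckets might look like the following sketch. It assumes each release lives in its own namespace named after the release, which this page does not require, and that each values file sets its own `customEngineConfig.storage.bucket_name`. The function name `install_release`, the release names, and the values file names are hypothetical.

```bash
# Install one chart release in a namespace of the same name.
# Arguments: <release-name> <values-file>
install_release() {
  local release="$1" values_file="$2"
  helm install "${release}" ./helm \
    --namespace "${release}" --create-namespace \
    -f "${values_file}"
}
```

For example, `install_release firebolt-a values-a.yaml` followed by `install_release firebolt-b values-b.yaml`. Each namespace needs its own annotated ServiceAccount before its engine pods can reach S3.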
