Firebolt Kafka Connect Sink is a Kafka Connect connector that delivers data from Kafka topics to Firebolt tables.

Prerequisites

  • Apache Kafka 3.2 or later installed in your environment
  • (Optional) Confluent Cloud account if deploying on Confluent Cloud

Features

  • Append-only writes with at-least-once delivery semantics
  • Schema Registry support for Kafka message values
  • Developed and maintained by Firebolt; verified by Confluent
  • Supports all Firebolt data types except STRUCT and GEOGRAPHY
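
As an illustration of the append-only model, each Kafka record is delivered as a new row in the target table. The sketch below shows a hypothetical JSON Schema-encoded message value; the field names, and the assumption that each field maps to a column of the same name in the Firebolt table, are illustrative rather than taken from the connector documentation.

{
  "order_id": 1042,
  "customer": "acme",
  "amount": 99.5,
  "ordered_at": "2024-05-01T12:00:00Z"
}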

Quickstart

Follow this guide to set up Firebolt Kafka Connect Sink on Confluent Cloud.

Firebolt details

To connect to Firebolt you need the following information:
  • Service account client ID and client secret
  • Database name β€” the database that will contain the tables populated from Kafka topics
  • Engine name β€” the engine that will run INSERT queries
  • Account name β€” the Firebolt account that has access to the database

Kafka details

  • Topic names β€” the topics that will be synced to Firebolt tables
  • Kafka API key and secret β€” when deployed on Confluent Cloud, used to authenticate to Kafka
  • Schema Registry API key and secret β€” if using Schema Registry on Confluent Cloud, used to authenticate to Schema Registry

Firebolt connector configuration

  1. Mandatory attributes (see the minimal example below)
  • firebolt.clientId β€” client ID used to authenticate to Firebolt
  • firebolt.clientSecret β€” client secret corresponding to the client ID
  • jdbc.connection.url — JDBC connection URL used to connect to Firebolt. It must include the database name, account name, and engine name (for example: jdbc:firebolt:<your_db_name>?account=<your_account>&engine=<your_engine>).
    Do not put the client ID and client secret in the JDBC connection URL; this attribute is not obfuscated when the connector definition is displayed.
  • topics β€” comma-delimited list of topics the connector listens to (for example: mytopic1,mytopic2,mytopic3)
  • value.converter β€” set to io.confluent.connect.json.JsonSchemaConverter
  • key.converter β€” set to org.apache.kafka.connect.storage.StringConverter
  2. Optional attributes
  • topic.to.table.mapping β€” if your topic names do not match your table names, use this property to map topics to tables. It is a comma-separated list of topic_name:table_name pairs (for example: mytopic1:mytable1,mytopic2:mytable2).
  • value.converter.schema.registry.url β€” URL of your Schema Registry if used for the value schema
  • value.converter.basic.auth.credentials.source β€” set to USER_INFO if using API key/secret to communicate with Schema Registry
  • value.converter.schema.registry.basic.auth.user.info β€” credentials in the format api_key:api_secret
  • errors.deadletterqueue.topic.name β€” dead-letter queue topic for messages that cannot be processed
  • errors.deadletterqueue.context.headers.enable β€” set to true to include failure context headers in the dead-letter queue
  • errors.tolerance β€” set to all so that Kafka messages that cannot be processed are sent to the dead-letter queue
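
Taken together, a minimal connector definition using only the mandatory attributes might look like the sketch below. All values are placeholders; if you use Schema Registry for message values, also add the value.converter.schema.registry.* attributes from the optional list above.

{
  "firebolt.clientId": "<your_client_id>",
  "firebolt.clientSecret": "<your_client_secret>",
  "jdbc.connection.url": "jdbc:firebolt:<your_db_name>?account=<your_account>&engine=<your_engine>",
  "topics": "mytopic1,mytopic2",
  "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter"
}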

Install Firebolt connector on Confluent Cloud

  1. In Confluent Cloud, navigate to the target cluster. Select Connectors in the left navigation and search for β€œFirebolt”.
  2. The connector is verified by Confluent but is not managed by Confluent, so you need to download the archive.
  3. Create a new Custom Connector using the downloaded artifact.
  4. Configure the Firebolt connector.
  • Connector plugin name β€” choose a name for your connector
  • Connector class β€” com.firebolt.kafka.connect.FireboltSinkConnector (the custom connector class for Firebolt)
  • Type β€” select Sink (Firebolt implements the Sink functionality)
  • Connector archive β€” select the JAR file you downloaded in step 2
  • Sensitive properties β€” Firebolt Connect Sink has two sensitive properties (they are not shown in the UI or via REST):
    • firebolt.clientId β€” client ID used to authenticate to Firebolt
    • firebolt.clientSecret β€” client secret corresponding to the client ID
4.1 Set up the credentials used to connect to the Kafka cluster.
4.2 Configure the Firebolt connector definition. Here is a sample definition:
{
  "firebolt.clientId": "****************",
  "firebolt.clientSecret": "****************",
  "jdbc.connection.url": "jdbc:firebolt:<your_db_name>?account=<your_account>&engine=<your_engine>",
  "topic.to.table.mapping": "mytopic:mytable",
  "topics": "mytopic",
  "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter.basic.auth.credentials.source": "USER_INFO",
  "value.converter.schema.registry.basic.auth.user.info": "<your api key>",
  "value.converter.schema.registry.url": "<your_schema_registry_url>",
  "errors.deadletterqueue.context.headers.enable": "true",
  "errors.deadletterqueue.topic.name": "<your_deadletterqueue_name>",
  "errors.tolerance": "all",
  "consumer.override.fetch.max.bytes": "20971520",
  "consumer.override.max.partition.fetch.bytes": "10485760",
  "consumer.override.max.poll.records": "6000",
  "fetch.max.bytes": "15000000",
  "max.partition.fetch.bytes": "10000000",
  "poll.interval.ms": "1000",
  "producer.override.max.request.size": "10485760"
}
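
The consumer.override.* and producer.override.* entries in the sample above are standard Kafka Connect per-connector client overrides that raise consumer fetch and poll limits and the producer request size for higher throughput. They are optional, and the values shown are only a starting point to tune for your workload.
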
4.3 Configure the outgoing networking endpoints.
4.4 Size your connector workers.
4.5 On the last page of the wizard, review all details from the previous steps and complete the workflow.
  5. You should now see the connector running on the Connectors page.

Troubleshoot installing Firebolt connector on Confluent Cloud

  1. Networking endpoints troubleshooting — The Kafka connector must declare in advance the egress endpoints it will call so that the corresponding IP addresses can be allowlisted. Set endpoints for Firebolt authentication (id.app.firebolt.io) and for Firebolt backend API calls (api.app.firebolt.io). Some endpoints are dynamic: the account URL is specific to the account in your JDBC URL. Each endpoint may be served by multiple IP addresses because a reverse proxy sits in front of the services.
If the connector status shows as failed, open its Settings page and go to the Networking section, where you should see an error message. Click Fix, then click Add to allow-list. Finally, click Save Changes in the Networking section and then Apply Changes at the bottom of the Settings page for the changes to take effect.
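
For reference, a sketch of the egress endpoints to allowlist, assuming the default HTTPS port 443 (the account-specific endpoint depends on the account in your JDBC URL and is shown only as a placeholder):

  • id.app.firebolt.io:443 — Firebolt authentication
  • api.app.firebolt.io:443 — Firebolt backend API calls
  • <your account-specific endpoint>:443 — dynamic, account-specific calls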

Upcoming features

The Firebolt Kafka Sink connector is under active development. The following features are not yet supported and will be added in upcoming versions:

  • Change data capture (CDC)
  • Schema evolution
  • Avro format and Kafka message keys with Schema Registry