
Confluent CDC

Use your Confluent cluster to stream CDC data from relational databases into your managed lakehouse.

Enter the details for your relational database and Confluent Cloud cluster, and Onehouse will handle ingestion by provisioning and managing the required resources within your Confluent Cloud account.

Cloud Provider Support

  • AWS: ✅ Supported
  • GCP: ✅ Supported

Prerequisites

Prepare your relational database

Confluent Cloud must be able to reach the RDBMS instance through either a public endpoint or a private link. For Postgres databases, follow this doc to set up permissions for the replication slot, create the heartbeat table, and complete the remaining setup steps.

Prepare your cluster in Confluent

You will need a Confluent Cloud Kafka cluster, along with the following credentials and connection details:

  1. Confluent Cloud API key/secret
  2. Kafka cluster API key/secret and bootstrap server (broker) address
  3. Schema Registry URL and API key/secret
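
Before filling in the source form, it can help to confirm that every item in the list above is on hand. The following is a minimal sketch of such a checklist; the class and field names are hypothetical and are not part of any Onehouse or Confluent API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConfluentCdcCredentials:
    # Hypothetical field names that mirror the checklist above.
    cloud_api_key: str
    cloud_api_secret: str
    kafka_api_key: str
    kafka_api_secret: str
    kafka_bootstrap_server: str
    schema_registry_url: str
    schema_registry_api_key: str
    schema_registry_api_secret: str


def missing_fields(creds: ConfluentCdcCredentials) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(creds) if not getattr(creds, f.name).strip()]
```

An empty list from `missing_fields` means all eight values were supplied; it does not verify that the keys are valid or have the right permissions.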

Important Usage Notes

All Databases

  • The Confluent CDC source currently supports Postgres databases. We are working to add support for more databases.
  • Ensure your Confluent Cloud account has proper permissions to access your database.
  • When you connect your database, Onehouse will automatically provision and manage resources in the Confluent cluster you provide to facilitate CDC data ingestion into the lakehouse.
  • When creating a Flow with a CDC source, ensure the Write Mode is set to Mutable if you want to update records in the table using CDC logs. If you prefer to land the raw CDC logs in the table without updating records, you may use Append-only Write Mode to improve write performance.
  • We recommend hosting your database, Confluent Cloud account, and Onehouse project in the same region to avoid transfer costs.

Postgres Databases

  • Important: You must complete the prerequisites from the Postgres CDC source setup guide for Onehouse to properly access and replicate your database.
  • Onehouse will ingest records in the Debezium Postgres CDC format. To use these events for updating records in the table (rather than ingesting the raw events), apply the Convert CDC Data transformation.
  • When creating a Flow with a Confluent CDC source to ingest data from Postgres, Onehouse will initially bootstrap the existing data from Postgres before starting incremental ingestion.
  • The source Postgres table must have a primary key (a constraint covering at least one column).
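
To illustrate the difference between landing raw CDC events and using them to update records, here is a minimal sketch that applies simplified Debezium-style change events to an in-memory table. It is not Onehouse's implementation of the Convert CDC Data transformation. The events keep only the `op`, `before`, and `after` fields of the Debezium envelope, and the `id` primary key, the sample rows, and the function names are assumptions made for the example.

```python
def apply_mutable(table, event, key="id"):
    """Apply one change event to `table`, a dict keyed by primary key.

    Roughly what a Mutable write does conceptually: keep the latest row per key.
    """
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read (bootstrap), update
        row = event["after"]
        table[row[key]] = row
    elif op == "d":  # delete: the key is taken from `before`
        table.pop(event["before"][key], None)
    # Other op codes are ignored in this sketch.
    return table


def apply_append_only(log, event):
    """Append-only: keep every raw event, with no updates or deletes."""
    log.append(event)
    return log


# Hypothetical events: one snapshot row, then an insert, an update, and a delete.
events = [
    {"op": "r", "before": None, "after": {"id": 1, "name": "Ada"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}},
    {"op": "u", "before": None, "after": {"id": 1, "name": "Ada L."}},
    {"op": "d", "before": {"id": 2}, "after": None},
]

table, log = {}, []
for e in events:
    apply_mutable(table, e)
    apply_append_only(log, e)

print(table)     # rows remaining after all changes are applied
print(len(log))  # number of raw events retained
```

The mutable path needs a key to locate the row being changed, which is why the source table must have a primary key; the append-only path skips the lookup, which is where its write performance advantage comes from.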

Limitations

  • Postgres DECIMAL and NUMERIC column types are not supported.
  • Avoid using TRUNCATE TABLE in your Postgres database; this may cause the Flow to fail.
  • Changing the source database host is not supported in the product. To change it, create a support ticket.
  • Logs for your Flow may appear empty when the connection to the source database fails. The Onehouse team will be alerted, but you can also create a support ticket.
  • The first time you create a Flow with a Postgres source, Onehouse will deploy infrastructure to capture the CDC events. The Flow might stay in the Provisioning state for up to 1 hour while these resources are being provisioned.
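
Because DECIMAL and NUMERIC columns are not supported, it may be worth checking a table's column types before creating a Flow. The sketch below assumes you have already fetched `(column_name, data_type)` pairs yourself, for example from Postgres's `information_schema.columns` view, where both DECIMAL and NUMERIC are reported as `numeric`. The function name and sample columns are hypothetical.

```python
# Type names treated as unsupported; "decimal" is included defensively.
UNSUPPORTED_TYPES = {"numeric", "decimal"}


def unsupported_columns(columns):
    """Return names of columns whose data type is in UNSUPPORTED_TYPES.

    `columns` is an iterable of (column_name, data_type) pairs.
    """
    return [name for name, data_type in columns
            if data_type.lower() in UNSUPPORTED_TYPES]


# Hypothetical column listing for an "orders" table.
orders_columns = [("id", "integer"), ("price", "numeric"), ("note", "text")]
print(unsupported_columns(orders_columns))
```

Any column the function returns would need to be handled before ingestion; how to do so is outside the scope of this sketch.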