Confluent CDC

Description

Use your Confluent cluster to stream CDC data from relational databases into your managed lakehouse.

Enter the details for your relational database and Confluent Cloud cluster. Onehouse then provisions and manages the resources in your Confluent Cloud account that are needed for ingestion.

Prerequisites

Prepare your relational database

Confluent Cloud must be able to reach the RDBMS instance through either a public endpoint or a private link. For Postgres databases, follow the Postgres CDC source setup guide to set up permissions for the replication slot, create the heartbeat table, and complete the remaining prerequisites.

Prepare your cluster in Confluent

You will need a Confluent Cloud Kafka cluster, along with the following credentials and connection details:

  1. Confluent Cloud API key and secret
  2. Kafka cluster API key and secret, plus the bootstrap server (broker) address
  3. Schema Registry URL, API key, and secret
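Before filling in the source form, it can help to collect all of these values in one place and confirm none are blank. The sketch below is only a local checklist: the class and field names are hypothetical and simply mirror the three items listed above; they are not Onehouse or Confluent API names.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ConfluentCdcCredentials:
    # Hypothetical field names that mirror the three items listed above.
    cloud_api_key: str
    cloud_api_secret: str
    kafka_api_key: str
    kafka_api_secret: str
    kafka_bootstrap_server: str
    schema_registry_url: str
    schema_registry_api_key: str
    schema_registry_api_secret: str


def missing_fields(creds: ConfluentCdcCredentials) -> List[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(creds) if not getattr(creds, f.name).strip()]


# Placeholder values for illustration; the last one is deliberately blank.
creds = ConfluentCdcCredentials(
    cloud_api_key="CLOUD_KEY",
    cloud_api_secret="CLOUD_SECRET",
    kafka_api_key="KAFKA_KEY",
    kafka_api_secret="KAFKA_SECRET",
    kafka_bootstrap_server="broker.example.com:9092",
    schema_registry_url="https://schema-registry.example.com",
    schema_registry_api_key="SR_KEY",
    schema_registry_api_secret="",
)

print(missing_fields(creds))  # ['schema_registry_api_secret']
```

In practice the values would come from a secrets manager or environment variables rather than literals in source code.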

Important Usage Notes

All Databases

  • The Confluent CDC source currently supports Postgres databases only. We are working to add support for more databases.
  • Ensure your Confluent Cloud account has proper permissions to access your database.
  • When you connect your database, Onehouse will automatically provision and manage resources in the Confluent cluster you provide to facilitate CDC data ingestion into the lakehouse.
  • When creating a Stream Capture with a CDC source, ensure the Write Mode is set to Mutable if you want to update records in the table using CDC logs. If you prefer to land the raw CDC logs in the table without updating records, you may use Append-only Write Mode to improve write performance.
  • We recommend hosting your database, Confluent Cloud account, and Onehouse project in the same region to avoid cross-region data transfer costs.
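The difference between the Mutable and Append-only Write Modes described above can be pictured with a small conceptual model. This is a sketch of the idea only, not how Onehouse implements either mode; the event shape `(operation, key, row)` and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Simplified change event: (operation, primary key, row or None for deletes).
ChangeEvent = Tuple[str, int, Optional[dict]]


def apply_mutable(events: List[ChangeEvent]) -> Dict[int, dict]:
    """Mutable: each event upserts or deletes the row with that key."""
    table: Dict[int, dict] = {}
    for op, key, row in events:
        if op == "delete":
            table.pop(key, None)
        else:
            table[key] = row
    return table


def apply_append_only(events: List[ChangeEvent]) -> List[dict]:
    """Append-only: every event is kept as its own record, unmodified."""
    return [{"op": op, "key": key, "row": row} for op, key, row in events]


events: List[ChangeEvent] = [
    ("insert", 1, {"id": 1, "status": "new"}),
    ("update", 1, {"id": 1, "status": "paid"}),
    ("insert", 2, {"id": 2, "status": "new"}),
    ("delete", 2, None),
]

mutable = apply_mutable(events)
appended = apply_append_only(events)

print(mutable)        # {1: {'id': 1, 'status': 'paid'}}
print(len(appended))  # 4
```

The mutable result reflects the current state of the source table, while the append-only result keeps the full change history and skips the per-key lookup, which is consistent with the write-performance note above.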

Postgres Databases

  • Important: You must complete the prerequisites from the Postgres CDC source setup guide for Onehouse to properly access and replicate your database.
  • Onehouse will ingest records in the Debezium Postgres CDC format. To use these events for updating records in the table (rather than ingesting the raw events), apply the Convert CDC Data transformation.
  • When creating a Stream Capture with a Confluent CDC source to ingest data from Postgres, Onehouse will initially bootstrap the existing data from Postgres before starting incremental ingestion.
  • The source Postgres table must have a primary key.
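To make the Debezium format and the primary key requirement more concrete, here is a minimal sketch of how change events can be folded into current rows. It is not the Convert CDC Data transformation itself, whose internals are not described here. The events are trimmed to the `op`, `before`, and `after` fields of the Debezium envelope (real events also carry metadata such as `source` and `ts_ms`), and the `unwrap` helper and sample table are hypothetical.

```python
from typing import Dict, Optional, Tuple


def unwrap(event: dict, key_column: str) -> Tuple[str, object, Optional[dict]]:
    """Reduce a Debezium-style envelope to (action, key, row)."""
    op = event["op"]
    if op == "d":
        # Deletes carry the old row in "before"; "after" is null.
        return ("delete", event["before"][key_column], None)
    if op in ("c", "u", "r"):
        # Create, update, and snapshot read all carry the new row in "after".
        row = event["after"]
        return ("upsert", row[key_column], row)
    raise ValueError(f"unsupported op: {op!r}")


events = [
    # Snapshot reads ("r"), as produced while bootstrapping existing data.
    {"op": "r", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "r", "before": None, "after": {"id": 2, "email": "b@example.com"}},
    # Incremental changes that follow the bootstrap.
    {"op": "u",
     "before": {"id": 1, "email": "a@example.com"},
     "after": {"id": 1, "email": "a2@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
]

table: Dict[object, dict] = {}
for event in events:
    action, key, row = unwrap(event, key_column="id")
    if action == "delete":
        table.pop(key, None)
    else:
        table[key] = row

print(table)  # {1: {'id': 1, 'email': 'a2@example.com'}}
```

The key column is what lets an update or delete event be matched to an existing row, which is why a table without a primary key cannot be kept in sync this way.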