Skip to main content

Oracle CDC

Ingest change data capture (CDC) logs from an Oracle database into Onehouse tables.

Follow the setup guide within the Onehouse console to get started. Click Sources > Add New Source > Oracle CDC.

Cloud Provider Support

  • AWS: ✅ Supported
  • GCP: Not supported

Prerequisites

  1. Ensure that you have granted permission for database sources in the Terraform or CloudFormation configurations when you connected your cloud account.
  2. Your Oracle table must have at least one primary key.
  3. An AWS Glue schema registry that you can use. See setup guide here.
  4. The Oracle database must be running in ARCHIVELOG mode. Refer to the Debezium guide for preparing the database.
  5. Supplemental logging must be enabled on the database. See Supplemental logging for details.
  6. If the Oracle database is a Container Database (CDB), a PDB name must be provided in the source configuration. For non-CDB databases, no PDB name should be set.
  7. A dedicated CDC user with the required privileges. Follow the Debezium guide for creating an Oracle connector user. The user also needs SELECT privilege on V$DATABASE.

Schema Information

The schema of the Oracle table will be used as the source schema. Data types from Oracle will be mapped to their Parquet equivalent.

Usage Notes

  • When creating a Flow with an Oracle CDC source, ensure the Write Mode is set to Mutable if you want to update records in the table using CDC logs. If you prefer to land the raw CDC logs in the table without updating records, you may use Append-only Write Mode to improve write performance.
  • Onehouse will ingest records in the Debezium Oracle CDC format. To use these events for updating records in the table (rather than ingesting the raw events), apply the Convert CDC Data transformation.
  • Onehouse will provision a MSK Kafka cluster in your cloud account when you create your first Flow with an Oracle CDC source. Subsequent Flows using any Oracle CDC source will use the same MSK cluster. The MSK cluster will remain active until you shut it down in your cloud account (Note: Ensure that all Postgres, MySQL & Oracle sources have been deleted before shutting down the MSK cluster to avoid impacting other sources.).
  • When creating a Flow with an Oracle CDC source, Onehouse will initially bootstrap the existing data from Oracle before starting incremental ingestion.

Oracle to Hudi data types mapping

OracleHudi (Spark)
CHAR(20)string
NCHAR(20)string
VARCHAR2(100)string
NVARCHAR2(100)string
NUMBER(9,0)integer
NUMBERdouble
NUMBER(18,0)long
NUMBER(12,2)decimal(12,2)
NUMBER(20,2)decimal(20,2)
INTEGERdecimal(38,0)
SMALLINTdecimal(38,0)
DECIMAL(10,2)decimal(10,2)
NUMERIC(10,2)decimal(10,2)
NUMERIC(9)integer
NUMERIC(20,2)decimal(20,2)
FLOATdouble
DOUBLE PRECISIONdouble
REALdouble
BINARY_FLOATfloat
BINARY_DOUBLEdouble
DATEtimestamp
TIMESTAMPtimestamp
TIMESTAMP(3)timestamp
TIMESTAMP(6)timestamp
TIMESTAMP(9)timestamp
TIMESTAMP WITH TZstring
TIMESTAMP WITH LOCAL TZstring

Limitations

  • Changing the source database host is not supported in the product. Create a support ticket for help in changing this.
  • Logs for your Flow may appear empty when the connection to the source database fails. The Onehouse team will be alerted, but you can also create a support ticket.
  • The first time you create a Flow with an Oracle source, Onehouse will deploy infrastructure to capture the CDC events. You might see the Flow in the Provisioning state for up to 1 hour while these resources are provisioning.
  • Spark does not support TimestampType with nanosecond precision. Oracle TIMESTAMP(9) values will be truncated/rounded to microsecond precision during ingestion.

Unsupported schema evolution

Although Oracle supports the following column type changes, they are not supported in Onehouse:

  • Widening NUMBER precision with the same scale (e.g., NUMBER(10,2)NUMBER(14,2)).
  • NUMBER(p,0) where p ≤ 18NUMBER(p,0) where p ≥ 19.
  • NUMBER(p,0) where p ≤ 9NUMBER(p,0) where p ≥ 19.
  • NUMBER(p,0)FLOAT.
  • NUMBER(p,0)BINARY_DOUBLE.

Supported DML Operations

The Onehouse Oracle CDC Ingestion supports the following DML Operations

DMLSupportExample
INSERT✅ SupportedINSERT INTO orders (id, status) VALUES (1, 'new')
UPDATE✅ SupportedUPDATE orders SET status = 'shipped' WHERE id = 1
DELETE✅ SupportedDELETE FROM orders WHERE id = 1
TRUNCATE❌ Not SupportedTRUNCATE TABLE orders

Guide: Set up an AWS Glue Schema Registry

If you do not currently have an AWS Glue schema registry you can use, you can use these steps to create a schema registry.

  1. Go to the AWS Glue console
  2. Select Schema registries under Data catalog in the navigation pane
  3. Select Add registry
  4. Enter a name for the registry, such as "MyDemoSchemaReg"
  5. Enter an optional description for the registry
  6. Select Add registry

More information about AWS Glue Schema Registry can be found on the AWS Glue Documentation pages

Guide: Set up Oracle

Refer to the Debezium Oracle connector setup guide for instructions on configuring your Oracle database for CDC.

Supplemental logging

At minimum, database-level minimal supplemental logging is required:

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

For CDC ingestion to work correctly, any column used as a record key or partition key in the Hudi target table must be present in every update event. By default, Oracle redo logs only contain changed columns, so additional supplemental logging is required to guarantee these columns are always logged. Choose one of the following:

Option 1: Database-level PRIMARY KEY supplemental logging. Applies to all current and future tables; no per-table setup required as long as the source table's primary key is used as the Hudi record key and no partition key is configured.

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

For each ingested table whose Hudi target uses a non-PK column as the record key, or any partition key, add a table-level log group with ALWAYS covering those columns:

ALTER TABLE <table> ADD SUPPLEMENTAL LOG GROUP <group_name> (<record_key_or_partition_key_columns>) ALWAYS;

ALWAYS ensures the columns are logged on every update, even when they aren't modified. CONDITIONAL mode will not work because it only logs columns when at least one column in the group changes.

Option 2: Database-level ALL columns supplemental logging. Applies to all tables and covers any record key or partition key automatically — no table-level configuration needed.

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

Downside: every column is logged on every update, which significantly increases redo log volume. This can have a noticeable impact on storage, archive log retention, and database write performance.

Option 3: Table-level supplemental logging only. Configure each ingested table individually instead of enabling additional DB-level supplemental logging.

If the source table's primary key is used as the Hudi record key and no partition key is configured, table-level PK supplemental logging is sufficient:

ALTER TABLE <table> ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

Otherwise, add a log group with ALWAYS containing the source columns used as the Hudi record key and partition key:

ALTER TABLE <table> ADD SUPPLEMENTAL LOG GROUP <group_name> (<record_key_columns>, <partition_key_columns>) ALWAYS;

Examples:

  • Record key = source PK id, partition key = country → log group (id, country) ALWAYS.
  • Record key = non-PK column external_id, no partition key → log group (external_id) ALWAYS.
  • Record key = non-PK column external_id, partition key = country → log group (external_id, country) ALWAYS.

Downside: must be configured per table — any new table added later will not be ingested correctly until its supplemental logging is configured.

Archive Log Retention

Onehouse requires retention enabled for Oracle archive logs to allow for any disconnection events between Onehouse and the Oracle cluster.

EnvironmentMinimum Archive Log Retention
Production12 hours
Development/Staging4 hours

Ensure that archive logs are retained for at least the specified duration depending on the environment of the ingested tables.

Heartbeat - Important

Onehouse periodically updates a heartbeat table named onehouse_heartbeat to keep the CDC stream alive and detect lag. Logged in as the CDC user, create the heartbeat table in the CDC user's schema:

CREATE TABLE onehouse_heartbeat (
id NUMBER DEFAULT 1 PRIMARY KEY,
updated_at TIMESTAMP
);

See the Debezium heartbeat documentation for more details.