Oracle CDC
Ingest change data capture (CDC) logs from an Oracle database into Onehouse tables.
Follow the setup guide within the Onehouse console to get started. Click Sources > Add New Source > Oracle CDC.
Cloud Provider Support
- AWS: ✅ Supported
- GCP: Not supported
Prerequisites
- Ensure that you have granted permission for database sources in the Terraform or CloudFormation configurations when you connected your cloud account.
- Your Oracle table must have at least one primary key.
- An AWS Glue schema registry that you can use. See setup guide here.
- The Oracle database must be running in
ARCHIVELOGmode. Refer to the Debezium guide for preparing the database. - Supplemental logging must be enabled on the database. See Supplemental logging for details.
- If the Oracle database is a Container Database (CDB), a PDB name must be provided in the source configuration. For non-CDB databases, no PDB name should be set.
- A dedicated CDC user with the required privileges. Follow the Debezium guide for creating an Oracle connector user. The user also needs
SELECTprivilege onV$DATABASE.
Schema Information
The schema of the Oracle table will be used as the source schema. Data types from Oracle will be mapped to their Parquet equivalent.
Usage Notes
- When creating a Flow with an Oracle CDC source, ensure the Write Mode is set to Mutable if you want to update records in the table using CDC logs. If you prefer to land the raw CDC logs in the table without updating records, you may use Append-only Write Mode to improve write performance.
- Onehouse will ingest records in the Debezium Oracle CDC format. To use these events for updating records in the table (rather than ingesting the raw events), apply the Convert CDC Data transformation.
- Onehouse will provision a MSK Kafka cluster in your cloud account when you create your first Flow with an Oracle CDC source. Subsequent Flows using any Oracle CDC source will use the same MSK cluster. The MSK cluster will remain active until you shut it down in your cloud account (Note: Ensure that all Postgres, MySQL & Oracle sources have been deleted before shutting down the MSK cluster to avoid impacting other sources.).
- When creating a Flow with an Oracle CDC source, Onehouse will initially bootstrap the existing data from Oracle before starting incremental ingestion.
Oracle to Hudi data types mapping
| Oracle | Hudi (Spark) |
|---|---|
| CHAR(20) | string |
| NCHAR(20) | string |
| VARCHAR2(100) | string |
| NVARCHAR2(100) | string |
| NUMBER(9,0) | integer |
| NUMBER | double |
| NUMBER(18,0) | long |
| NUMBER(12,2) | decimal(12,2) |
| NUMBER(20,2) | decimal(20,2) |
| INTEGER | decimal(38,0) |
| SMALLINT | decimal(38,0) |
| DECIMAL(10,2) | decimal(10,2) |
| NUMERIC(10,2) | decimal(10,2) |
| NUMERIC(9) | integer |
| NUMERIC(20,2) | decimal(20,2) |
| FLOAT | double |
| DOUBLE PRECISION | double |
| REAL | double |
| BINARY_FLOAT | float |
| BINARY_DOUBLE | double |
| DATE | timestamp |
| TIMESTAMP | timestamp |
| TIMESTAMP(3) | timestamp |
| TIMESTAMP(6) | timestamp |
| TIMESTAMP(9) | timestamp |
| TIMESTAMP WITH TZ | string |
| TIMESTAMP WITH LOCAL TZ | string |
Limitations
- Changing the source database host is not supported in the product. Create a support ticket for help in changing this.
- Logs for your Flow may appear empty when the connection to the source database fails. The Onehouse team will be alerted, but you can also create a support ticket.
- The first time you create a Flow with an Oracle source, Onehouse will deploy infrastructure to capture the CDC events. You might see the Flow in the
Provisioningstate for up to 1 hour while these resources are provisioning. - Spark does not support
TimestampTypewith nanosecond precision. OracleTIMESTAMP(9)values will be truncated/rounded to microsecond precision during ingestion.
Unsupported schema evolution
Although Oracle supports the following column type changes, they are not supported in Onehouse:
- Widening
NUMBERprecision with the same scale (e.g.,NUMBER(10,2)→NUMBER(14,2)). NUMBER(p,0)wherep ≤ 18→NUMBER(p,0)wherep ≥ 19.NUMBER(p,0)wherep ≤ 9→NUMBER(p,0)wherep ≥ 19.NUMBER(p,0)→FLOAT.NUMBER(p,0)→BINARY_DOUBLE.
Supported DML Operations
The Onehouse Oracle CDC Ingestion supports the following DML Operations
| DML | Support | Example |
|---|---|---|
| INSERT | ✅ Supported | INSERT INTO orders (id, status) VALUES (1, 'new') |
| UPDATE | ✅ Supported | UPDATE orders SET status = 'shipped' WHERE id = 1 |
| DELETE | ✅ Supported | DELETE FROM orders WHERE id = 1 |
| TRUNCATE | ❌ Not Supported | TRUNCATE TABLE orders |
Guide: Set up an AWS Glue Schema Registry
If you do not currently have an AWS Glue schema registry you can use, you can use these steps to create a schema registry.
- Go to the AWS Glue console
- Select Schema registries under Data catalog in the navigation pane
- Select Add registry
- Enter a name for the registry, such as "MyDemoSchemaReg"
- Enter an optional description for the registry
- Select Add registry
More information about AWS Glue Schema Registry can be found on the AWS Glue Documentation pages
Guide: Set up Oracle
Refer to the Debezium Oracle connector setup guide for instructions on configuring your Oracle database for CDC.
Supplemental logging
At minimum, database-level minimal supplemental logging is required:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
For CDC ingestion to work correctly, any column used as a record key or partition key in the Hudi target table must be present in every update event. By default, Oracle redo logs only contain changed columns, so additional supplemental logging is required to guarantee these columns are always logged. Choose one of the following:
Option 1: Database-level PRIMARY KEY supplemental logging. Applies to all current and future tables; no per-table setup required as long as the source table's primary key is used as the Hudi record key and no partition key is configured.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
For each ingested table whose Hudi target uses a non-PK column as the record key, or any partition key, add a table-level log group with ALWAYS covering those columns:
ALTER TABLE <table> ADD SUPPLEMENTAL LOG GROUP <group_name> (<record_key_or_partition_key_columns>) ALWAYS;
ALWAYS ensures the columns are logged on every update, even when they aren't modified. CONDITIONAL mode will not work because it only logs columns when at least one column in the group changes.
Option 2: Database-level ALL columns supplemental logging. Applies to all tables and covers any record key or partition key automatically — no table-level configuration needed.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
Downside: every column is logged on every update, which significantly increases redo log volume. This can have a noticeable impact on storage, archive log retention, and database write performance.
Option 3: Table-level supplemental logging only. Configure each ingested table individually instead of enabling additional DB-level supplemental logging.
If the source table's primary key is used as the Hudi record key and no partition key is configured, table-level PK supplemental logging is sufficient:
ALTER TABLE <table> ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
Otherwise, add a log group with ALWAYS containing the source columns used as the Hudi record key and partition key:
ALTER TABLE <table> ADD SUPPLEMENTAL LOG GROUP <group_name> (<record_key_columns>, <partition_key_columns>) ALWAYS;
Examples:
- Record key = source PK
id, partition key =country→ log group(id, country) ALWAYS. - Record key = non-PK column
external_id, no partition key → log group(external_id) ALWAYS. - Record key = non-PK column
external_id, partition key =country→ log group(external_id, country) ALWAYS.
Downside: must be configured per table — any new table added later will not be ingested correctly until its supplemental logging is configured.
Archive Log Retention
Onehouse requires retention enabled for Oracle archive logs to allow for any disconnection events between Onehouse and the Oracle cluster.
| Environment | Minimum Archive Log Retention |
|---|---|
| Production | 12 hours |
| Development/Staging | 4 hours |
Ensure that archive logs are retained for at least the specified duration depending on the environment of the ingested tables.
Heartbeat - Important
Onehouse periodically updates a heartbeat table named onehouse_heartbeat to keep the CDC stream alive and detect lag. Logged in as the CDC user, create the heartbeat table in the CDC user's schema:
CREATE TABLE onehouse_heartbeat (
id NUMBER DEFAULT 1 PRIMARY KEY,
updated_at TIMESTAMP
);
See the Debezium heartbeat documentation for more details.