Skip to main content

Oracle CDC

Ingest change data capture (CDC) logs from an Oracle database into Onehouse tables.

Follow the setup guide within the Onehouse console to get started. Click Sources > Add New Source > Oracle CDC.

Cloud Provider Support

  • AWS: ✅ Supported
  • GCP: Not supported

Prerequisites

  1. Ensure that you have granted permission for database sources in the Terraform or CloudFormation configurations when you connected your cloud account.
  2. Your Oracle table must have at least one primary key.
  3. An AWS Glue schema registry that you can use. See setup guide here.
  4. The Oracle database must be running in ARCHIVELOG mode. Refer to the Debezium guide for preparing the database.
  5. Supplemental logging must be enabled on the database. At minimum, database-level minimal supplemental logging is required:
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
    Onehouse supports both PRIMARY KEY and ALL column-level supplemental logging — choose based on your use case (ALL provides complete before/after images in CDC events at the cost of higher redo log overhead).
  6. If the Oracle database is a Container Database (CDB), a PDB name must be provided in the source configuration. For non-CDB databases, no PDB name should be set.
  7. A dedicated CDC user with the required privileges. Follow the Debezium guide for creating an Oracle connector user. The user also needs SELECT privilege on V$DATABASE.

Schema Information

The schema of the Oracle table will be used as the source schema. Data types from Oracle will be mapped to their Parquet equivalent.

Usage Notes

  • When creating a Flow with an Oracle CDC source, ensure the Write Mode is set to Mutable if you want to update records in the table using CDC logs. If you prefer to land the raw CDC logs in the table without updating records, you may use Append-only Write Mode to improve write performance.
  • Onehouse will ingest records in the Debezium Oracle CDC format. To use these events for updating records in the table (rather than ingesting the raw events), apply the Convert CDC Data transformation.
  • Onehouse will provision a MSK Kafka cluster in your cloud account when you create your first Flow with an Oracle CDC source. Subsequent Flows using any Oracle CDC source will use the same MSK cluster. The MSK cluster will remain active until you shut it down in your cloud account (Note: Ensure that all Postgres, MySQL & Oracle sources have been deleted before shutting down the MSK cluster to avoid impacting other sources.).
  • When creating a Flow with an Oracle CDC source, Onehouse will initially bootstrap the existing data from Oracle before starting incremental ingestion.

Oracle to Hudi data types mapping

OracleHudi (Spark)
CHAR(20)string
NCHAR(20)string
VARCHAR2(100)string
NVARCHAR2(100)string
NUMBER(9,0)integer
NUMBERdouble
NUMBER(18,0)long
NUMBER(12,2)decimal(12,2)
NUMBER(20,2)decimal(20,2)
INTEGERdecimal(38,0)
SMALLINTdecimal(38,0)
DECIMAL(10,2)decimal(10,2)
NUMERIC(10,2)decimal(10,2)
NUMERIC(9)integer
NUMERIC(20,2)decimal(20,2)
FLOATdouble
DOUBLE PRECISIONdouble
REALdouble
BINARY_FLOATfloat
BINARY_DOUBLEdouble
DATEtimestamp
TIMESTAMPtimestamp
TIMESTAMP(3)timestamp
TIMESTAMP(6)timestamp
TIMESTAMP(9)timestamp
TIMESTAMP WITH TZstring
TIMESTAMP WITH LOCAL TZstring

Limitations

  • Changing the source database host is not supported in the product. Create a support ticket for help in changing this.
  • Logs for your Flow may appear empty when the connection to the source database fails. The Onehouse team will be alerted, but you can also create a support ticket.
  • The first time you create a Flow with an Oracle source, Onehouse will deploy infrastructure to capture the CDC events. You might see the Flow in the Provisioning state for up to 1 hour while these resources are provisioning.
  • Spark does not support TimestampType with nanosecond precision. Oracle TIMESTAMP(9) values will be truncated/rounded to microsecond precision during ingestion.

Unsupported schema evolution

Although Oracle supports the following column type changes, they are not supported in Onehouse:

  • Widening NUMBER precision with the same scale (e.g., NUMBER(10,2)NUMBER(14,2)).
  • NUMBER(p,0) where p ≤ 18NUMBER(p,0) where p ≥ 19.
  • NUMBER(p,0) where p ≤ 9NUMBER(p,0) where p ≥ 19.
  • NUMBER(p,0)FLOAT.
  • NUMBER(p,0)BINARY_DOUBLE.

Guide: Set up an AWS Glue Schema Registry

If you do not currently have an AWS Glue schema registry you can use, you can use these steps to create a schema registry.

  1. Go to the AWS Glue console
  2. Select Schema registries under Data catalog in the navigation pane
  3. Select Add registry
  4. Enter a name for the registry, such as "MyDemoSchemaReg"
  5. Enter an optional description for the registry
  6. Select Add registry

More information about AWS Glue Schema Registry can be found on the AWS Glue Documentation pages

Guide: Set up Oracle

Refer to the Debezium Oracle connector setup guide for instructions on configuring your Oracle database for CDC.

Heartbeat - Important

Onehouse periodically updates a heartbeat table named onehouse_heartbeat to keep the CDC stream alive and detect lag. Logged in as the CDC user, create the heartbeat table in the CDC user's schema:

CREATE TABLE onehouse_heartbeat (
id NUMBER DEFAULT 1 PRIMARY KEY,
updated_at TIMESTAMP
);

See the Debezium heartbeat documentation for more details.