Amazon Kinesis (Coming soon)
Continuously stream data from Amazon Kinesis Data Streams into Onehouse tables.
In the Onehouse console, click Sources > Add New Source > Amazon Kinesis, then follow the setup guide to configure your source.
Cloud Provider Support
- AWS: ✅ Supported
- GCP: ❌ Not supported
Reading Kinesis Records
Onehouse supports the following data formats for Kinesis stream records:
| Data Format | Schema Registry | Description |
|---|---|---|
| JSON | Optional | Deserializes stream records in JSON format. |
| Avro | Required | Deserializes stream records in Avro format. Requires a connected schema registry. |
| Parquet | Optional | Deserializes stream records in Parquet format. |
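For reference, here is a producer-side sketch of what a JSON-format stream record looks like, using the standard boto3 Kinesis client. The stream name, region, and payload fields are placeholders; the JSON document placed in the record's `Data` blob is what gets deserialized on ingestion.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Hypothetical payload; any JSON document works.
record = {"order_id": "o-1001", "amount": 42.50, "updated_at": "2024-05-01T12:00:00Z"}

# The Data blob of each record is deserialized as JSON during ingestion.
kinesis.put_record(
    StreamName="orders",  # placeholder stream name
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["order_id"],
)
```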
Schema Registry Support
Kinesis sources support the following schema registries:
- AWS Glue Schema Registry — native integration for AWS-based deployments.
- Confluent Schema Registry — for environments using Confluent for schema management.
- File-Based Schema Registry — for custom or local schema definitions.
When a schema registry is configured, select the schema name for each stream to ensure correct deserialization.
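As an illustration of the AWS Glue Schema Registry option, the sketch below registers an Avro schema with boto3 so it can later be selected by name for a stream. The registry name, schema name, and Avro fields are all placeholders, not values required by Onehouse.

```python
import json
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Hypothetical Avro schema for an "orders" stream.
avro_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "updated_at", "type": "string"},
    ],
}

# Register the schema so it can be selected by name when configuring the source.
glue.create_schema(
    RegistryId={"RegistryName": "onehouse-schemas"},  # placeholder registry name
    SchemaName="orders-value",                        # placeholder schema name
    DataFormat="AVRO",
    Compatibility="BACKWARD",
    SchemaDefinition=json.dumps(avro_schema),
)
```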
Creating a Flow from a Kinesis Source
Select Streams
After selecting your Kinesis data source, Onehouse automatically discovers available streams. From the stream list, select the streams you want to ingest.
Auto-Capture
Enable auto-capture to continuously monitor and ingest new streams that match a filter pattern. When auto-capture is active, Onehouse periodically scans for new streams and automatically creates flows for any that match the configured regex filter. A common schema, key, and transformation configuration is applied to all auto-captured streams.
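To illustrate how a regex filter selects streams, here is a minimal sketch; the pattern and stream names are hypothetical, and the actual matching is performed by Onehouse rather than by user code.

```python
import re

# Hypothetical filter: capture every stream whose name starts with "orders-".
stream_filter = re.compile(r"^orders-.*$")

discovered_streams = ["orders-us", "orders-eu", "payments-us"]

# Each matching stream would get a flow with the shared schema, key,
# and transformation configuration.
captured = [s for s in discovered_streams if stream_filter.match(s)]
print(captured)  # ['orders-us', 'orders-eu']
```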
Starting Sequence Number
Configure where Onehouse begins reading from each shard:
| Option | Description |
|---|---|
| Latest (default) | Start reading from the most recent sequence number in each shard. New records published after flow creation are ingested. |
| Trim Horizon | Start reading from the oldest available sequence number in each shard, ingesting all retained records from the beginning of the shard's retention period. |
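These two options correspond to the Kinesis shard iterator types LATEST and TRIM_HORIZON. The boto3 sketch below shows the distinction; the stream name and shard ID are placeholders.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Latest: only records published after the iterator is created are read.
latest_iter = kinesis.get_shard_iterator(
    StreamName="orders",             # placeholder stream name
    ShardId="shardId-000000000000",  # placeholder shard id
    ShardIteratorType="LATEST",
)["ShardIterator"]

# Trim Horizon: read from the oldest record still within the retention period.
trim_iter = kinesis.get_shard_iterator(
    StreamName="orders",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]
```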
Per-Stream Configuration
For each selected stream, configure:
- Destination Table Name — the name of the Hudi table created in your lakehouse.
- Record Keys — columns that uniquely identify each record.
- Precombine Key — the field used to deduplicate records that share the same record key; the record with the larger precombine value is kept (see the sketch after this list).
- Partition Keys — columns used to partition the destination table.
- Transformations — optional data transformations applied during ingestion (see Transformations).
- Quarantine — optionally route invalid records to a quarantine table for inspection.
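Below is a minimal sketch of how record keys and a precombine key interact during ingestion, using hypothetical field names. The actual merge is handled by Hudi, but the rule is the same: among records sharing a record key, the one with the largest precombine value is kept.

```python
# Hypothetical records sharing the same record key ("order_id").
records = [
    {"order_id": "o-1001", "amount": 42.50, "updated_at": "2024-05-01T12:00:00Z"},
    {"order_id": "o-1001", "amount": 45.00, "updated_at": "2024-05-01T13:30:00Z"},
]

# Record key = order_id, precombine key = updated_at:
# the record with the larger precombine value wins.
latest = {}
for rec in records:
    key = rec["order_id"]
    if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
        latest[key] = rec

print(latest["o-1001"]["amount"])  # 45.0, the later update is kept
```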
Permissions
The following role-based access control (RBAC) actions govern Kinesis flow operations:
| Action | Description |
|---|---|
| Create Stream | Create a new flow from a Kinesis source. |
| Clone Stream | Duplicate an existing Kinesis flow configuration. |
| Edit Stream | Modify the configuration of an existing Kinesis flow. |
Usage Notes
- Kinesis sources are available only for AWS-linked projects.
- Stream discovery requires that the linked cloud account has read access to the Kinesis Data Streams service (a policy sketch follows these notes).
- The starting sequence number configuration applies to all shards within a stream.
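One way to grant the read access needed for stream discovery and ingestion is an IAM inline policy attached to the linked account's role. The exact actions Onehouse requires are not listed here, so the policy below is an assumption covering common Kinesis read operations; the role and policy names are placeholders.

```python
import json
import boto3

iam = boto3.client("iam")

# Assumed minimal read permissions for stream discovery and record consumption;
# consult the in-console setup guide for the authoritative list.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "kinesis:ListStreams",
            "kinesis:DescribeStream",
            "kinesis:DescribeStreamSummary",
            "kinesis:ListShards",
            "kinesis:GetShardIterator",
            "kinesis:GetRecords",
        ],
        "Resource": "*",
    }],
}

iam.put_role_policy(
    RoleName="onehouse-linked-role",     # placeholder role name
    PolicyName="onehouse-kinesis-read",  # placeholder policy name
    PolicyDocument=json.dumps(policy),
)
```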