Amazon Kinesis (Coming soon)
Continuously stream data from Amazon Kinesis Data Streams into Onehouse tables.
In the Onehouse console, click Sources > Add New Source > Amazon Kinesis, then follow the setup guide to configure your source.
Cloud Provider Support
- AWS: ✅ Supported
- GCP: ❌ Not supported
Reading Kinesis Records
Onehouse supports the following data formats for Kinesis stream records:
| Data Format | Schema Registry | Description |
|---|---|---|
| JSON | Optional | Deserializes stream records in JSON format. |
| Avro | Required | Deserializes stream records in Avro format. Requires a connected schema registry. |
| Parquet | Optional | Deserializes stream records in Parquet format. |
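For reference, here is a producer-side sketch of what a JSON-format stream record looks like, using the standard boto3 Kinesis client. The stream name, region, and payload fields are placeholders; the JSON document placed in the record's `Data` blob is what gets deserialized on ingestion.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Hypothetical payload; any JSON document works.
record = {"order_id": "o-1001", "amount": 42.50, "updated_at": "2024-05-01T12:00:00Z"}

# The Data blob of each record is deserialized as JSON during ingestion.
kinesis.put_record(
    StreamName="orders",  # placeholder stream name
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["order_id"],
)
```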
Schema Registry Support
Kinesis sources support the following schema registries:
- AWS Glue Schema Registry — native integration for AWS-based deployments.
- Confluent Schema Registry — for environments using Confluent for schema management.
- File-Based Schema Registry — for custom or local schema definitions.
When a schema registry is configured, select the schema name for each stream to ensure correct deserialization.
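As an illustration of the AWS Glue Schema Registry option, the sketch below registers an Avro schema with boto3 so it can later be selected by name for a stream. The registry name, schema name, and Avro fields are all placeholders, not values required by Onehouse.

```python
import json
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Hypothetical Avro schema for an "orders" stream.
avro_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "updated_at", "type": "string"},
    ],
}

# Register the schema so it can be selected by name when configuring the source.
glue.create_schema(
    RegistryId={"RegistryName": "onehouse-schemas"},  # placeholder registry name
    SchemaName="orders-value",                        # placeholder schema name
    DataFormat="AVRO",
    Compatibility="BACKWARD",
    SchemaDefinition=json.dumps(avro_schema),
)
```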
Creating a Flow from a Kinesis Source
Select Streams
After selecting your Kinesis data source, Onehouse automatically discovers available streams. From the stream list, select the streams you want to ingest.
Auto-Capture
Enable auto-capture to continuously monitor and ingest new streams that match a filter pattern. When auto-capture is active, Onehouse periodically scans for new streams and automatically creates flows for any that match the configured regex filter. A common schema, key, and transformation configuration is applied to all auto-captured streams.
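To illustrate how a regex filter selects streams, here is a minimal sketch; the pattern and stream names are hypothetical, and the actual matching is performed by Onehouse rather than by user code.

```python
import re

# Hypothetical filter: capture every stream whose name starts with "orders-".
stream_filter = re.compile(r"^orders-.*$")

discovered_streams = ["orders-us", "orders-eu", "payments-us"]

# Each matching stream would get a flow with the shared schema, key,
# and transformation configuration.
captured = [s for s in discovered_streams if stream_filter.match(s)]
print(captured)  # ['orders-us', 'orders-eu']
```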
Starting Sequence Number
Configure where Onehouse begins reading from each shard:
| Option | Description |
|---|---|
| Latest (default) | Start reading from the most recent sequence number in each shard. New records published after flow creation are ingested. |
| Trim Horizon | Start reading from the oldest available sequence number in each shard, ingesting all retained records from the beginning of the shard's retention period. |
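These two options correspond to the Kinesis shard iterator types LATEST and TRIM_HORIZON. The boto3 sketch below shows the distinction; the stream name and shard ID are placeholders.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Latest: only records published after the iterator is created are read.
latest_iter = kinesis.get_shard_iterator(
    StreamName="orders",             # placeholder stream name
    ShardId="shardId-000000000000",  # placeholder shard id
    ShardIteratorType="LATEST",
)["ShardIterator"]

# Trim Horizon: read from the oldest record still within the retention period.
trim_iter = kinesis.get_shard_iterator(
    StreamName="orders",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]
```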
Per-Stream Configuration
For each selected stream, configure:
- Destination Table Name — the name of the Hudi table created in your lakehouse.
- Record Keys — columns that uniquely identify each record.
- Precombine Key — the field used to deduplicate records that share the same record key; the record with the larger precombine value is kept (see the sketch after this list).
- Partition Keys — columns used to partition the destination table.
- Transformations — optional data transformations applied during ingestion (see Transformations).
- Quarantine — optionally route invalid records to a quarantine table for inspection.
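Below is a minimal sketch of how record keys and a precombine key interact during ingestion, using hypothetical field names. The actual merge is handled by Hudi, but the rule is the same: among records sharing a record key, the one with the largest precombine value is kept.

```python
# Hypothetical records sharing the same record key ("order_id").
records = [
    {"order_id": "o-1001", "amount": 42.50, "updated_at": "2024-05-01T12:00:00Z"},
    {"order_id": "o-1001", "amount": 45.00, "updated_at": "2024-05-01T13:30:00Z"},
]

# Record key = order_id, precombine key = updated_at:
# the record with the larger precombine value wins.
latest = {}
for rec in records:
    key = rec["order_id"]
    if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
        latest[key] = rec

print(latest["o-1001"]["amount"])  # 45.0, the later update is kept
```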
Permissions
The following role-based access control (RBAC) actions govern Kinesis flow operations:
| Action | Description |
|---|---|
| Create Stream | Create a new flow from a Kinesis source. |
| Clone Stream | Duplicate an existing Kinesis flow configuration. |
| Edit Stream | Modify the configuration of an existing Kinesis flow. |
Usage Notes
- Kinesis sources are available only for AWS-linked projects.
- Stream discovery requires that the linked cloud account has read access to the Kinesis Data Streams service (a policy sketch follows these notes).
- The starting sequence number configuration applies to all shards within a stream.
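One way to grant the read access needed for stream discovery and ingestion is an IAM inline policy attached to the linked account's role. The exact actions Onehouse requires are not listed here, so the policy below is an assumption covering common Kinesis read operations; the role and policy names are placeholders.

```python
import json
import boto3

iam = boto3.client("iam")

# Assumed minimal read permissions for stream discovery and record consumption;
# consult the in-console setup guide for the authoritative list.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "kinesis:ListStreams",
            "kinesis:DescribeStream",
            "kinesis:DescribeStreamSummary",
            "kinesis:ListShards",
            "kinesis:GetShardIterator",
            "kinesis:GetRecords",
        ],
        "Resource": "*",
    }],
}

iam.put_role_policy(
    RoleName="onehouse-linked-role",     # placeholder role name
    PolicyName="onehouse-kinesis-read",  # placeholder policy name
    PolicyDocument=json.dumps(policy),
)
```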