Amazon Kinesis (Coming soon)

Continuously stream data from Amazon Kinesis Data Streams into Onehouse tables.

Click Sources > Add New Source > Amazon Kinesis. Then, follow the setup guide within the Onehouse console to configure your source.

Cloud Provider Support

  • AWS: ✅ Supported
  • GCP: Not supported

Reading Kinesis Records

Onehouse supports the following data formats for Kinesis stream records:

  • JSON — schema registry optional. Deserializes stream records in JSON format.
  • Avro — schema registry required. Deserializes stream records in Avro format using a connected schema registry.
  • Parquet — schema registry optional. Deserializes stream records in Parquet format.
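As a minimal sketch of the JSON path: each Kinesis record carries its payload as raw bytes, which a JSON-format source deserializes into a row. The record fields and values below are hypothetical, mimicking the shape of a record returned by the Kinesis API.

```python
import json

# Hypothetical Kinesis record: the Data field holds the raw payload bytes.
record = {
    "PartitionKey": "user-42",
    "Data": b'{"user_id": 42, "event": "click", "ts": 1700000000}',
}

# JSON-format deserialization: decode the payload bytes into a row.
row = json.loads(record["Data"])
print(row["event"])  # -> click
```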

Schema Registry Support

Kinesis sources support the following schema registries:

  • AWS Glue Schema Registry — native integration for AWS-based deployments.
  • Confluent Schema Registry — for environments using Confluent for schema management.
  • File-Based Schema Registry — for custom or local schema definitions.

When a schema registry is configured, select the schema name for each stream to ensure correct deserialization.

Creating a Flow from a Kinesis Source

Select Streams

After selecting your Kinesis data source, Onehouse automatically discovers available streams. From the stream list, select the streams you want to ingest.

Auto-Capture

Enable auto-capture to continuously monitor and ingest new streams that match a filter pattern. When auto-capture is active, Onehouse periodically scans for new streams and automatically creates flows for any that match the configured regex filter. A common schema, key, and transformation configuration is applied to all auto-captured streams.
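The matching step can be sketched as follows. The pattern and stream names are illustrative, not part of the product API; auto-capture applies whatever regex you configure against the names of newly discovered streams.

```python
import re

# Hypothetical filter: capture every stream whose name starts with "orders-".
pattern = re.compile(r"orders-.*")

discovered = ["orders-us", "orders-eu", "payments-us", "orders-archive"]

# Auto-capture periodically scans discovered streams and creates a flow
# for each match; non-matching streams are ignored.
captured = [s for s in discovered if pattern.fullmatch(s)]
print(captured)  # -> ['orders-us', 'orders-eu', 'orders-archive']
```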

Starting Sequence Number

Configure where Onehouse begins reading from each shard:

  • Latest (default) — start reading from the most recent sequence number in each shard. Only records published after flow creation are ingested.
  • Trim Horizon — start reading from the oldest available sequence number in each shard, ingesting all retained records from the beginning of the shard's retention period.
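A toy model makes the difference concrete. The shard, record names, and `flow_start_index` below are illustrative, not Kinesis API calls; the point is only which slice of a shard's retained records each option yields.

```python
# One shard's retained records, oldest first. "retained" were published
# before the flow was created; "after_start" were published afterward.
retained = ["r1", "r2", "r3"]
after_start = ["r4", "r5"]
shard = retained + after_start

def read_from(shard, position, flow_start_index):
    # TRIM_HORIZON: begin at the oldest retained record.
    # LATEST: begin with records published after flow creation.
    start = 0 if position == "TRIM_HORIZON" else flow_start_index
    return shard[start:]

print(read_from(shard, "TRIM_HORIZON", len(retained)))  # all five records
print(read_from(shard, "LATEST", len(retained)))        # only r4 and r5
```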

Per-Stream Configuration

For each selected stream, configure:

  • Destination Table Name — the name of the Hudi table created in your lakehouse.
  • Record Keys — columns that uniquely identify each record.
  • Precombine Key — the field used to resolve duplicate records in merge-on-read tables.
  • Partition Keys — columns used to partition the destination table.
  • Transformations — optional data transformations applied during ingestion (see Transformations).
  • Quarantine — optionally route invalid records to a quarantine table for inspection.
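To illustrate how record keys and a precombine key interact, here is a hedged sketch of duplicate resolution: when two incoming records share the same record key, the one with the greater precombine value wins. The field names (`user_id`, `ts`, `status`) are hypothetical.

```python
# Incoming records: user_id is the record key, ts is the precombine key.
incoming = [
    {"user_id": 1, "status": "active",   "ts": 100},
    {"user_id": 1, "status": "inactive", "ts": 200},  # later update wins
    {"user_id": 2, "status": "active",   "ts": 150},
]

deduped = {}
for rec in incoming:
    key = rec["user_id"]  # record key uniquely identifies the row
    # Keep the record with the highest precombine value per key.
    if key not in deduped or rec["ts"] > deduped[key]["ts"]:
        deduped[key] = rec

print(sorted(deduped.values(), key=lambda r: r["user_id"]))
```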

Permissions

The following role-based access control (RBAC) actions govern Kinesis flow operations:

  • Create Stream — create a new flow from a Kinesis source.
  • Clone Stream — duplicate an existing Kinesis flow configuration.
  • Edit Stream — modify the configuration of an existing Kinesis flow.

Usage Notes

  • Kinesis sources are available only for AWS-linked projects.
  • Stream discovery requires that the linked cloud account has read access to the Kinesis Data Streams service.
  • The starting sequence number configuration applies to all shards within a stream.