AWS MSK Kafka

Continuously stream data directly from AWS Managed Apache Kafka (MSK) into Onehouse tables.

Click Sources > Add New Source > MSK Kafka. Then, follow the setup guide within the Onehouse console to configure your source.

Cloud Provider Support

  • AWS: ✅ Supported
  • GCP: Not supported

Prerequisites

  • Ensure that you have granted permission for MSK in the Terraform or CloudFormation configurations when you connected your cloud account.

Reading Kafka Messages

Onehouse supports the following serialization types for Kafka message values:

| Serialization Type (for message value) | Schema Registry | Description |
| --- | --- | --- |
| Avro | Required | Deserializes message value in the Avro format. Send messages using Kafka-Avro specific libraries; vanilla Avro libraries will not work. |
| JSON | Optional | Deserializes message value in the JSON format. |
| JSON_SR (JSON Schema) | Required | Deserializes message value in the Confluent JSON Schema format. |
| Protobuf | Required | Deserializes message value in the Protocol Buffer format. |
| Byte Array | N/A | Passes the raw message value as a Byte Array without performing deserialization. Also adds the message key as a string field. |

Onehouse currently does not support reading Kafka message keys for Avro, JSON, JSON_SR, and Protobuf serialized messages.
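
The requirement to use Kafka-Avro specific libraries is commonly explained by message framing: schema-registry-aware serializers prepend a small header to the Avro body, and plain Avro encoders do not. The sketch below illustrates the Confluent wire format (one magic byte `0`, then a 4-byte big-endian schema ID, then the Avro payload). It is an assumption that your registry uses this framing; other registries may frame messages differently. The function names `frame_confluent` and `parse_confluent` are hypothetical and are not part of any Onehouse or Kafka API.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # first byte of a Confluent-framed message (assumed framing)


def frame_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an already Avro-encoded body with the 5-byte header."""
    # ">bI" = big-endian, 1-byte magic, 4-byte unsigned schema ID, no padding
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def parse_confluent(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message too short to carry a 5-byte header")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        # A payload written by a plain Avro library has no such header.
        raise ValueError("missing magic byte; message is not registry-framed")
    return schema_id, message[5:]
```

A message produced without the header fails this check, which is one way to confirm that a producer is using a registry-aware serializer before pointing a source at the topic.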

Usage Notes

  • If a message is compacted or deleted within the Kafka topic, it can no longer be ingested since the payload will be a tombstone/null value.
  • Ensure your MSK cluster is in the same region as the Onehouse project. This avoids the cost of moving data across regions.
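
To make the tombstone note concrete, here is a minimal sketch of how a consumer might separate tombstones from ingestible records. It assumes records are available as `(key, value)` pairs in which a tombstone has a value of `None`; the `split_tombstones` helper is hypothetical and does not describe how Onehouse itself processes the topic.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[Optional[bytes], Optional[bytes]]  # (key, value)


def split_tombstones(records: Iterable[Record]) -> Tuple[List[Record], int]:
    """Return the records that carry a payload and a count of tombstones."""
    kept: List[Record] = []
    tombstones = 0
    for key, value in records:
        if value is None:
            # A null value marks a deletion; there is no payload to ingest.
            tombstones += 1
            continue
        kept.append((key, value))
    return kept, tombstones
```

Note that an empty payload (`b""`) is still a value and is kept; only `None` is treated as a tombstone.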

Guide: Configure AWS VPC Peering and Test Connectivity

  1. Create a VPC Peering connection between the requester VPC (Onehouse) and the accepter VPC (your MSK cluster).

  2. Modify the MSK security group to allow "All Traffic" from the Onehouse EKS security group.

  3. Modify the Route Table for all the EKS VPCs and add a route to the MSK VPC CIDR.

  4. Modify the Route Table for the MSK VPC and add a route to the EKS VPC CIDR.

  5. Create one EC2 instance in each VPC. EC2-1 mimics the connection from Onehouse and is in one of the private subnets in the EKS VPC. Its security group(s) should be the same as the ones attached to the EKS cluster. EC2-2 mimics where the Kafka instance is deployed and is in one of the private subnets (or public subnet) where you deployed Kafka. Its security group(s) should be the same as the ones attached to Kafka.

  6. Create a test path in AWS's Reachability Analyzer between those instances.

  7. Click Analyze Path to run the analysis.
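
The two route-table steps in the guide above can be sanity-checked before you touch the AWS console. The sketch below, using only Python's standard `ipaddress` module, verifies that the two VPC CIDR blocks do not overlap (VPC peering cannot be established between VPCs with overlapping CIDRs) and lists the route each side needs. The function name `plan_peering_routes`, the example CIDRs, and the peering ID are hypothetical placeholders; substitute the values from your own account.

```python
import ipaddress
from typing import Dict, List


def plan_peering_routes(eks_cidr: str, msk_cidr: str, peering_id: str) -> List[Dict[str, str]]:
    """Return the route each VPC's route table needs for the peering link."""
    eks = ipaddress.ip_network(eks_cidr)
    msk = ipaddress.ip_network(msk_cidr)
    if eks.overlaps(msk):
        # Overlapping ranges cannot be peered, so fail early.
        raise ValueError(f"CIDR blocks overlap: {eks} and {msk}")
    return [
        # EKS VPC route table: send MSK-bound traffic over the peering link
        {"route_table": "eks", "destination": str(msk), "target": peering_id},
        # MSK VPC route table: send EKS-bound traffic back the same way
        {"route_table": "msk", "destination": str(eks), "target": peering_id},
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only
    for route in plan_peering_routes("10.0.0.0/16", "172.31.0.0/16", "pcx-example"):
        print(route)
```

If the Onehouse EKS deployment spans more than one VPC, the same check would be repeated for each EKS VPC CIDR against the MSK VPC CIDR.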