Apache Kafka

Description

Continuously stream data directly from Kafka into your Onehouse managed lakehouse.

Follow the setup guide within the Onehouse console to get started. Click Sources > Add New Source > Apache Kafka.

Reading Kafka Messages

Onehouse supports the following serialization types for Kafka message values:

| Message Value Serialization Type | Schema Registry | Description |
| --- | --- | --- |
| Avro | Required | Deserializes the message value in the Avro format. Send messages using Kafka-Avro-specific libraries; vanilla Avro libraries will not work. Read more in this article: Deserializing Confluent Avro Records in Kafka with Spark. |
| JSON | Optional | Deserializes the message value in the JSON format. |
| JSON_SR (JSON Schema) | Required | Deserializes the message value in the Confluent JSON Schema format. |
| Protobuf | Required | Deserializes the message value in the Protocol Buffers format. |
| Byte Array | N/A | Passes the raw message value through as a byte array without performing deserialization. Also adds the message key as a string field. |

Onehouse currently does not support reading Kafka message keys for Avro, JSON, JSON_SR, and Protobuf serialized messages.
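The warning about vanilla Avro libraries exists because Confluent's serializers prepend a 5-byte wire-format header (a zero magic byte followed by a 4-byte big-endian schema registry ID) before the Avro payload; messages produced without this header cannot be matched to a registered schema. A minimal sketch of the framing (the schema ID and payload bytes below are illustrative, not real registry data):

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format always starts with a zero byte

def frame_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Prepend the Confluent wire-format header: magic byte + 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def parse_confluent(message: bytes) -> tuple[int, bytes]:
    """Split a Confluent-framed message back into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("not Confluent wire format; produced by a vanilla Avro library?")
    return schema_id, message[5:]

framed = frame_confluent(42, b"\x06foo")  # fake Avro bytes for illustration
schema_id, payload = parse_confluent(framed)
print(schema_id, payload)  # 42 b'\x06foo'
```

A message written with a plain Avro encoder lacks this header entirely, which is why registry-backed consumers reject it.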

Steps for creating a Kafka source with Protobuf message type

If you're using Protobuf as the message value serialization type, you must provide the Protobuf schema as a .jar file built from the .proto file. Onehouse uses this .jar to deserialize message values.

Pre-requisites

  • Kafka cluster with required topics
  • Java 8
  • Maven (compatible with the Java version)
  • Protobuf (Protobuf 3.XX.X)

Follow this example to create the schema jar:

  1. Create a schema file, e.g. sample.proto, in the src/main/resources folder.
  2. Compile the schema with the following command:

     protoc --java_out=./src/main/java ./src/main/resources/sample.proto

     Note: This generates the schema class under src/main/java.
  3. Generate the jar file with the following command:

     mvn clean package

     Note: This generates the jar file in the target folder.
  4. Upload the schema jar to an S3 location that Onehouse has read permission for.
  5. Create a new source with Apache Kafka as the source type and provide the jar's S3 URI in the Schema Registry section.
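As a concrete illustration of the first step, a minimal sample.proto could look like the following (the package, options, and field names here are hypothetical, not required by Onehouse):

```protobuf
syntax = "proto3";

package com.example.events;

// Hypothetical Java codegen options; adjust to your project layout.
option java_package = "com.example.events";
option java_outer_classname = "SampleProto";

// Schema for the Kafka message value; fields are illustrative.
message Sample {
  string id = 1;
  string name = 2;
  int64 created_at = 3;
}
```

Running the protoc and mvn commands above against a schema like this produces the jar that Onehouse reads in the Schema Registry section.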

Usage Notes

  • If a message is compacted away or deleted within the Kafka topic, it can no longer be ingested, since its payload is replaced by a tombstone (null value).
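The compaction behavior above can be sketched with a simplified model (not Kafka's actual implementation): log compaction keeps only the latest record per key, and a delete is published as a tombstone with a null value, so after compaction there is no payload left to ingest.

```python
def compact(log):
    """Simplified model of Kafka log compaction: keep only the latest record per key.
    A tombstone (value=None) marks a delete; after compaction the payload is gone."""
    latest = {}
    for key, value in log:  # later records override earlier ones for the same key
        latest[key] = value
    return latest

log = [
    ("user-1", '{"name": "Ada"}'),
    ("user-2", '{"name": "Grace"}'),
    ("user-1", None),  # tombstone: user-1 was deleted
]
compacted = compact(log)
print(compacted)  # {'user-1': None, 'user-2': '{"name": "Grace"}'}
# An ingestion job reading after compaction sees only the tombstone for user-1;
# the original payload can no longer be recovered or ingested.
```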