Apache Kafka
Description
Continuously stream data directly from Kafka into your Onehouse managed lakehouse.
Follow the setup guide within the Onehouse console to get started. Click Sources > Add New Source > Apache Kafka.
Reading Kafka Messages
Onehouse supports the following serialization types for Kafka message values:
| Message Value Serialization Type | Schema Registry | Description |
| --- | --- | --- |
| Avro | Required | Deserializes the message value in the Avro format. Send messages using Kafka Avro-specific libraries; vanilla Avro libraries will not work. Read more in this article: Deserializing Confluent Avro Records in Kafka with Spark |
| JSON | Optional | Deserializes the message value in the JSON format. |
| JSON_SR (JSON Schema) | Required | Deserializes the message value in the Confluent JSON Schema format. |
| Protobuf | Required | Deserializes the message value in the Protocol Buffers format. |
| Byte Array | N/A | Passes the raw message value through as a byte array without performing deserialization. Also adds the message key as a string field. |
Onehouse currently does not support reading Kafka message keys for Avro, JSON, JSON_SR, and Protobuf serialized messages.
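As an illustrative sketch only (not Onehouse's actual implementation), the difference between the JSON and Byte Array modes can be shown in plain Python: JSON mode parses the value bytes into structured fields, while Byte Array mode passes the raw value through and adds the key as a string field. The function names and records below are hypothetical.

```python
import json

def deserialize_json_value(value: bytes) -> dict:
    # JSON mode: parse the message value bytes into structured fields.
    return json.loads(value.decode("utf-8"))

def passthrough_byte_array(key: bytes, value: bytes) -> dict:
    # Byte Array mode: keep the raw value bytes untouched and
    # add the message key as a string field.
    return {"key": key.decode("utf-8"), "value": value}

record_value = b'{"user_id": 42, "event": "click"}'
print(deserialize_json_value(record_value))   # {'user_id': 42, 'event': 'click'}
print(passthrough_byte_array(b"user-42", record_value))
```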
Steps for creating a Kafka source with Protobuf message type
If you're using Protobuf as the message value serialization type, you need to provide the Protobuf schema as a `.jar` file built from the `.proto` file. Onehouse will use the `.jar` to deserialize the message value.
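As a hypothetical illustration, a minimal `.proto` schema could look like the following (the message name, fields, and Java options are placeholders, not a required layout):

```protobuf
// sample.proto -- hypothetical example schema
syntax = "proto3";

option java_package = "com.example.kafka";
option java_outer_classname = "SampleProto";

message Sample {
  string id = 1;
  int64 timestamp = 2;
}
```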
Prerequisites
- Kafka cluster with the required topics
- Java 8
- Maven (compatible with the Java version)
- Protobuf (Protobuf 3.XX.X)
Follow this example to create the schema jar:
- Create a schema, e.g. `sample.proto`, in the `src/main/resources` folder.
- Run the following command to compile the schema:

```shell
protoc --java_out=./src/main/java ./src/main/resources/sample.proto
```

Note: this generates the schema class in the `src/main/java` folder.
- Generate the jar file using the following command:

```shell
mvn clean package
```

Note: this generates the jar file in the `target` folder.
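For the `mvn clean package` step to produce a usable schema jar, the project needs the Protocol Buffers runtime on its classpath. A minimal `pom.xml` sketch might look like the following (the coordinates are the standard Maven ones for `protobuf-java`; the project name and versions shown are assumptions — keep the runtime version in sync with your `protoc` version):

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>sample-proto-schema</artifactId>
  <version>1.0.0</version>

  <properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
  </properties>

  <dependencies>
    <!-- Runtime classes required by the protoc-generated code. -->
    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
      <version>3.21.12</version>
    </dependency>
  </dependencies>
</project>
```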
- Upload the schema jar to an S3 location that Onehouse has read permission for.
- Create a new source with `Apache Kafka` as the source type and provide the jar's S3 URI in the `Schema Registry` section.
Usage Notes
- If a message is compacted or deleted within the Kafka topic, it can no longer be ingested, since its payload becomes a tombstone (null value).
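To make the tombstone behavior concrete, here is an illustrative Python sketch (the records and filtering logic are hypothetical, not Onehouse code): a compacted or deleted key leaves a record whose value is null, so there is no payload left to deserialize.

```python
# Hypothetical Kafka records as (key, value) pairs;
# a None value is a tombstone left by compaction or deletion.
records = [
    (b"user-1", b'{"event": "click"}'),
    (b"user-2", None),  # tombstone: nothing to deserialize
]

# Only records with a non-null payload can be deserialized and ingested.
ingestible = [(k, v) for k, v in records if v is not None]
print(len(ingestible))  # 1
```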