
Apache Kafka

Continuously stream data from Apache Kafka into Onehouse tables.

In the Onehouse console, click Sources > Add New Source > Apache Kafka, then follow the setup guide to configure your source.

Cloud Provider Support

  • AWS: ✅ Supported
  • GCP: ✅ Supported

Reading Kafka Messages

Onehouse supports the following serialization types for Kafka message values:

Serialization Type (for message value) | Schema Registry | Description
Avro | Required | Deserializes the message value in the Avro format. Send messages using Kafka-Avro-specific libraries; vanilla Avro libraries will not work.
JSON | Optional | Deserializes the message value in the JSON format.
JSON_SR (JSON Schema) | Required | Deserializes the message value in the Confluent JSON Schema format.
Protobuf | Required | Deserializes the message value in the Protocol Buffers format.
Byte Array | N/A | Passes the raw message value through as a byte array without performing deserialization. Also adds the message key as a string field.

Onehouse currently does not support reading Kafka message keys for Avro, JSON, JSON_SR, and Protobuf serialized messages.
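
As the table notes, Avro values must be produced with Kafka-Avro-specific (schema-registry-aware) serializers rather than plain Avro encoding. The sketch below shows one way to do this with Confluent's KafkaAvroSerializer; it assumes the kafka-clients, avro, and kafka-avro-serializer dependencies are on the classpath, and the broker address, registry URL, topic name, and record schema are all placeholders, not values from this guide.

    import java.util.Properties;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class AvroProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092"); // placeholder broker
            props.put("key.serializer", StringSerializer.class.getName());
            // Confluent's serializer registers the schema and writes the wire
            // format that schema-registry-aware consumers expect.
            props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://schema-registry:8081"); // placeholder registry

            // Hypothetical record schema for illustration.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"name\",\"type\":\"string\"}]}");
            GenericRecord user = new GenericData.Record(schema);
            user.put("id", 1L);
            user.put("name", "Ada");

            try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("users", "1", user)); // placeholder topic
            }
        }
    }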

Usage Notes

  • If a message has been compacted away or deleted within the Apache Kafka topic, it can no longer be ingested, since the payload will be a tombstone (null) value.

Guide: Create a Kafka source with Protobuf messages

If you use Protobuf as the message value serialization type, you must provide the Protobuf schema as a .jar file built from your .proto file. Onehouse uses this .jar to deserialize the message value.
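
For reference, a minimal sample.proto might look like the following sketch. The package, options, and field definitions are illustrative placeholders, not values Onehouse requires.

    syntax = "proto3";

    package com.example.schemas;

    // Hypothetical options controlling where protoc places the generated Java class.
    option java_package = "com.example.schemas";
    option java_outer_classname = "SampleProto";

    // Hypothetical message type; field names and numbers are illustrative.
    message Sample {
      int64 id = 1;
      string name = 2;
      double amount = 3;
    }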

Prerequisites

  • Kafka cluster with required topics
  • Java 8
  • Maven (a version compatible with your Java version)
  • Protobuf compiler (protoc 3.XX.X)

Create and upload the schema JAR

  1. Create a schema file, e.g. sample.proto, in the src/main/resources folder (see the sample schema sketch above)
  2. Run the following command to compile the schema:
    protoc --java_out=./src/main/java ./src/main/resources/sample.proto

    This generates the schema classes in the src/main/java folder.

  3. Generate the JAR file using the following command:
    mvn clean package

    This generates the JAR file in the target folder (a minimal pom.xml sketch follows this list).

  4. Upload the schema JAR to an object storage bucket (such as S3) that Onehouse can access (see the upload example after this list).
  5. Create a new source with Apache Kafka as the source type and provide the S3 URI of the JAR in the Schema Registry section.
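
Step 3 assumes a Maven project whose pom.xml pulls in the protobuf-java runtime that the generated classes compile against. A minimal sketch, with placeholder coordinates and an assumed protobuf-java 3.x version (match it to your protoc version):

    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <!-- Placeholder coordinates for the schema JAR. -->
      <groupId>com.example</groupId>
      <artifactId>sample-schema</artifactId>
      <version>1.0.0</version>
      <dependencies>
        <!-- Runtime for the classes protoc generates; the version is an assumption. -->
        <dependency>
          <groupId>com.google.protobuf</groupId>
          <artifactId>protobuf-java</artifactId>
          <version>3.25.3</version>
        </dependency>
      </dependencies>
    </project>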
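
For step 4, one way to upload the JAR is with the AWS CLI; the bucket and key below are placeholders:

    aws s3 cp target/sample-schema-1.0.0.jar s3://my-bucket/schemas/sample-schema-1.0.0.jar

The resulting S3 URI (here, s3://my-bucket/schemas/sample-schema-1.0.0.jar) is what you provide in the Schema Registry section in step 5.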