
Schema Registry

Overview

A Schema Registry can be added to a Source in Onehouse. When you add a Schema Registry to a Source, schemas from that Schema Registry will be available for all Stream Captures using the Source.

How Schema Registries are used

Schemas from your Schema Registry are used while creating Stream Captures:

  • Some Sources (e.g. Apache Kafka) require a schema to read the data, while others (e.g. S3) let you optionally specify a schema for the incoming data
  • Schemas can be used for Schema Validation in Data Quality Validations for any Source

See the docs for a specific Source to find out whether a schema is required.

Schema Registry Options

File-Based Schema Registry

With a File-Based Schema Registry, Onehouse reads your schema files from a cloud storage bucket. Schema files should be in the Avro schema format (see examples) and stored with the .avsc file extension.

Important: You must grant Onehouse access to the bucket containing your schema files.

Schema file types supported:

  • Avro: Use the file extension .avsc and match the format defined here in the Apache Avro spec.
  • XSD: If you are ingesting XML data and would like to use XSD schemas, our team can convert these schemas to Avro format. Please contact Onehouse support.
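
As an illustration of the Avro file type above, here is a minimal sketch that writes a record schema to a .avsc file using only the Python standard library. The record name, namespace, and fields are hypothetical and stand in for your own data; an .avsc file is plain JSON, so no Avro library is needed to produce one.

```python
import json
from pathlib import Path

# Hypothetical record schema: the name, namespace, and fields are illustrative only.
order_schema = {
    "type": "record",
    "name": "Order",
    "namespace": "com.example.orders",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
        # A union with "null" listed first makes the field optional with a null default.
        {"name": "coupon_code", "type": ["null", "string"], "default": None},
    ],
}


def write_avsc(schema: dict, path: Path) -> Path:
    """Serialize an Avro schema dict to a .avsc file (plain JSON)."""
    if path.suffix != ".avsc":
        raise ValueError("schema files should use the .avsc extension")
    path.write_text(json.dumps(schema, indent=2))
    return path


if __name__ == "__main__":
    out = write_avsc(order_schema, Path("order.avsc"))
    print(f"wrote {out}")
```

The resulting file would then be uploaded to the bucket that Onehouse has been granted access to.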

Proto JAR Schema Registry

You can use a Proto JAR file-based schema registry for Kafka sources with messages in the Proto (Protocol Buffer) format. Follow these steps to set it up:

  1. Write your Protocol Buffer schemas in files with the .proto extension, following the format defined in the proto3 spec.
  2. Compile the .proto files and create a JAR from the output. This JAR will contain the Java classes generated from your schemas.
  3. In the Onehouse console, navigate to Settings > Integrations > Manage JARs, then upload the JAR.
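
The compile-and-package steps above can be sketched as follows. This is only one possible build flow, not a procedure prescribed by Onehouse: it assumes `protoc`, `javac`, and `jar` are installed, that a protobuf-java runtime JAR is available for compilation, and all paths and file names are hypothetical. The helpers only assemble the command lines; run them with your own build tooling.

```python
from pathlib import Path
from typing import List


def protoc_command(proto_dir: Path, java_out: Path, proto_files: List[Path]) -> List[str]:
    """Command that generates Java sources from .proto files."""
    return [
        "protoc",
        f"--proto_path={proto_dir}",
        f"--java_out={java_out}",
        *[str(p) for p in proto_files],
    ]


def javac_command(runtime_jar: Path, classes_dir: Path, java_files: List[Path]) -> List[str]:
    """Command that compiles the generated sources against the protobuf runtime."""
    return ["javac", "-cp", str(runtime_jar), "-d", str(classes_dir), *[str(f) for f in java_files]]


def jar_command(jar_path: Path, classes_dir: Path) -> List[str]:
    """Command that packages the compiled classes into a JAR."""
    return ["jar", "cf", str(jar_path), "-C", str(classes_dir), "."]


if __name__ == "__main__":
    # Hypothetical layout: schemas/ holds .proto files, build/ holds outputs.
    commands = [
        protoc_command(Path("schemas"), Path("build/gen"), [Path("schemas/order.proto")]),
        javac_command(
            Path("lib/protobuf-java.jar"),
            Path("build/classes"),
            [Path("build/gen/com/example/OrderOuterClass.java")],
        ),
        jar_command(Path("build/schemas.jar"), Path("build/classes")),
    ]
    for command in commands:
        print(" ".join(command))
```

The JAR produced by the last command is the file you would upload under Manage JARs.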

AWS Glue Schema Registry

Onehouse can read from, write to, and manage an existing Glue Schema Registry within your AWS account.

Note: You can create a Glue Schema Registry through the AWS Console, the AWS CLI, or an AWS SDK.
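
For the SDK route, a minimal sketch using boto3 (the AWS SDK for Python) might look like the following. The region, registry name, schema name, and schema contents are hypothetical, and the sketch assumes AWS credentials with Glue permissions are already configured. The request builders are kept separate from the AWS calls so they can be inspected without network access.

```python
import json
from typing import Any, Dict


def registry_request(registry_name: str, description: str = "") -> Dict[str, Any]:
    """Build keyword arguments for the Glue create_registry call."""
    request: Dict[str, Any] = {"RegistryName": registry_name}
    if description:
        request["Description"] = description
    return request


def schema_request(
    registry_name: str,
    schema_name: str,
    avro_schema: dict,
    compatibility: str = "BACKWARD",
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue create_schema call (Avro format)."""
    return {
        "RegistryId": {"RegistryName": registry_name},
        "SchemaName": schema_name,
        "DataFormat": "AVRO",
        "Compatibility": compatibility,
        "SchemaDefinition": json.dumps(avro_schema),
    }


def create_registry_and_schema(
    region: str, registry_name: str, schema_name: str, avro_schema: dict
) -> None:
    """Create a registry and register one schema in it. Requires AWS credentials."""
    import boto3  # imported here so the builders above work without boto3 installed

    glue = boto3.client("glue", region_name=region)
    glue.create_registry(**registry_request(registry_name))
    glue.create_schema(**schema_request(registry_name, schema_name, avro_schema))


if __name__ == "__main__":
    # Hypothetical schema, printed rather than sent to AWS.
    example = {"type": "record", "name": "Order", "fields": [{"name": "id", "type": "string"}]}
    print(schema_request("my-registry", "orders", example))
```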

Confluent Schema Registry

Onehouse can read and manage an existing Confluent Schema Registry. You can also use schemas with Schema Contexts.
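
To check what a Confluent Schema Registry holds for a subject before connecting it, you could query the registry's REST API yourself. The sketch below is an assumption-laden illustration, not a description of how Onehouse accesses the registry: the base URL, subject, and context name are hypothetical, the registry is assumed to be reachable without authentication, and the subject is assumed to use the `:.<context>:<subject>` qualified form for Schema Contexts.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen


def qualified_subject(subject: str, context: Optional[str] = None) -> str:
    """Return the subject name, prefixed with its schema context if one is given."""
    if context is None:
        return subject
    return f":.{context}:{subject}"


def latest_version_url(base_url: str, subject: str, context: Optional[str] = None) -> str:
    """Build the URL for the latest registered version of a subject."""
    # Percent-encode the subject so the colons in a qualified name are path-safe.
    name = quote(qualified_subject(subject, context), safe="")
    return f"{base_url.rstrip('/')}/subjects/{name}/versions/latest"


def fetch_latest_schema(base_url: str, subject: str, context: Optional[str] = None) -> dict:
    """Fetch the latest schema version for a subject. Requires network access."""
    with urlopen(latest_version_url(base_url, subject, context)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical registry address and names; only the URLs are printed here.
    print(latest_version_url("http://localhost:8081", "orders-value"))
    print(latest_version_url("http://localhost:8081", "orders-value", context="staging"))
```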