Schema Registry
Overview
A Schema Registry can be added to a Source in Onehouse. When you add a Schema Registry to a Source, schemas from that Schema Registry will be available for all Flows using the Source.
How Schema Registries are used
Schemas from your Schema Registry are used while creating Flows:
- Some Sources (e.g. Apache Kafka) require a schema for reading the data, and some Sources (e.g. Amazon S3) allow you to optionally specify a schema for the incoming data.
- Schemas can be used for Schema Validation in Data Quality Validations for any Source.
See the documentation for each Source to check whether it requires a schema.
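To illustrate what schema validation checks, here is a minimal Python sketch that compares a record against the fields of a simple Avro record schema. It is not how Onehouse implements Data Quality Validations. It supports only a few primitive types and unions such as `["null", "string"]`, and the helpers `matches` and `validate_record` and the `Order` schema are hypothetical.

```python
# Minimal sketch of schema validation (hypothetical helpers, not Onehouse's
# implementation): check a record against a simple Avro record schema.
# Only a few primitive types and unions such as ["null", "string"] are handled.
from typing import Any, Dict, List

# Subset of Avro primitive type names mapped to Python types
PRIMITIVES = {
    "string": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": bool,
}


def matches(value: Any, avro_type: Any) -> bool:
    """Return True if value conforms to avro_type."""
    if isinstance(avro_type, list):  # union: any branch may match
        return any(matches(value, branch) for branch in avro_type)
    if avro_type == "null":
        return value is None
    py_type = PRIMITIVES.get(avro_type)
    if py_type is None:
        return False  # complex or unsupported type in this sketch
    if isinstance(value, bool) and py_type is not bool:
        return False  # bool is a subclass of int in Python; reject it here
    if py_type is float:
        return isinstance(value, (int, float))  # accept ints for float/double
    return isinstance(value, py_type)


def validate_record(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return validation errors; an empty list means the record is valid."""
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        if name not in record:
            # Simplification: a missing field is accepted only if it has a default
            if "default" not in field:
                errors.append(f"missing field: {name}")
        elif not matches(record[name], field["type"]):
            errors.append(f"wrong type for field: {name}")
    return errors


order_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "note", "type": ["null", "string"], "default": None},
    ],
}

print(validate_record({"order_id": 42}, order_schema))    # []
print(validate_record({"order_id": "42"}, order_schema))  # ['wrong type for field: order_id']
print(validate_record({"note": "rush"}, order_schema))    # ['missing field: order_id']
```

Returning a list of errors rather than raising on the first problem lets a validation step report every violation in a record at once.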
Schema Registry Options
File-Based Schema Registry
With a File-Based Schema Registry, Onehouse reads your schema files from a cloud storage bucket. Schema files should be written in the Avro schema format and stored with the .avsc extension.
Important: You must grant Onehouse access to the bucket containing your schema files.
Schema file types supported:
- Avro: Use the file extension .avsc and match the format defined in the Apache Avro specification.
- XSD: If you are ingesting XML data and would like to use XSD schemas, our team can convert these schemas to Avro format. Please contact Onehouse support.
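An Avro schema is a JSON document, so standard JSON tooling can produce one. The Python sketch below builds a hypothetical `Order` record schema and serializes it to a `.avsc` file that you would then place in the bucket Onehouse reads from. The record name, namespace, fields, and the helpers `to_avsc_text` and `write_avsc` are illustrative assumptions; uploading to the bucket is not shown.

```python
# Sketch: produce an Avro schema (.avsc) file for a File-Based Schema Registry.
# The record name, namespace, fields, and helper names are hypothetical;
# uploading the file to your bucket is not shown.
import json
from pathlib import Path

order_schema = {
    "type": "record",
    "name": "Order",
    "namespace": "com.example.orders",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "customer", "type": "string"},
        {"name": "amount", "type": "double"},
        # Nullable field: union with "null" first, so the default can be null
        {"name": "note", "type": ["null", "string"], "default": None},
    ],
}


def to_avsc_text(schema: dict) -> str:
    """Serialize an Avro record schema to JSON text after basic checks."""
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    for key in ("name", "fields"):
        if key not in schema:
            raise ValueError(f"record schema is missing '{key}'")
    return json.dumps(schema, indent=2)


def write_avsc(schema: dict, directory: Path) -> Path:
    """Write the schema to <directory>/<name>.avsc and return the path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{schema['name']}.avsc"
    path.write_text(to_avsc_text(schema))
    return path
```

For example, `write_avsc(order_schema, Path("schemas"))` writes the schema to `schemas/Order.avsc`. Python's `None` is serialized as JSON `null`, which is the default Avro expects for a union whose first branch is `"null"`.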
Proto JAR Schema Registry
You can use a Proto JAR file-based schema registry for Kafka sources with messages in the Proto (Protocol Buffer) format. Follow these steps to set it up:
- Protocol Buffer schemas should use the file extension .proto and match the format defined in the proto3 spec.
- Create a JAR from the compiled .proto files. This JAR will contain the Java classes generated from your schemas.
- In the Onehouse console, navigate to Settings > Integrations > Manage JARs, then upload the JAR.
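A JAR is a ZIP archive, so before uploading you can confirm that it actually contains the generated classes. The Python sketch below lists the `.class` entries in a JAR using only the standard library; `list_classes` and `list_classes_in_bytes` are hypothetical helper names. One common way to build such a JAR (an assumption, not a requirement stated above) is to run `protoc --java_out=...` on the `.proto` files, compile the output with `javac` against the `protobuf-java` library, and package the classes with `jar cf`.

```python
# Sketch: list the compiled classes inside a Proto JAR before uploading it
# in Settings > Integrations > Manage JARs. Helper names are hypothetical.
import io
import zipfile
from typing import IO, List, Union


def list_classes(jar: Union[str, IO[bytes]]) -> List[str]:
    """Return the fully qualified class names in a JAR (a ZIP archive), sorted."""
    with zipfile.ZipFile(jar) as archive:
        names = [
            # "com/example/Foo.class" -> "com.example.Foo"
            entry[: -len(".class")].replace("/", ".")
            for entry in archive.namelist()
            if entry.endswith(".class")
        ]
    return sorted(names)


def list_classes_in_bytes(data: bytes) -> List[str]:
    """Same as list_classes, for JAR contents already loaded into memory."""
    return list_classes(io.BytesIO(data))
```

For example, `list_classes("schemas.jar")` returns names such as `com.example.OrderProtos`; an empty list suggests the `.proto` files were not compiled into the JAR.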
AWS Glue Schema Registry
Onehouse can read, write to and manage an existing Glue Schema Registry within your AWS account.
Note: You can create a Glue Schema Registry through the AWS Management Console, the AWS CLI, or an AWS SDK.
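As a minimal sketch of the SDK route, assuming boto3 is installed and your AWS credentials allow `glue:CreateRegistry`, the Python function below calls the Glue `create_registry` API and returns the new registry's ARN. The function name `create_glue_registry`, the registry name, and the region are placeholders. The CLI equivalent is `aws glue create-registry --registry-name onehouse-schemas`.

```python
# Sketch: create an AWS Glue Schema Registry with boto3 (the AWS SDK for
# Python). The function name, registry name, and region are placeholders.
from typing import Any


def create_glue_registry(glue_client: Any, name: str, description: str = "") -> str:
    """Create a Glue Schema Registry and return its ARN.

    glue_client is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    kwargs = {"RegistryName": name}
    if description:
        kwargs["Description"] = description  # optional in the Glue API
    response = glue_client.create_registry(**kwargs)
    return response["RegistryArn"]


# Example usage (requires boto3, AWS credentials, and Glue permissions):
#   import boto3
#   glue = boto3.client("glue", region_name="us-east-1")
#   arn = create_glue_registry(glue, "onehouse-schemas", "Schemas for Onehouse Sources")
#   print(arn)
```

Passing the client in, rather than creating it inside the function, keeps the region and credentials under the caller's control and lets the function be exercised with a stub client.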
Confluent Schema Registry
Onehouse can read and manage an existing Confluent Schema Registry. You can also use schemas with Schema Contexts.
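In Confluent Schema Registry, a subject that belongs to a Schema Context is addressed with a qualified name of the form `:.<context>:<subject>`. The Python sketch below builds the REST URL for the latest version of a subject, with or without a context. The base URL, the context `prod`, the subject `orders-value` (following Confluent's default `<topic>-value` naming for a topic named `orders`), and the helper names are hypothetical; how Onehouse itself resolves subjects is not shown.

```python
# Sketch: build the Confluent Schema Registry REST URL for the latest version
# of a subject, optionally qualified with a Schema Context.
# Base URL, context, subject, and helper names are hypothetical.
from typing import Optional
from urllib.parse import quote


def qualified_subject(subject: str, context: Optional[str] = None) -> str:
    """Return the subject, prefixed as :.<context>:<subject> when a context is given."""
    if not context:
        return subject  # subject in the default context
    return f":.{context}:{subject}"


def latest_schema_url(base_url: str, subject: str, context: Optional[str] = None) -> str:
    """URL for GET /subjects/{subject}/versions/latest."""
    # Percent-encode the whole subject (including ':') as a precaution
    encoded = quote(qualified_subject(subject, context), safe="")
    return f"{base_url.rstrip('/')}/subjects/{encoded}/versions/latest"
```

For example, `latest_schema_url("http://localhost:8081", "orders-value", "prod")` yields `http://localhost:8081/subjects/%3A.prod%3Aorders-value/versions/latest`. A `GET` on that URL returns JSON that includes the subject, version, schema ID, and the schema string.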