Schema Registry
Overview
A Schema Registry can be added to a Source in Onehouse. When you add a Schema Registry to a Source, schemas from that Schema Registry will be available for all Stream Captures using the Source.
How Schema Registries are used
Schemas from your Schema Registry are used while creating Stream Captures:
- Some Sources (e.g. Apache Kafka) require a schema to read the data, while others (e.g. S3) let you optionally specify a schema for the incoming data.
- Schemas can be used for Schema Validation in Data Quality Validations for any Source.
See the docs for a specific Source to check whether a schema is required.
Schema Registry Options
File-Based Schema Registry
With a File-Based Schema Registry, Onehouse reads your schema files from a cloud storage bucket. Schema files should be in the Avro schema format (see examples) and stored as .avsc files.
Important: You must grant Onehouse access to the bucket containing your schema files.
Schema file types supported:
- Avro: Use the file extension .avsc and match the format defined in the Apache Avro spec.
- XSD: If you are ingesting XML data and would like to use XSD schemas, our team can convert these schemas to Avro format. Please contact Onehouse support.
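As an illustration of the Avro file type above, the following minimal sketch writes a small record schema to an .avsc file that could then be placed in your schema bucket. The record name, namespace, and fields are hypothetical and chosen only for the example; an .avsc file is plain JSON, so only the Python standard library is used.

```python
import json
from pathlib import Path

# Hypothetical record schema: the name, namespace, and fields are
# illustrative only and do not come from Onehouse.
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "namespace": "com.example.sales",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
        # A union with "null" listed first and a null default makes the field optional.
        {"name": "coupon_code", "type": ["null", "string"], "default": None},
    ],
}


def write_avsc(schema: dict, directory: str) -> Path:
    """Write the schema as <RecordName>.avsc in `directory` and return the path."""
    path = Path(directory) / f"{schema['name']}.avsc"
    path.write_text(json.dumps(schema, indent=2) + "\n", encoding="utf-8")
    return path
```

Calling write_avsc(ORDER_SCHEMA, ".") produces Order.avsc in the current directory; uploading the file to the bucket is left to whichever cloud storage tooling you already use.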
Proto JAR Schema Registry
You can use a Proto JAR file-based schema registry for Kafka sources with messages in the Proto (Protocol Buffer) format. Follow these steps to set it up:
- Protocol Buffer schemas should use the file extension .proto and match the format defined in the proto3 spec.
- Create a JAR from the compiled .proto files. This JAR will contain the Java classes generated from your schemas.
- In the Onehouse console, navigate to Settings > Integrations > Manage JARs, then upload the JAR.
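The JAR-building step above can be sketched as a small Python wrapper around the standard protoc, javac, and jar command-line tools. This is one possible approach, not a procedure prescribed by Onehouse: it assumes all three tools are on your PATH and that you have a local copy of the protobuf-java runtime JAR (its path is a placeholder), which javac needs on the classpath to compile the generated classes. Directory and file names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def protoc_command(proto_dir, java_out) -> List[str]:
    """Command that generates Java sources from every .proto file in proto_dir."""
    protos = sorted(str(p) for p in Path(proto_dir).glob("*.proto"))
    return ["protoc", f"--proto_path={proto_dir}", f"--java_out={java_out}", *protos]


def javac_command(java_src, classes_dir, protobuf_jar) -> List[str]:
    """Command that compiles the generated sources against the protobuf runtime."""
    sources = sorted(str(p) for p in Path(java_src).rglob("*.java"))
    return ["javac", "-cp", str(protobuf_jar), "-d", str(classes_dir), *sources]


def jar_command(jar_path, classes_dir) -> List[str]:
    """Command that packages the compiled classes into a JAR."""
    return ["jar", "cf", str(jar_path), "-C", str(classes_dir), "."]


def build_jar(proto_dir, work_dir, protobuf_jar, jar_path) -> None:
    """Run the three steps in order; raises CalledProcessError if any step fails."""
    java_src = Path(work_dir) / "java"
    classes = Path(work_dir) / "classes"
    # protoc and javac expect their output directories to exist already.
    java_src.mkdir(parents=True, exist_ok=True)
    classes.mkdir(parents=True, exist_ok=True)
    subprocess.run(protoc_command(proto_dir, java_src), check=True)
    subprocess.run(javac_command(java_src, classes, protobuf_jar), check=True)
    subprocess.run(jar_command(jar_path, classes), check=True)
```

For example, build_jar("schemas", "build", "libs/protobuf-java.jar", "schemas.jar") would produce schemas.jar, ready to upload through Manage JARs. The sketch packages only the generated classes, not the protobuf runtime itself. Many teams would do the same work with a Maven or Gradle protobuf plugin instead.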
AWS Glue Schema Registry
Onehouse can read from, write to, and manage an existing Glue Schema Registry within your AWS account.
Note: You can create a Glue Schema Registry through the AWS Console, the AWS CLI, or an AWS SDK.
Confluent Schema Registry
Onehouse can read and manage an existing Confluent Schema Registry. You can also use schemas with Schema Contexts.
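When working with Schema Contexts, it can help to see how a subject inside a context is addressed. The sketch below builds the Confluent Schema Registry REST URL for the latest version of a subject, using Confluent's qualified subject form `:.<context>:<subject>`. The registry URL, context name, and subject name are placeholders. The helper is only a way to check a schema yourself and is not part of Onehouse.

```python
from typing import Optional
from urllib.parse import quote


def qualified_subject(subject: str, context: Optional[str] = None) -> str:
    """Return the subject name, prefixed with its Schema Context if one is given."""
    if context is None:
        return subject
    return f":.{context}:{subject}"


def latest_schema_url(base_url: str, subject: str, context: Optional[str] = None) -> str:
    """Build the REST URL for the latest registered version of a subject."""
    # Percent-encode the name so the colons in a qualified subject are URL-safe.
    name = quote(qualified_subject(subject, context), safe="")
    return f"{base_url.rstrip('/')}/subjects/{name}/versions/latest"
```

For instance, latest_schema_url("https://registry.example.com", "orders-value", "staging") yields a URL that you could request with your registry credentials to confirm the schema is visible before selecting it in a Stream Capture.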