Catalogs
Overview
A catalog is a searchable inventory of data assets and their associated metadata, such as information about tables, partitions, and indexes.
Onehouse lets you connect your data catalogs so that tables can later be synced to them by Flows or the Metadata Sync service.
You can connect the following data catalogs to your Onehouse projects (more are coming soon):
| Catalog Type | AWS projects | GCP projects |
|---|---|---|
| AWS Glue Metastore | ✅ Supported | ❌ Not supported |
| AWS Glue Iceberg REST Catalog (IRC) | ✅ Supported | ❌ Not supported |
| Hive Metastore | ✅ Supported | ✅ Supported |
| DataProc Metastore | ❌ Not supported | ✅ Supported |
| BigQuery + BigLake | ❌ Not supported | ✅ Supported |
| DataHub | ✅ Supported | ✅ Supported |
| Onetable | ✅ Supported | ✅ Supported |
| Databricks Unity Catalog | ✅ Supported | ✅ Supported |
| Snowflake Horizon Catalog | ✅ Supported | ✅ Supported |
| Snowflake Open Catalog (IRC) | ✅ Supported | ✅ Supported |
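If you want to check catalog support programmatically, for example in a deployment script, the table above can be written as a simple lookup. This is only an illustrative sketch: `SUPPORTED_CLOUDS`, `is_supported`, and `catalogs_for` are hypothetical names and are not part of any Onehouse API. The data simply mirrors the table at the time of writing.

```python
# Hypothetical lookup mirroring the support table above.
# None of these names come from an Onehouse SDK or API.
SUPPORTED_CLOUDS = {
    "AWS Glue Metastore": {"AWS"},
    "AWS Glue Iceberg REST Catalog (IRC)": {"AWS"},
    "Hive Metastore": {"AWS", "GCP"},
    "DataProc Metastore": {"GCP"},
    "BigQuery + BigLake": {"GCP"},
    "DataHub": {"AWS", "GCP"},
    "Onetable": {"AWS", "GCP"},
    "Databricks Unity Catalog": {"AWS", "GCP"},
    "Snowflake Horizon Catalog": {"AWS", "GCP"},
    "Snowflake Open Catalog (IRC)": {"AWS", "GCP"},
}


def is_supported(catalog_type: str, cloud: str) -> bool:
    """Return True if the catalog type is listed as supported for the cloud."""
    return cloud in SUPPORTED_CLOUDS.get(catalog_type, set())


def catalogs_for(cloud: str) -> list:
    """List the catalog types supported for a cloud, in table order."""
    return [name for name, clouds in SUPPORTED_CLOUDS.items() if cloud in clouds]


if __name__ == "__main__":
    print(is_supported("AWS Glue Metastore", "GCP"))
    print(catalogs_for("GCP"))
```

Unknown catalog types are treated as unsupported rather than raising an error, which keeps the check safe to call with user-supplied input.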
Add data catalogs
To add a catalog, open the Catalogs page under the Connections section of the Onehouse nav bar, then click Add New Catalog.
Default Catalogs
Default catalogs make it easy to automatically sync new tables to your catalogs. For example, when you create new tables with Onehouse SQL, these tables will automatically sync to your default catalogs.
Functionality
Onehouse projects start with no default catalog. Admins can add default catalogs on the Settings > Project Settings page.
When a project has default catalogs, they are applied as follows:
- Tables created by Flows:
  - When creating a new Flow via the Onehouse console, your default catalogs will be pre-filled. You can remove these pre-filled catalogs and add other catalogs while creating the Flow.
  - For Flows created via the API, tables will sync only to the catalogs specified explicitly in your API request.
- Tables created with Onehouse SQL: Any Hudi table created through Onehouse SQL will automatically sync to the default catalogs using the MetaSync Table Service.
- External Tables: When you add an External Table in Onehouse, that table will automatically sync to the default catalogs with the MetaSync Table Service.
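The rules above can be summarized as a small decision function. This is a conceptual sketch of the documented behavior only, not Onehouse code: `CreationPath` and `catalogs_for_new_table` are hypothetical names, and the catalog names in the usage example are placeholders.

```python
from enum import Enum
from typing import List, Optional


class CreationPath(Enum):
    """How a table was created (hypothetical labels for illustration)."""
    FLOW_CONSOLE = "flow_console"
    FLOW_API = "flow_api"
    ONEHOUSE_SQL = "onehouse_sql"
    EXTERNAL_TABLE = "external_table"


def catalogs_for_new_table(
    path: CreationPath,
    default_catalogs: List[str],
    selected_catalogs: Optional[List[str]] = None,
) -> List[str]:
    """Return the catalogs a newly created table would sync to.

    selected_catalogs is the explicit choice made in the console form or
    in the API request; None means no explicit choice was made.
    """
    if path is CreationPath.FLOW_API:
        # API-created Flows ignore defaults: only explicit catalogs apply.
        return list(selected_catalogs or [])
    if path is CreationPath.FLOW_CONSOLE:
        # The console pre-fills the defaults, but the final selection wins.
        if selected_catalogs is None:
            return list(default_catalogs)
        return list(selected_catalogs)
    # Onehouse SQL tables and External Tables sync to the defaults.
    return list(default_catalogs)


if __name__ == "__main__":
    defaults = ["glue_prod"]  # placeholder catalog name
    table_catalogs = catalogs_for_new_table(CreationPath.ONEHOUSE_SQL, defaults)
    defaults.append("unity_prod")  # changing defaults later
    print(table_catalogs)  # the existing table's catalogs are unchanged
```

The function returns a copy of the list, which reflects the point that changing a project's default catalogs later does not affect tables that already exist.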
FAQ
- What happens if I change the default catalogs for a project?
  - Changing the default catalogs does not affect existing tables. The change applies only to tables created afterward.
- What if I don’t want to sync tables to any catalog/metastore?
  - Default catalogs are optional. If no default catalog is set, tables will not sync automatically.
- Will Flow tables always sync to the default catalogs?
  - No. Tables created by a Flow sync to the catalogs selected when the Flow is created. Default catalogs simply pre-fill this selection for convenience.
- What if I create a Flow table using the Onehouse API?
  - Tables created by a Flow via the API sync to the catalogs specified in the API request, not to the default catalogs.
- Can I set default catalogs at the lake level?
  - Not yet. Currently, default catalogs apply only at the project level. Lake-level defaults may be considered in future updates.