Databricks Unity Catalog

Description

Databricks Unity Catalog provides a unified governance solution for managing and securing data assets across cloud environments. With Onehouse's OneTable support, users can ingest and store data in Delta Lake format, and these tables automatically sync to Unity Catalog. This keeps metadata, permissions, and lineage managed in a single, centralized, secure catalog.

Setup guide

  1. Create a workspace (if not already created) in Databricks.
  • Refer to the Databricks docs for workspace creation.
  • You will also need to enable Unity Catalog in the workspace. The Databricks documentation mentions that using a metastore is optional. For Onehouse, however, the metastore configuration is required for Unity Catalog sync to work. See the Databricks documentation on Managing Unity Catalog and Create a Unity Catalog for details on how to set up Unity Catalog and the Databricks metastore for your workspace.

Note: Databricks requires Delta Lake table metadata to work with Onehouse tables. To meet this requirement, be sure you have created a OneTable catalog with Delta Lake enabled and have attached it to the relevant Stream Capture.

  2. To sync your Onehouse tables with Databricks Unity Catalog, you must grant Databricks access to the storage location where your Onehouse tables are stored. This involves creating a Databricks Storage Credential and External Location.
  3. Create the Databricks Unity Catalog metadata catalog in Onehouse.
  4. Attach the Onehouse Databricks catalog to one or more Stream Captures.
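
As a rough illustration of the External Location step, the statement below is a minimal sketch in Databricks SQL. It assumes a Storage Credential has already been created (for example through the Databricks UI) and that the tables live in an S3 bucket. The names onehouse_tables_location and onehouse_storage_credential and the bucket path are hypothetical placeholders, not values from Onehouse or Databricks.

```sql
-- Hypothetical names and path: replace with your own location name,
-- storage credential, and the storage path that holds your Onehouse tables.
CREATE EXTERNAL LOCATION IF NOT EXISTS onehouse_tables_location
  URL 's3://example-bucket/onehouse/tables'
  WITH (STORAGE CREDENTIAL onehouse_storage_credential)
  COMMENT 'Storage location for Onehouse tables';

-- Inspect the external location to confirm the URL and credential it uses.
DESCRIBE EXTERNAL LOCATION onehouse_tables_location;
```

If your tables are stored in another cloud, the URL scheme will differ; check the Databricks docs for the format that applies to your storage provider.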

Parameters to be passed while creating a Databricks Unity Catalog in Onehouse

  1. Enter a Name to identify the data catalog in Onehouse
  2. Enter the catalog name in Databricks Unity Catalog where the tables will be synced. If the catalog doesn't already exist, it will be created.
  3. Enter the Databricks compute resource’s Server Hostname and HTTP Path value. You can find these values in the Databricks explorer by navigating to SQL -> SQL Warehouses -> (choose your warehouse) -> Connection details.
  4. Select the Auth Type.
  • For the Access Token Auth Type, enter the personal access token for your workspace user. Refer to the Databricks docs for personal access token generation.
  • For the OAuth Auth Type, enter the service principal's client-id and client-secret from your workspace. Refer to the Databricks docs for creating a service principal and retrieving its credentials.

Example scenario

A Onehouse table with Table Name=orders_by_product in a Onehouse database with Database Name=orders is synced to the given input catalog in Databricks Unity Catalog as Schema=orders and Table=orders_by_product. The schema named orders is created in Databricks Unity Catalog if it doesn't already exist.
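
One way to check the result of this scenario is to look for the schema and table from a Databricks SQL editor. The queries below are a sketch that assumes the input catalog is named onehouse_catalog, which is a hypothetical name; substitute the catalog name you entered in Onehouse.

```sql
-- `onehouse_catalog` is a hypothetical input catalog name.
-- The schema `orders` should appear after the first sync.
SHOW SCHEMAS IN onehouse_catalog;

-- The table `orders_by_product` should be listed in the `orders` schema.
SHOW TABLES IN onehouse_catalog.orders;

-- Read a few rows using the three-level name: catalog.schema.table.
SELECT * FROM onehouse_catalog.orders.orders_by_product LIMIT 10;
```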

Role permissions

The user account used for the connection requires the following permissions. Refer to the Databricks docs for managing privileges in Unity Catalog.

GRANT CREATE CATALOG ON METASTORE TO `user_name`;
-- `catalog-name` is the input catalog name in Databricks Unity Catalog (it will be created if it doesn't already exist).
GRANT ALL PRIVILEGES ON CATALOG `catalog-name` TO `user_name`;
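
To confirm that the grants took effect, the privileges can be listed with SHOW GRANTS. This is a sketch that reuses the placeholder names `user_name` and `catalog-name` from the statements above; the second statement assumes the catalog already exists.

```sql
-- Lists privileges held by the principal on the metastore;
-- CREATE CATALOG should appear in the output.
SHOW GRANTS `user_name` ON METASTORE;

-- Lists privileges held by the principal on the catalog
-- (assumes `catalog-name` has already been created).
SHOW GRANTS `user_name` ON CATALOG `catalog-name`;
```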