DataHub
Description
DataHub is an open-source metadata platform that enables users to discover, understand, and manage metadata across a wide range of data sources. It provides a unified data catalog for searching and browsing datasets, schemas, and other metadata, as well as a metadata ingestion framework for integrating data from various sources. With DataHub, organizations can create a single source of truth for their data assets, improving data discoverability and governance.
Setup guide
- Enter a Name to identify the data catalog in Onehouse.
- Select "DataHub" as the Type.
- Enter the datahub-gms endpoint as the server URL. If you are using DataHub Core (docker quickstart), the value is http://localhost:8080.
- Enter the Data Platform Name. The recommended default value is "hudi".
- Enter the Dataset Env. Currently, "PROD" is the only valid value. Support is planned for all the values listed in https://raw.githubusercontent.com/datahub-project/datahub/master/li-utils/src/main/pegasus/com/linkedin/common/FabricType.pdl.
- Enter the Auth Token. If you are using DataHub Core (docker quickstart), run
export METADATA_SERVICE_AUTH_ENABLED=true
before starting docker compose so that DataHub can generate personal access tokens.
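Before saving the catalog, it can help to confirm that the server URL and Auth Token work together. The following is a minimal sketch, not part of Onehouse or DataHub tooling. It assumes that datahub-gms answers a GET on the /config path and accepts the personal access token as a Bearer token in the Authorization header. The function name build_gms_config_request is hypothetical, and the function only prepares the request so that nothing is sent until you choose to.

```python
import urllib.request


def build_gms_config_request(server_url: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request to <server_url>/config.

    Assumption: datahub-gms serves a /config path and accepts the
    personal access token as a Bearer token.
    """
    # Avoid a double slash if the server URL ends with "/".
    url = server_url.rstrip("/") + "/config"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Example with the docker quickstart value from the guide above.
# The token is a placeholder; substitute your own.
request = build_gms_config_request("http://localhost:8080", "<personal-access-token>")
print(request.full_url)
```

Passing the returned object to urllib.request.urlopen(request, timeout=10) would send it. A successful response suggests the endpoint is reachable and the token is accepted, while an HTTP 401 would point to a token problem.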
Once the Onehouse Capture stream has run with DataHub as one of the catalogs to sync with, you can view the data by logging into the DataHub UI. For DataHub Core (docker quickstart), the UI URL is http://localhost:9002.
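When looking for synced tables in the UI, it may help to know how the Data Platform Name and Dataset Env appear in DataHub's dataset identifiers. The sketch below builds a dataset URN in DataHub's standard format. The helper make_dataset_urn and the dataset name "my_db.my_table" are hypothetical, and the exact dataset names Onehouse writes are an assumption here rather than something this guide specifies.

```python
# Only "PROD" is currently accepted for the Dataset Env, per the setup guide.
VALID_ENVS = {"PROD"}


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN from platform, dataset name, and env."""
    if env not in VALID_ENVS:
        raise ValueError(f"unsupported Dataset Env: {env!r}")
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# "my_db.my_table" is a placeholder dataset name, not a value from the guide.
print(make_dataset_urn("hudi", "my_db.my_table"))
```

Searching the DataHub UI for the platform "hudi" or for the table name should surface the same entities without needing the full URN.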