AWS Glue Data Catalog (Iceberg REST Catalog)
AWS Glue Data Catalog is a data catalog service provided by Amazon Web Services. This integration allows you to connect to Glue as an Iceberg REST catalog (IRC) for Spark and SQL Clusters. If you prefer to connect to Glue as a Hive Metastore (HMS), use the Glue HMS integration.
The Glue IRC integration supports the Apache Iceberg v2 specification. As of writing, this is the latest spec supported by AWS Glue.
Cloud Provider Support
- AWS: ✅ Supported
- GCP: Not supported
Functionality
- Glue IRC can be used as the primary catalog in Spark and SQL Clusters.
- Clusters using Glue IRC will automatically install Apache Iceberg version 1.10.0.
Limitations
- Glue IRC currently cannot be used to sync Onehouse tables to Glue via the Metadata Sync table service. Instead, use the Glue Hive Metastore integration for metadata syncs.
- Some operations are not supported by Glue IRC. Refer to the official AWS documentation for supported operations.
  - For example, CREATE TABLE ... AS SELECT ... is not supported.
- When creating a table with Glue IRC, you must specify a LOCATION. The location will not be automatically set, whereas the Onehouse Managed Catalog will set the table location based on the database path.
- Catalog events (such as table creation) will not be registered to Onehouse.
- Tables created with IRC cannot run table services.
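The first two limitations interact: without CREATE TABLE ... AS SELECT, a table has to be created and populated in separate steps, and the create step needs an explicit LOCATION. The sketch below shows one possible way to generate such a statement pair. It assumes that plain CREATE TABLE and INSERT INTO ... SELECT are among the operations supported in your setup (check the AWS documentation). The helper name, catalog name, table names, and S3 path are all hypothetical.

```python
from typing import Dict, List


def create_then_insert(
    table: str, columns: Dict[str, str], location: str, select_sql: str
) -> List[str]:
    """Build a CREATE TABLE + INSERT INTO pair as a stand-in for CTAS.

    Hypothetical helper: it only builds SQL strings and does not run them.
    """
    if not location:
        # Glue IRC does not set a table location automatically.
        raise ValueError("Glue IRC tables need an explicit LOCATION")
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    create = f"CREATE TABLE {table} ({cols}) USING iceberg LOCATION '{location}'"
    insert = f"INSERT INTO {table} {select_sql}"
    return [create, insert]


# Illustrative values only; replace with your own catalog, tables, and bucket.
statements = create_then_insert(
    table="glue_irc.sales.orders_clean",
    columns={"order_id": "BIGINT", "amount": "DOUBLE"},
    location="s3://example-bucket/sales/orders_clean",
    select_sql="SELECT order_id, amount FROM glue_irc.sales.orders_raw",
)
for statement in statements:
    print(statement + ";")
```

Each generated statement can then be run in a SQL Cluster or passed to spark.sql(...) in a Spark job.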
Setup guide
In the Onehouse console, open Catalogs, then click Add New Catalog. Fill in the following configurations.
- Name: Unique identifier for the data catalog in Onehouse. This does not need to match the Glue catalog name in AWS.
- Type: Select 'Glue Iceberg REST Catalog'.
- Glue Catalog Name: Enter a name that your Clusters will use to identify the catalog.
  - In your code and SQL, you can reference tables as <catalog-name>.<database>.<table> or simply <database>.<table>.
- Region: Specify the AWS region of the Glue catalog.
- Account ID: Enter the AWS account ID of the Glue catalog. This will be pre-filled with the AWS account connected to your Onehouse project.
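For orientation, the sketch below shows how the Glue Catalog Name, Region, and Account ID fields would map onto Spark properties in a self-managed open-source Spark session that connects to the Glue Iceberg REST endpoint. Onehouse configures the catalog on your Clusters for you, and the exact properties it sets are not stated here, so treat this as an assumption about an equivalent manual setup. The helper name and sample values are hypothetical.

```python
from typing import Dict


def glue_irc_spark_conf(catalog_name: str, region: str, account_id: str) -> Dict[str, str]:
    """Return Spark properties for an Iceberg REST catalog backed by Glue.

    Hypothetical helper mirroring a manual open-source Spark setup;
    not necessarily what Onehouse applies internally.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Regional Glue Iceberg REST endpoint.
        f"{prefix}.uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # The AWS account ID identifies the Glue catalog to connect to.
        f"{prefix}.warehouse": account_id,
        # Requests are signed with SigV4 for the Glue service.
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": "glue",
        f"{prefix}.rest.signing-region": region,
    }


# Illustrative values only.
conf = glue_irc_spark_conf("glue_irc", "us-west-2", "123456789012")
for key, value in conf.items():
    print(f"{key}={value}")
```

With a catalog name of glue_irc, a table would then be referenced as glue_irc.<database>.<table>.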
