Troubleshooting
Helpful tips for troubleshooting common issues you may encounter while working with Onehouse Jobs.
General Issues
Database creation error
# Example
Database creation is not allowed through Onehouse SQL. Please use the Onehouse Console or API to create databases.
Jobs currently do not create databases in your catalog.
- If using the Onehouse catalog, you should use the CREATE DATABASE API command or create the database in the Onehouse console.
- If using an external catalog, you should create the database directly in the external catalog.
Apache Iceberg Issues
Apache Iceberg not properly installed
# Data source issue example
Caused by: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: iceberg.
# Class access issue example
java.lang.IllegalAccessError: class org.apache.iceberg.SparkDistributedDataScan cannot access its abstract superclass
These errors indicate that Apache Iceberg is not installed or not configured properly on your Cluster.
When you set your Cluster to use an external Iceberg REST Catalog (IRC), such as Glue IRC or Snowflake Open Catalog, Apache Iceberg will be pre-installed and pre-configured.
If you don't set the Cluster to use an external IRC, you should install Apache Iceberg as a dependency and add the following to your Job configuration:
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
In your code, you must also specify the catalog name before accessing data (for example, run USE catalog.db instead of USE db).
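The two requirements above can be checked before a Job runs. The following is a minimal Python sketch, not part of any Onehouse API: the helper names has_iceberg_extension and qualified_use are hypothetical, and it assumes your Job configuration is available as a plain dictionary of Spark settings. It relies on spark.sql.extensions accepting a comma-separated list of class names.

```python
from typing import Dict

# Class name taken from the Job configuration shown in this section.
ICEBERG_EXTENSIONS = (
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
)


def has_iceberg_extension(conf: Dict[str, str]) -> bool:
    """Return True if the Iceberg extensions class is listed in spark.sql.extensions."""
    raw = conf.get("spark.sql.extensions", "")
    # spark.sql.extensions may hold several comma-separated class names.
    return ICEBERG_EXTENSIONS in [name.strip() for name in raw.split(",")]


def qualified_use(catalog: str, database: str) -> str:
    """Build a catalog-qualified USE statement (USE catalog.db rather than USE db)."""
    if not catalog or not database:
        raise ValueError("both a catalog name and a database name are required")
    return f"USE {catalog}.{database}"
```

For example, qualified_use("my_catalog", "sales") returns the string USE my_catalog.sales, which could then be passed to spark.sql(...) in your Job. Both names are placeholders.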
Incorrect catalog setup
# Warehouse path issue example
Cannot initialize HadoopCatalog because warehousePath must not be null or empty
The Iceberg REST Catalog (IRC) is not properly configured.
When you set your Cluster to use an external IRC, such as Glue IRC or Snowflake Open Catalog, the IRC configurations will be set up automatically.
If you don't set the Cluster to use an external IRC, you must explicitly provide the spark.sql.catalog.hadoop_catalog.warehouse Spark configuration when creating a database.
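As an illustration, the settings for a Hadoop-type Iceberg catalog can be assembled as shown below. This is a sketch under stated assumptions: the helper names are hypothetical, the warehouse path s3://example-bucket/warehouse is a placeholder, and the two keys besides the warehouse key are the standard Apache Iceberg Spark catalog settings, which your Cluster may or may not already define.

```python
from typing import Dict, List


def hadoop_catalog_conf(warehouse: str, catalog: str = "hadoop_catalog") -> Dict[str, str]:
    """Return Spark settings for a Hadoop-type Iceberg catalog with an explicit warehouse."""
    if not warehouse or not warehouse.strip():
        # Mirrors the failure above: the warehouse path must not be null or empty.
        raise ValueError("warehouse path must not be null or empty")
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Standard Iceberg catalog settings (assumed to be needed here).
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        # The setting this section requires.
        f"{prefix}.warehouse": warehouse,
    }


def as_conf_lines(conf: Dict[str, str]) -> List[str]:
    """Render settings as key=value lines, the form used for Job configuration above."""
    return [f"{key}={value}" for key, value in conf.items()]


if __name__ == "__main__":
    # Placeholder path; replace with your own storage location.
    for line in as_conf_lines(hadoop_catalog_conf("s3://example-bucket/warehouse")):
        print(line)
```

Rejecting an empty path in the helper surfaces the problem before the Job starts, rather than when the catalog is initialized.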
Glue naming requirements
# Naming issue example
org.apache.iceberg.exceptions.ValidationException: Cannot convert namespace performance-benchmark-iceberg-10tb to Glue database name, because it must be 1-252 chars of lowercase letters, numbers, underscore
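When using Glue IRC as your catalog, the database name must be 1-252 characters long and contain only lowercase letters, numbers, and underscores. Dashes and other special characters are not allowed. For example, rename performance-benchmark-iceberg-10tb to performance_benchmark_iceberg_10tb.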
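The naming rule quoted in the error message can be checked before you create a database. The sketch below is a hypothetical helper, not part of Glue or Onehouse; it encodes only the rule stated in the message (1-252 characters of lowercase letters, numbers, and underscores) and suggests a compliant name by replacing every other character with an underscore.

```python
import re

# Rule from the error message: 1-252 chars of lowercase letters, numbers, underscore.
GLUE_DATABASE_NAME = re.compile(r"[a-z0-9_]{1,252}")


def is_valid_glue_database_name(name: str) -> bool:
    """Return True if the whole name satisfies the rule from the error message."""
    return GLUE_DATABASE_NAME.fullmatch(name) is not None


def suggest_glue_database_name(name: str) -> str:
    """Lowercase the name, replace disallowed characters with '_', and cap at 252 chars."""
    candidate = re.sub(r"[^a-z0-9_]", "_", name.lower())[:252]
    if not candidate:
        raise ValueError("cannot derive a database name from an empty string")
    return candidate
```

With the name from the error above, is_valid_glue_database_name("performance-benchmark-iceberg-10tb") returns False, and suggest_glue_database_name returns performance_benchmark_iceberg_10tb.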