Work with External Tables
SQL Clusters can read and write external Apache Hudi tables (via Observed Lakes) and Apache Iceberg tables (via the Iceberg REST Catalog) created on other compute platforms such as Amazon EMR.
Register external tables in Onehouse
For SQL Clusters to access external tables, you must first register the tables in Onehouse with an Observed Lake. Follow this guide.
Read from External Tables
Query tables
All Onehouse tables in the project are automatically available in the SQL Cluster's catalog. To query a Onehouse table, reference it as database.table:
SELECT * FROM database.table;
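For example, with a hypothetical database named analytics containing a table named orders:
-- Preview rows from the orders table in the analytics database
SELECT * FROM analytics.orders LIMIT 10;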
Tip: Toggle realtime mode
While external engines might read Onehouse tables as two views (read-optimized and real-time), Onehouse SQL addresses each table directly by name.
By default, queries are read-optimized, retrieving only the compacted data. You can switch to the real-time query type for the current session by running:
SET hoodie.query.as.ro.table=false;
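As a sketch, a session might toggle between the two query types like this (analytics.orders is a hypothetical table):
-- Default: read-optimized query over compacted data only
SELECT COUNT(*) FROM analytics.orders;
-- Switch this session to real-time queries, which also read uncompacted log data
SET hoodie.query.as.ro.table=false;
SELECT COUNT(*) FROM analytics.orders;
-- Revert to read-optimized queries for the rest of the session
SET hoodie.query.as.ro.table=true;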
Tip: Refresh outdated cache
When Onehouse SQL queries a table that external writers are writing to concurrently, the Spark cache may cause queries to fail or return an outdated state of the table. If this occurs, run the following command to load the latest table state:
REFRESH TABLE <db>.<table>;
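For example, to pick up commits made by an external writer after this session cached the hypothetical table analytics.orders:
-- Invalidate the cached metadata, then query the refreshed table state
REFRESH TABLE analytics.orders;
SELECT COUNT(*) FROM analytics.orders;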
You can also prevent this by setting the Spark caching config:
spark.sql.filesourceTableRelationCacheSize: "0"
Write to External Tables
Onehouse SQL can write to tables that were created outside of Onehouse (such as on an external Apache Spark platform).
Onehouse SQL can write concurrently with external writers. To prevent data corruption, you must follow all of the steps below.
- Single-Format Writes: Avoid writing to the table in multiple table formats. All writers should write in the same table format.
  - Tip: You can still sync the table to additional table formats for readers by enabling the Metadata Sync table service with a OneTable catalog.
- [For Apache Hudi tables] Lock Provider: Enable a lock provider in your Onehouse project.
  - IMPORTANT: The lock provider you enable in Onehouse must also be used by the external writer with the same configurations.
- [For Apache Hudi tables] Hudi Configurations: Set your external writer to use the same Apache Hudi configurations as Onehouse SQL (see the sketch after this list). In your external writer:
  - Enable the Metadata Table with hoodie.metadata.enable=true.
  - Enable timestamp ordering validation with hoodie.timestamp.ordering.validate.enable=true.
  - Disable the legacy Parquet list structure with parquet.avro.write-old-list-structure=false.
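A minimal sketch, assuming the external writer is an Apache Spark SQL session that picks these keys up from the session configuration:
-- Run in the external writer's Spark SQL session before writing to the shared table
SET hoodie.metadata.enable=true;                     -- enable the Metadata Table
SET hoodie.timestamp.ordering.validate.enable=true;  -- enable timestamp ordering validation
SET parquet.avro.write-old-list-structure=false;     -- disable the legacy Parquet list structure
-- Also configure the same lock provider you enabled in Onehouse, with identical settings.
If the external writer uses the DataFrame API instead of Spark SQL, pass the same keys as write options on the writer.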