Work with Onehouse Tables
SQL Clusters work seamlessly with Onehouse tables for reading and writing data.
Read from Onehouse Tables with SQL
Query tables
All Onehouse tables in the project are automatically available in the SQL Cluster's catalog. To query a Onehouse table, reference it as database.table:
SELECT * FROM database.table;
Tip: Toggle realtime mode
External engines may expose Onehouse tables as separate views (read-optimized and real-time), but Onehouse SQL understands tables by name directly.
By default, queries are read-optimized, retrieving only the compacted data. You can change to the real-time query type for the current session by running:
SET hoodie.query.as.ro.table=false;
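To switch back to read-optimized queries for the current session, restore the default:
SET hoodie.query.as.ro.table=true;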
Create Onehouse Tables with SQL
Requirements
Follow these requirements when creating the table to ensure compatibility with the Onehouse platform; a minimal example follows the list.
- Create the table as an Apache Hudi table. Instructions are available in this quickstart guide.
- Create the table under the path of a Onehouse Lake. This ensures the table will be registered in Onehouse and will be visible on the Data page.
- Exclude the LOCATION parameter to automatically create the table under the storage path of the table's database.
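For example, here is a minimal sketch of a compatible statement, assuming a database named my_db whose storage path sits under a Onehouse Lake (the table, columns, and key fields are illustrative):
CREATE TABLE my_db.customers (
  id BIGINT,
  name STRING,
  region STRING,
  ts TIMESTAMP
) USING HUDI
PARTITIONED BY (region)
TBLPROPERTIES (
  type = 'mor',            -- Merge on Read; use 'cow' for Copy on Write
  primaryKey = 'id',
  preCombineField = 'ts'
);
-- No LOCATION clause: the table is created under the database's storage path.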
Table services
Onehouse tables created by SQL will automatically have the following table services enabled:
- Cleaning: All tables will have Cleaning enabled to remove old data that's no longer used in the active table.
- Compaction: Merge on Read tables will have Compaction enabled to periodically merge log files into base files.
You can view the table services for any table in the Onehouse console. Navigate to the Data page, open your table, then click into the Resources tab.
Sync to external catalogs
You can sync Onehouse tables created by SQL to external catalogs, such as AWS Glue or Snowflake, with the Metadata Sync table service. Tables will automatically be synced to the project's Default Catalogs.
To set up additional catalog syncs for the table:
- Ensure your table was properly registered in Onehouse. Navigate to the Data page in the console and find the table.
- Open the table, then click into the Resources tab. Click Add Table Service.
- Choose the 'Metadata Sync' table service, and select any external catalogs where you'd like to sync the table.
Write to Existing Onehouse Tables with SQL
You can use a SQL Cluster to perform DML and DDL commands on existing Onehouse tables in your Lakes. See examples in this quickstart guide.
Follow these guidelines to ensure smooth operations:
- If you want to run DML or DDL commands on a partitioned table, you must first set the table properties in the SQL session:
ALTER TABLE <table> SET TBLPROPERTIES (hoodie.datasource.write.partitionpath.field = '<partition_field>:simple');
- You cannot use SQL DDL to modify the name of a table created by a Flow.
- Avoid performing schema evolution that is not backwards-compatible (e.g. adding a column with a non-null default value) via SQL on tables with active Flows, as this may cause compatibility issues.
- Running DROP TABLE will only drop the catalog entry for the table. To delete the table in your bucket, use DROP TABLE ... PURGE.
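As an illustration, the sketch below runs typical DML against the hypothetical partitioned table from the earlier example (the table, column, and partition field names are assumptions):
-- Required once per session before DML/DDL on a partitioned table
ALTER TABLE my_db.customers SET TBLPROPERTIES (hoodie.datasource.write.partitionpath.field = 'region:simple');

-- Update existing rows
UPDATE my_db.customers SET name = 'Acme Corp' WHERE id = 42;

-- Insert new rows
INSERT INTO my_db.customers VALUES (43, 'Globex', 'us-west', current_timestamp());

-- Drop the catalog entry and delete the table files from the bucket
DROP TABLE my_db.customers PURGE;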