Lakes, Databases, and Tables
Overview
Tables in Onehouse are organized under the following hierarchy: Lake > Database > Table.
Lakes
Lakes are a logical entity for organizing your Onehouse tables. You can create a lake in an existing cloud storage bucket and directory (lakes themselves do not create a new directory). Make sure that you've granted Onehouse access to the bucket in which you are creating the lake.
Lakes may be used for the following purposes:
- User Permissions: Permissions for Onehouse resources (e.g., Flows and Tables) can be managed at the lake level.
- Storage Isolation: Lakes can be created in separate buckets, enabling you to write tables to different buckets. This can be useful for access control on the read side when paired with a governance solution such as AWS Lake Formation.
When you create a new lake, its type defaults to "Managed Lake", meaning a lake that you intend to write new tables to with Onehouse. If you only want to add existing tables that are managed outside Onehouse, choose the "Observed Lake" type instead.
Note on Catalogs: Catalogs such as Hive Metastore or Glue Data Catalog do not have the concept of lakes. When Onehouse performs a catalog sync, only the database and table will be synced to such catalogs.
Databases
Databases are a physical entity for organizing your Onehouse tables. When you create a new database, Onehouse creates a sub-directory for the database under the parent lake's directory.
Database names must be globally unique across all lakes (i.e., you cannot have both lake1 > databaseA and lake2 > databaseA). This is important for enforcing uniqueness in cloud storage and catalogs.
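The uniqueness rule can be illustrated with a small sketch. The `find_duplicate_databases` helper and the lake and database names below are hypothetical and are not part of any Onehouse API; the sketch only models the rule that a database name may appear under at most one lake.

```python
from collections import defaultdict


def find_duplicate_databases(lakes):
    """Return database names that appear under more than one lake.

    `lakes` maps a lake name to the list of database names it contains.
    This is an illustrative model, not an Onehouse API.
    """
    owners = defaultdict(set)
    for lake, databases in lakes.items():
        for database in databases:
            owners[database].add(lake)
    # A name is a conflict when two or more lakes claim it.
    return {db: sorted(ls) for db, ls in owners.items() if len(ls) > 1}


layout = {
    "lake1": ["databaseA", "databaseB"],
    "lake2": ["databaseA"],  # conflicts with lake1 > databaseA
}
print(find_duplicate_databases(layout))
```

Here `databaseA` is reported because both `lake1` and `lake2` claim it, which is the situation Onehouse does not allow.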
Tables
Tables are created as a new directory in cloud storage under the parent database's directory. You can create tables with Flows.
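As a rough sketch of the resulting storage layout: the lake contributes only its existing bucket and directory, the database adds a sub-directory, and the table adds a directory beneath that. The `table_path` helper, the bucket, and the names below are hypothetical, and the sketch assumes that each directory is named after its database or table, which this page does not state explicitly.

```python
from posixpath import join


def table_path(lake_root, database, table):
    """Compose the storage path of a table from its place in the hierarchy.

    Assumption: directory names match the database and table names.
    The lake name is not part of the path, because a lake lives in an
    existing directory and does not create one of its own.
    """
    return join(lake_root.rstrip("/"), database, table)


# Hypothetical bucket, directory, and names, for illustration only.
print(table_path("s3://example-bucket/lake-dir/", "sales", "orders"))
```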
Deleting a table in the Onehouse console will remove it from the console. The underlying data will be deleted from object storage only if the table was created by a Flow.
Other tables (e.g., those created with SQL or Jobs) will retain the underlying data in object storage. You can delete that data manually by using a command like DROP TABLE ... PURGE with a SQL Cluster.
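The deletion behavior described above can be summarized as a simple decision rule. The `CreatedBy` enum and the function name are hypothetical and only model the rule stated on this page; they do not correspond to an Onehouse API.

```python
from enum import Enum


class CreatedBy(Enum):
    """How a table was created (illustrative categories only)."""
    FLOW = "flow"
    SQL = "sql"
    JOB = "job"


def data_deleted_on_console_delete(created_by):
    """Return True if deleting the table in the console also deletes its data.

    Per the rule above, only Flow-created tables have their underlying
    data removed from object storage.
    """
    return created_by is CreatedBy.FLOW


for origin in CreatedBy:
    print(origin.value, data_deleted_on_console_delete(origin))
```

For tables where the function returns False, the data stays in object storage until you remove it yourself.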
Learn more about tables in Onehouse: