Lakes, Databases, and Tables
Overview
Tables in Onehouse are organized under the following hierarchy: Lake > Database > Table.
Lakes
Lakes are a logical entity for organizing your Onehouse tables. You can create a lake in an existing cloud storage bucket and directory (lakes themselves do not create a new directory). Make sure that you've granted Onehouse access to the bucket in which you are creating the lake.
Lakes may be used for the following purposes:
- User Permissions: Permissions for Onehouse resources (e.g., Stream Captures and Tables) can be managed at the lake level.
- Storage Isolation: Lakes can be created in separate buckets, enabling you to write tables to different buckets. This can be useful for access control on the read side when paired with a governance solution such as AWS Lake Formation.
Note on Catalogs: Catalogs such as Hive Metastore or Glue Data Catalog do not have the concept of lakes. When Onehouse performs a catalog sync, only the database and table will be synced to such catalogs.
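As a rough illustration of the note above, the sketch below shows how a Lake > Database > Table location collapses to a two-level name when the lake level is dropped. The `TableRef` type, the `catalog_identifier` function, and the dotted `database.table` form are hypothetical names chosen for this example, not part of any Onehouse API.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """Hypothetical reference to a table in the Lake > Database > Table hierarchy."""
    lake: str
    database: str
    table: str


def catalog_identifier(ref: TableRef) -> str:
    # Catalogs such as Hive Metastore or Glue Data Catalog have no lake level,
    # so only the database and table names are carried over.
    return f"{ref.database}.{ref.table}"


# Example names are made up for illustration.
print(catalog_identifier(TableRef("lake1", "databaseA", "orders")))
```

Because the lake name does not appear in the result, two tables that differ only by lake would map to the same catalog entry.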
Databases
Databases are a physical entity for organizing your Onehouse tables. When you create a new database, Onehouse creates a sub-directory for the database under the parent lake's directory.
Database names must be globally unique across all lakes (i.e., you cannot have both lake1 > databaseA and lake2 > databaseA). This is important for enforcing uniqueness in cloud storage and catalogs.
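The uniqueness rule can be checked before creating databases with a small helper like the one below. This is a minimal sketch, not Onehouse's own validation: `find_duplicate_databases` and the input mapping of lake names to database names are assumptions made for illustration.

```python
from typing import Dict, List


def find_duplicate_databases(lakes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return each database name that appears more than once across all lakes,
    mapped to the lakes that use it (hypothetical helper)."""
    owners: Dict[str, List[str]] = {}
    for lake, databases in lakes.items():
        for database in databases:
            owners.setdefault(database, []).append(lake)
    # Keep only names claimed more than once.
    return {db: used_by for db, used_by in owners.items() if len(used_by) > 1}


# Example layout with made-up names: databaseA is used by two lakes.
planned = {
    "lake1": ["databaseA", "databaseB"],
    "lake2": ["databaseA"],
}
print(find_duplicate_databases(planned))
```

An empty result means the planned names satisfy the global uniqueness requirement.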
Tables
Tables are created as a new directory in cloud storage under the parent database's directory. You can create tables with Stream Captures.
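The resulting directory nesting can be sketched as follows. This example assumes that the database and table directories are named after the database and table themselves, which this page does not state explicitly. The bucket, lake directory, and `table_path` helper are hypothetical.

```python
def table_path(lake_dir: str, database: str, table: str) -> str:
    """Build the storage location <lake dir>/<database>/<table>.

    Assumption: directory names match the database and table names.
    """
    # Drop any trailing slash on the lake directory to avoid "//" in the result.
    return "/".join([lake_dir.rstrip("/"), database, table])


# Made-up bucket and names for illustration.
print(table_path("s3://my-bucket/lake-root/", "databaseA", "orders"))
```

Since a lake does not create a directory of its own, the lake directory here is simply the existing bucket path in which the lake was created.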
Learn more about tables in Onehouse: