
Lakes, Databases, and Tables

Overview

Tables in Onehouse are organized under the following hierarchy: Lake > Database > Table.

Lakes

Lakes are a logical entity for organizing your Onehouse tables. You create a lake in an existing cloud storage bucket and directory; creating a lake does not create a new directory. Before creating a lake, make sure that you have granted Onehouse access to the bucket in which you are creating it.

Lakes may be used for the following purposes:

  • User Permissions: Permissions for Onehouse resources (e.g., Flows and Tables) can be managed at the lake level.
  • Storage Isolation: Lakes can be created in separate buckets, enabling you to write tables to different buckets. This can be useful for access control on the read side when paired with a governance solution such as AWS Lake Formation.

When you create a new lake, its type defaults to "Managed Lake". A Managed Lake is a lake that you intend to write new tables to with Onehouse. If you instead want to add existing tables that are managed outside of Onehouse, choose the "Observed Lake" type.

Note on Catalogs: Catalogs such as Hive Metastore or Glue Data Catalog do not have the concept of lakes. When Onehouse performs a catalog sync, only the database and table will be synced to such catalogs.
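To illustrate the note above, here is a minimal sketch of how a three-level Onehouse reference might collapse to a two-level name in a catalog that has no lake concept. The TableRef type and the catalog_identifier function are hypothetical names used only for illustration; they are not part of any Onehouse API, and the exact names written to your catalog depend on how the sync is configured.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """Hypothetical reference to a table in the Lake > Database > Table hierarchy."""
    lake: str
    database: str
    table: str


def catalog_identifier(ref: TableRef) -> str:
    """Return an assumed catalog-side name: the lake level is dropped,
    leaving only the database and the table."""
    return f"{ref.database}.{ref.table}"


ref = TableRef(lake="lake1", database="databaseA", table="orders")
print(catalog_identifier(ref))
```

Because the lake does not appear in the catalog-side name, two databases with the same name in different lakes would be indistinguishable there.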

Databases

Databases are a physical entity for organizing your Onehouse tables. When you create a new database, Onehouse creates a sub-directory for the database under the parent lake's directory.

Database names must be globally unique across all lakes (i.e., you cannot have both lake1 > databaseA and lake2 > databaseA). This is important for enforcing uniqueness in cloud storage and catalogs.
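If you plan your layout ahead of time, a small check like the following can flag database names that would violate the uniqueness rule. This is only a sketch: the find_duplicate_databases function and the layout dictionary are hypothetical, and Onehouse enforces the rule itself when you create a database.

```python
def find_duplicate_databases(lakes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each database name that appears in more than one lake
    to the list of lakes that declare it."""
    owners: dict[str, list[str]] = {}
    for lake, databases in lakes.items():
        # A set avoids counting a name twice within the same lake.
        for database in sorted(set(databases)):
            owners.setdefault(database, []).append(lake)
    return {db: names for db, names in owners.items() if len(names) > 1}


# Hypothetical planned layout: databaseA is declared in two lakes.
layout = {
    "lake1": ["databaseA", "databaseB"],
    "lake2": ["databaseA"],
}
print(find_duplicate_databases(layout))
```

An empty result means every database name in the planned layout is used by exactly one lake.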

Tables

Tables are created as a new directory in cloud storage under the parent database's directory. You can create tables with Flows.
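The resulting storage layout can be sketched as a simple path composition. This example assumes that each database and table directory is named after the database or table itself, which this page does not state; the bucket, directory, and function names are hypothetical.

```python
def table_path(lake_dir: str, database: str, table: str) -> str:
    """Compose an assumed table location: <lake directory>/<database>/<table>.

    Assumption: directory names match the database and table names.
    """
    return f"{lake_dir.rstrip('/')}/{database}/{table}"


# Hypothetical lake created in an existing bucket and directory.
lake_dir = "s3://example-bucket/lakes/analytics"
print(table_path(lake_dir, "sales", "orders"))
```

The lake contributes only the existing directory, while the database and the table each add one directory level beneath it.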

Learn more about tables in Onehouse: