
Lakes, Databases, and Tables

Overview

Tables in Onehouse are organized under the following hierarchy: Lake > Database > Table.

Lakes

Lakes are a logical entity for organizing your Onehouse tables. You can create a lake in an existing cloud storage bucket and directory (lakes themselves do not create a new directory). Make sure that you've granted Onehouse access to the bucket in which you are creating the lake.

Lakes may be used for the following purposes:

  • User Permissions: Permissions for Onehouse resources (e.g., Stream Captures and Tables) can be managed at the lake level.
  • Storage Isolation: Lakes can be created in separate buckets, enabling you to write tables to different buckets. This can be useful for access control on the read side when paired with a governance solution such as AWS Lake Formation.

Note on Catalogs: Catalogs such as Hive Metastore or Glue Data Catalog do not have the concept of lakes. When Onehouse performs a catalog sync, only the database and table will be synced to such catalogs.

Databases

Databases are a physical entity for organizing your Onehouse tables. When you create a new database, Onehouse creates a sub-directory for the database under the parent lake's directory.

Database names must be globally unique across all lakes (i.e., you cannot have both lake1 > databaseA and lake2 > databaseA). This is important for enforcing uniqueness in cloud storage and catalogs.
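
The uniqueness rule can be illustrated with a minimal sketch. The `find_duplicate_databases` helper and its input mapping are hypothetical and are not part of any Onehouse API; the sketch also assumes names are compared as exact, case-sensitive strings, which the text above does not specify.

```python
from collections import defaultdict


def find_duplicate_databases(lakes):
    """Return {database_name: [lake names]} for names used in more than one lake.

    `lakes` is a hypothetical mapping of lake name -> list of database names.
    """
    owners = defaultdict(set)
    for lake, databases in lakes.items():
        for database in databases:
            owners[database].add(lake)
    # Keep only database names that appear under two or more lakes.
    return {db: sorted(found) for db, found in owners.items() if len(found) > 1}


# lake1 > databaseA and lake2 > databaseA violate global uniqueness.
lakes = {"lake1": ["databaseA", "databaseB"], "lake2": ["databaseA"]}
conflicts = find_duplicate_databases(lakes)
print(conflicts)
```

A check like this could be run against a planned layout before creating databases, so that a clash is caught before it surfaces in cloud storage or in a catalog.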

Tables

Tables are created as a new directory in cloud storage under the parent database's directory. You can create tables with Stream Captures.
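
As a rough illustration of how the hierarchy maps to storage, the sketch below builds a table's location from its lake directory, database, and table. The helper names, the example bucket, and the assumption that each directory is named exactly after its database or table are all hypothetical; the `database.table` identifier is likewise only an illustration of the fact that catalogs receive the database and table but not the lake.

```python
def table_path(lake_dir, database, table):
    """Join the lake directory, database, and table into one storage path.

    Assumes (hypothetically) that directories are named after the
    database and table; the lake itself adds no directory of its own.
    """
    parts = [lake_dir.rstrip("/"), database.strip("/"), table.strip("/")]
    return "/".join(parts)


def catalog_identifier(database, table):
    """Illustrative two-level name for a catalog that has no lake concept."""
    return f"{database}.{table}"


# Hypothetical lake created in an existing bucket and directory.
lake_dir = "s3://example-bucket/analytics/"
path = table_path(lake_dir, "sales_db", "orders")
name = catalog_identifier("sales_db", "orders")
print(path)
print(name)
```

Because the lake does not appear in the catalog-side name, two lakes sharing a database name would collide there, which is consistent with the global uniqueness requirement on database names.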

Learn more about tables in Onehouse: