
onehouse_lake

Registers a Onehouse data lake. A lake is the top-level grouping for databases and tables. It points at a root path in cloud storage and is associated with a managed compute cluster that runs table services.

Canonical reference

This page documents Terraform-specific behavior (HCL syntax, types, mutability, drift, import). For full parameter semantics, valid values, and defaults, see CREATE LAKE, ALTER LAKE, and DELETE LAKE.

Caution

Deleting a lake is irreversible. By default, terraform destroy fails if the lake contains databases or tables. Set force_destroy = true to cascade-delete all dependents.
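The following is a minimal sketch of a lake created with cascade deletion enabled. The lake name and bucket path are placeholders. The argument table lists force_destroy as immutable, so the sketch sets it when the lake is created rather than adding it later.

```hcl
# Hypothetical scratch lake that can be torn down together with its contents.
resource "onehouse_lake" "scratch" {
  name                     = "scratch"
  lake_type                = "MANAGED"
  bucket_path              = "s3://my-data-lake/scratch/"
  default_services_cluster = "Default Managed Cluster - 2"

  # On `terraform destroy`, cascade-delete every database and table in the lake.
  force_destroy = true
}
```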

Example Usage

Observed lake on an existing S3 bucket

```hcl
resource "onehouse_lake" "raw" {
  name                     = "raw_events"
  lake_type                = "OBSERVED"
  bucket_path              = "s3://my-data-lake/raw/"
  default_services_cluster = "Default Managed Cluster - 2"
}
```

Managed lake on GCS

```hcl
# Assumes an onehouse_cluster.services resource, such as the one in the next example.
resource "onehouse_lake" "warehouse" {
  name                     = "warehouse"
  lake_type                = "MANAGED"
  bucket_path              = "gs://onehouse-warehouse/lakes/warehouse/"
  default_services_cluster = onehouse_cluster.services.name
}
```

Referencing a separately managed cluster

```hcl
resource "onehouse_cluster" "services" {
  name        = "table-services"
  type        = "Managed"
  min_ocu     = 2
  max_ocu     = 8
  worker_type = "oh-general-4"
}

resource "onehouse_lake" "warehouse" {
  name                     = "warehouse"
  lake_type                = "MANAGED"
  bucket_path              = "s3://onehouse-warehouse/lakes/warehouse/"
  default_services_cluster = onehouse_cluster.services.name
}
```

Argument Reference

| Argument | Type | Required | Mutability | Description |
|---|---|---|---|---|
| name | string | | Immutable | Lake name. Must be unique in the project. |
| lake_type | string | | Immutable | One of MANAGED or OBSERVED. See the lake_type section below. |
| bucket_path | string | | Immutable | Cloud-storage root path. Must end with a trailing / (server-validated), for example s3://bucket/dir/ or gs://bucket/dir/. |
| default_services_cluster | string | | Mutable | Name of the compute cluster that runs table services. See the default_services_cluster section below. |
| force_destroy | boolean | | Immutable | When true, terraform destroy cascade-deletes all databases and tables in the lake. Defaults to false. |
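The server rejects a bucket_path without a trailing /, and lake_type accepts only two values. The sketch below checks both rules at plan time with input-variable validation, before any request is sent. The variable names are hypothetical, and the checks cover only the rules stated in the table above. `endswith` requires Terraform 1.3 or later.

```hcl
# Hypothetical input variables that fail fast at plan time on invalid values.
variable "lake_type" {
  type    = string
  default = "MANAGED"

  validation {
    condition     = contains(["MANAGED", "OBSERVED"], var.lake_type)
    error_message = "The lake_type value must be MANAGED or OBSERVED."
  }
}

variable "bucket_path" {
  type = string

  validation {
    # Mirrors the server-side rule that the root path ends with "/".
    condition     = endswith(var.bucket_path, "/")
    error_message = "The bucket_path value must end with a trailing slash."
  }
}

resource "onehouse_lake" "warehouse" {
  name                     = "warehouse"
  lake_type                = var.lake_type
  bucket_path              = var.bucket_path
  default_services_cluster = "Default Managed Cluster - 2"
}
```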

lake_type: MANAGED vs OBSERVED

| Value | Behavior |
|---|---|
| MANAGED | Onehouse fully manages the lake: it writes Hudi tables, runs compaction and cleaning, and owns the root directory. |
| OBSERVED | Onehouse reads existing tables under bucket_path but does not write to them. Use this to expose an existing data lake through Onehouse without migrating data. |

lake_type is immutable. To switch a lake between modes, you must destroy and recreate it. Set force_destroy = true if the lake contains dependents.

default_services_cluster

The managed cluster that runs table services (compaction, cleaning, and clustering) for tables in this lake. This is the only mutable field on a lake. Changing it is applied as an in-place update that issues ALTER LAKE <name> SET DEFAULT_SERVICES_CLUSTER = <new>. The new cluster must exist before the change is applied.
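If the lake refers to the cluster through an expression instead of a literal name, Terraform creates the cluster before it updates the lake, which satisfies the "must exist first" rule. The sketch below moves the warehouse lake to a hypothetical second cluster. Its arguments are copied from the onehouse_cluster example above and are not a sizing recommendation.

```hcl
# Hypothetical replacement cluster; arguments copied from the earlier example.
resource "onehouse_cluster" "services_v2" {
  name        = "table-services-v2"
  type        = "Managed"
  min_ocu     = 2
  max_ocu     = 8
  worker_type = "oh-general-4"
}

resource "onehouse_lake" "warehouse" {
  name        = "warehouse"
  lake_type   = "MANAGED"
  bucket_path = "s3://onehouse-warehouse/lakes/warehouse/"

  # The reference creates an implicit dependency, so the new cluster
  # is created before the lake's default cluster is switched to it.
  default_services_cluster = onehouse_cluster.services_v2.name
}
```

Only default_services_cluster changes here, so terraform plan should report an in-place update of onehouse_lake.warehouse rather than a replacement.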

Attribute Reference

| Attribute | Type | Description |
|---|---|---|
| id | string | Equal to name (lakes have no separate UUID in the public API). |
| created_at | string | Creation time in RFC3339. |
| created_by | string | Identity that created the lake. Empty when created by a service principal. |
| num_databases | number | Number of databases in the lake. |
| databases | list(string) | Database names in the lake (in DESCRIBE LAKE order). |
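Computed attributes are read like any other resource attribute. The sketch below assumes the onehouse_lake.warehouse resource from the examples above and collects the attributes into a single output. The output name is hypothetical.

```hcl
# Hypothetical output that summarizes the lake's computed attributes.
output "warehouse_summary" {
  value = {
    created_at    = onehouse_lake.warehouse.created_at
    created_by    = onehouse_lake.warehouse.created_by # empty for service principals
    num_databases = onehouse_lake.warehouse.num_databases
    databases     = onehouse_lake.warehouse.databases
  }
}
```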

Import

Import an existing lake by its name, which is also its id:

```shell
terraform import onehouse_lake.warehouse warehouse
```
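On Terraform 1.5 and later, the same import can be declared in configuration with an import block, so terraform plan previews it before anything is written to state. This sketch assumes the resource supports config-driven import in the same way it supports the CLI command. The argument values are placeholders and must match the existing lake. A mismatch on an immutable field would plan a replacement.

```hcl
# Declarative import: the id is the lake name.
import {
  to = onehouse_lake.warehouse
  id = "warehouse"
}

resource "onehouse_lake" "warehouse" {
  name                     = "warehouse"
  lake_type                = "MANAGED"
  bucket_path              = "s3://onehouse-warehouse/lakes/warehouse/"
  default_services_cluster = "Default Managed Cluster - 2"
}
```

If you would rather not write the resource block by hand, `terraform plan -generate-config-out=generated.tf` can draft it from the imported object.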

Data Source

```hcl
data "onehouse_lake" "lookup" {
  name = "warehouse"
}

output "lake_default_cluster" {
  value = data.onehouse_lake.lookup.default_services_cluster
}
```
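The data source can also feed values into a managed resource. In the sketch below, a hypothetical lake named staging reuses the services cluster of an existing lake that this configuration does not manage. The sketch reads only default_services_cluster, which is the one data-source attribute shown on this page.

```hcl
# Look up a lake managed elsewhere.
data "onehouse_lake" "existing" {
  name = "warehouse"
}

# Hypothetical new lake that shares the existing lake's services cluster.
resource "onehouse_lake" "staging" {
  name                     = "staging"
  lake_type                = "MANAGED"
  bucket_path              = "s3://onehouse-warehouse/lakes/staging/"
  default_services_cluster = data.onehouse_lake.existing.default_services_cluster
}
```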

Limitations

  • Immutable fields force replacement. Changing name, lake_type, or bucket_path destroys and recreates the lake. If force_destroy is not set, the destroy step fails when the lake still contains databases or tables.
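Terraform's built-in lifecycle meta-argument prevent_destroy can guard against an accidental forced replacement. With it set, any plan that would destroy the lake fails with an error instead of proceeding. This includes a replacement caused by editing an immutable field. The sketch below uses placeholder values.

```hcl
resource "onehouse_lake" "warehouse" {
  name                     = "warehouse"
  lake_type                = "MANAGED"
  bucket_path              = "s3://onehouse-warehouse/lakes/warehouse/"
  default_services_cluster = "Default Managed Cluster - 2"

  lifecycle {
    # Reject any plan that destroys this lake, including destroy + recreate.
    prevent_destroy = true
  }
}
```

The guard lives in the resource block itself, so removing the block from configuration also removes the protection.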