Resources | Onehouse Docs

📄️ onehouse_cluster

Provisions and manages a Onehouse compute cluster. Compute clusters are the execution engines for jobs, SQL workloads, and ingestion flows.

📄️ onehouse_lake

Registers a Onehouse data lake. A lake is the top-level grouping for databases and tables. It points at a root path in cloud storage and is associated with a managed compute cluster that runs table services.

📄️ onehouse_database

Creates a database — a lake-scoped namespace for tables.

📄️ onehouse_catalog

Configures an external catalog. Onehouse syncs table metadata to the catalog so external query engines (Spark, Trino, Athena, etc.) can discover and read Onehouse tables.

📄️ onehouse_source

Defines a data source for ingestion. A source represents the upstream system that a onehouse_flow reads from, such as an S3 bucket, a Kafka topic, or a Postgres database.

📄️ onehouse_flow

Configures a flow: an ingestion pipeline that reads from a onehouse_source and writes to a destination Onehouse table identified by (lake, database, table_name).

📄️ onehouse_table_service

Manages a table service — an automated maintenance operation that runs on a specific table. Table services keep tables healthy by clustering data for faster reads, compacting small files, cleaning up old versions, syncing metadata to external catalogs, creating automatic savepoints, and restoring to previous savepoints.

📄️ onehouse_transformation

Defines a reusable, named transformation that flows can apply to data as it is ingested into Onehouse tables. A transformation is created independently and then referenced by name from one or more flows.

📄️ onehouse_transformer_jar

Manages a custom transformer JAR registered from an object-storage location (S3 or GCS). Onehouse copies the uploaded JAR to its own managed storage location and references it by name for use in custom transformations.