
Clusters

Clusters allow you to isolate your workloads and independently scale compute resources. Each workload in Onehouse must be assigned to a Cluster.

Cluster types

Each Cluster has a type that determines which workloads you can run on it. A Cluster's type cannot be changed after creation.

Cluster types:

  • Managed: Run Flows to ingest data and Table Services to optimize your tables.
  • SQL: Run SQL workloads on the Quanton engine. Submit queries through a JDBC Endpoint or the Onehouse SQL Editor.
  • Spark: Create and run Jobs to execute Apache Spark code in Python, Java, or Scala on the Quanton engine.
  • Open Engines: Deploy open source compute engines on Onehouse infrastructure with Open Engines.
  • Notebook (beta feature): Deploy a Jupyter notebook on Onehouse infrastructure to run interactive PySpark workloads on the Onehouse Quanton engine.

You can always move workloads between different Clusters of a compatible type.
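The type rules above can be pictured with a small model. This is only an illustrative sketch, not the Onehouse API: the class names, the workload labels (`flow`, `table_service`, `sql_query`, and so on), and the `move_workload` helper are all hypothetical, chosen to mirror the descriptions in the list of Cluster types.

```python
from dataclasses import dataclass, field
from enum import Enum


class ClusterType(Enum):
    MANAGED = "Managed"
    SQL = "SQL"
    SPARK = "Spark"
    OPEN_ENGINES = "Open Engines"
    NOTEBOOK = "Notebook"


# Hypothetical workload labels derived from the Cluster type descriptions.
ALLOWED_WORKLOADS = {
    ClusterType.MANAGED: {"flow", "table_service"},
    ClusterType.SQL: {"sql_query"},
    ClusterType.SPARK: {"job"},
    ClusterType.OPEN_ENGINES: {"open_engine"},
    ClusterType.NOTEBOOK: {"notebook"},
}


@dataclass(frozen=True)  # frozen: the type cannot be reassigned after creation
class Cluster:
    name: str
    cluster_type: ClusterType
    workloads: dict = field(default_factory=dict)  # workload name -> kind

    def can_run(self, kind: str) -> bool:
        return kind in ALLOWED_WORKLOADS[self.cluster_type]

    def assign(self, workload: str, kind: str) -> None:
        if not self.can_run(kind):
            raise ValueError(
                f"{self.cluster_type.value} Cluster cannot run a {kind} workload"
            )
        self.workloads[workload] = kind


def move_workload(workload: str, source: Cluster, target: Cluster) -> None:
    """Move a workload only if the target Cluster's type is compatible."""
    kind = source.workloads[workload]
    if not target.can_run(kind):
        raise ValueError(
            f"cannot move {workload}: {target.name} is not a compatible type"
        )
    target.workloads[workload] = source.workloads.pop(workload)
```

In this model, moving a Flow between two Managed Clusters succeeds, while moving it to a SQL Cluster raises a `ValueError` and leaves the workload where it was.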

View-only Clusters

Additionally, each Onehouse project comes with the following view-only Clusters, which cannot be modified, deleted, or assigned workloads:

  • System (view-only): Runs essential operational tasks such as monitoring, job scheduling, autoscaling, and other tasks to ensure the continuous operation of your pipelines. This Cluster is enabled for all Onehouse projects and cannot be disabled. Its consumption may grow if the scale of your project requires additional overhead to ensure smooth operations.
  • Connector (view-only): Runs all resources required for source connectors, such as event notification consumers for file sources and change data capture (CDC) event listeners for database sources. If a project has no file or database CDC sources, it will not run a Connector Cluster.

The System and Connector Clusters count toward a project's OCU consumption.

Cluster permissions

Learn more about Cluster roles and permissions here.

Best practices

  • When possible, run workloads on the same Cluster to enable resource sharing (i.e., multiplexing and bin-packing), which allows for more efficient OCU usage.
  • For a given table, run Table Services and Flows in the same Cluster for optimal performance. This is the default for every new table unless you explicitly change the Services Cluster.
  • A Cluster restarts when you edit its OCU or instance type configuration. In-progress queries and jobs on the Cluster are canceled.
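To see why sharing a Cluster can use OCU more efficiently, consider a rough back-of-the-envelope model. The workload names and per-interval demand figures below are invented for illustration, and the model ignores how Onehouse actually schedules or autoscales. It only compares the capacity needed when each workload is sized for its own peak against the capacity needed when workloads whose peaks do not coincide share one pool.

```python
# Hypothetical OCU demand per time interval for three workloads.
demand = {
    "flow_a": [4, 1, 1, 1],
    "flow_b": [1, 4, 1, 1],
    "table_services": [1, 1, 1, 4],
}


def separate_capacity(demand: dict) -> int:
    """Each workload gets its own Cluster sized for its own peak."""
    return sum(max(series) for series in demand.values())


def shared_capacity(demand: dict) -> int:
    """One Cluster sized for the peak of the combined demand."""
    return max(sum(step) for step in zip(*demand.values()))


print(separate_capacity(demand))  # 12
print(shared_capacity(demand))    # 6
```

Under these assumed numbers the shared Cluster needs half the peak capacity, because the workloads peak at different times. When peaks coincide the two figures converge, which is one reason the isolation cases below can still be worth the extra capacity.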

When to isolate resources into separate Clusters

  • You have a small subset of high-priority Flows ingesting data for business-critical dashboards. You want to ensure these high-priority Flows do not lag when data volumes spike across the project, so you assign them to a separate Cluster with dedicated resources.
  • You have latency-sensitive, near-real-time data streaming into a table, and want to run a non-latency-sensitive backfill workload on the same table. You can use Clusters to keep the real-time stream on dedicated compute so the backfill workload does not cause lag for the latency-sensitive data.
  • You have multiple teams sharing a Onehouse project and want to give each team dedicated OCU resources so that no team can starve the others of resources.
  • You want to provide dedicated OCU resources to "speed up" a certain workload, such as a large bootstrap from Kafka.
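The effect of isolation in the scenarios above can be sketched with a toy contention model. It assumes that an oversubscribed Cluster splits its capacity in proportion to demand. That proportional split is an assumption made for illustration only, not a description of the Onehouse scheduler, and the workload names and OCU figures are hypothetical.

```python
def served(demand: dict, capacity: float) -> dict:
    """Return the capacity each workload receives on one Cluster.

    Assumption: when total demand exceeds capacity, capacity is
    split proportionally to demand.
    """
    total = sum(demand.values())
    if total <= capacity:
        return dict(demand)
    return {name: capacity * d / total for name, d in demand.items()}


# Shared: a 10-OCU Cluster runs a dashboard Flow and a large backfill.
shared = served({"dashboard_flow": 4, "backfill": 16}, capacity=10)

# Isolated: the same 10 OCU split into two dedicated Clusters.
dashboard_only = served({"dashboard_flow": 4}, capacity=4)
backfill_only = served({"backfill": 16}, capacity=6)

print(shared["dashboard_flow"])          # 2.0
print(dashboard_only["dashboard_flow"])  # 4
```

In this model the dashboard Flow receives only half of what it needs on the shared Cluster during the backfill, and its full demand on a dedicated Cluster. The backfill simply takes longer on its smaller dedicated pool.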
