Create a Cluster

Open the Clusters page in the Onehouse console to create a new Cluster. The sections below cover the configuration options available when setting up a Cluster.

tip

You can create multiple Clusters of the same type to isolate different workloads.

Basic configurations

Name: The name by which to identify the Cluster.
Cluster Type: Select from the following Cluster types. This cannot be changed after creation.
- Managed: Run Flows to ingest data and Table Services to optimize your tables.
- SQL: Run SQL workloads on the Quanton engine. Submit queries through a JDBC Endpoint or the Onehouse SQL Editor.
- Spark: Create and run Jobs to execute Apache Spark code in Python, Java, or Scala on the Quanton engine.
- Open Engines: Deploy open source compute engines on Onehouse infrastructure with Open Engines.
- Notebook (beta feature): Deploy a Jupyter notebook on Onehouse infrastructure to run interactive PySpark workloads on the Onehouse Quanton engine.

OCU configurations

Specify OCU limits to constrain the minimum and maximum Onehouse Compute Units (OCU) the Cluster uses per hour. Together with the hourly OCU cost of your selected instance type(s), these limits determine how many instances the Cluster can use.

Max OCU / Hour: Maximum OCU the Cluster will use per hour. Set this to manage your costs.
Min OCU / Hour: Minimum OCU the Cluster will use per hour. Set this higher if you need to keep the Cluster warm.

tip

Setting Max OCU for your Clusters can help you confidently keep costs under a budget. If your data volumes or complexity of the workload change and your Cluster usage hits its Max OCU, the Cluster will not continue scaling up. This may lead to delays in data processing, so it is important to consider how your workloads grow or fluctuate.
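To illustrate how Max OCU / Hour caps scaling, the sketch below computes the largest number of workers that fits under the limit. It is a simplified model, not Onehouse's documented formula: it assumes a single always-on driver and a flat hourly OCU rate per instance, and the function name `max_workers` and the rates in the example are hypothetical. Real per-instance OCU costs are listed on the instance types page.

```python
import math


def max_workers(max_ocu_per_hour: float, driver_ocu: float, worker_ocu: float) -> int:
    """Largest worker count whose combined hourly OCU stays within Max OCU / Hour.

    Simplifying assumptions (hypothetical model): one driver is always running,
    and every instance is billed at a flat OCU rate per hour.
    """
    if worker_ocu <= 0:
        raise ValueError("worker_ocu must be positive")
    # Budget left for workers after reserving OCU for the driver.
    remaining = max_ocu_per_hour - driver_ocu
    # Never report a negative worker count if the driver alone exceeds the cap.
    return max(0, math.floor(remaining / worker_ocu))


# Hypothetical rates: 2 OCU/hour for the driver, 4 OCU/hour per worker.
print(max_workers(max_ocu_per_hour=30, driver_ocu=2, worker_ocu=4))  # 7
```

Under this model, raising the worker instance size lowers the worker count the same Max OCU allows, which is why the limit and the instance type should be chosen together.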

Instance type configurations

Specify the instance types for the Cluster. Learn more about the available instance types, custom instance types, and OCU costs here.

Worker Type: Specify the instance type for the Cluster's workers (aka executors).
Spot Instances: Optionally enable spot instances for the Cluster's workers.
- Not enabled by default.
- Enabling this may help reduce your cloud provider compute costs for workloads that do not require high availability.
- Enabling this will not affect OCU pricing.
- This configuration only enables spot instances for workers. Drivers will always use on-demand instances.
Driver Type: Specify the instance type for the Cluster's driver(s).
- When set to Auto (the default option), drivers will use the same instance type as workers.
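The driver and worker rules above can be summarized as a small resolution step. The sketch below is illustrative only: `NodeSpec` and `resolve_nodes` are hypothetical names, not part of a Onehouse API. It shows that Auto makes the driver reuse the worker instance type and that the spot setting only ever applies to workers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    instance_type: str
    spot: bool


def resolve_nodes(worker_type: str, driver_type: str = "Auto",
                  spot_enabled: bool = False) -> dict[str, NodeSpec]:
    """Resolve driver and worker specs (hypothetical helper mirroring the rules above)."""
    # Auto (the default) means drivers use the same instance type as workers.
    driver_instance = worker_type if driver_type == "Auto" else driver_type
    return {
        # Spot instances, if enabled, apply to workers only.
        "worker": NodeSpec(worker_type, spot=spot_enabled),
        # Drivers always run on on-demand instances.
        "driver": NodeSpec(driver_instance, spot=False),
    }


nodes = resolve_nodes("m5.xlarge", spot_enabled=True)
print(nodes["driver"])  # NodeSpec(instance_type='m5.xlarge', spot=False)
```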

Managed Cluster table limit based on driver type

Managed Clusters can write to up to N tables, where N = 75 × (number of driver node cores). This limit applies regardless of how many Flows or Table Services are writing to those tables.
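The limit is a direct product, so it is easy to check before sizing a driver. The helper names below are hypothetical; the formula N = 75 × (driver cores) is taken from the paragraph above. For example, an 8-core driver allows up to 600 tables.

```python
def managed_table_limit(driver_cores: int) -> int:
    """Max tables a Managed Cluster can write to: N = 75 x driver node cores."""
    if driver_cores <= 0:
        raise ValueError("driver_cores must be positive")
    return 75 * driver_cores


def fits_table_limit(table_count: int, driver_cores: int) -> bool:
    """True if writing to `table_count` tables stays within the limit (hypothetical helper)."""
    return table_count <= managed_table_limit(driver_cores)


print(managed_table_limit(8))  # 600
```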

Catalog

The catalog configuration is required for SQL, Spark, and Open Engines (Trino and Flink) Clusters. The specified catalog will be used by the Cluster to read and write data.

SQL and Spark Clusters

SQL and Spark Clusters can connect to the following catalogs:

Onehouse Managed (recommended): Use the built-in catalog, which integrates with all Onehouse services. This option is recommended unless you need to connect to an external catalog.
External Iceberg REST Catalog (IRC): Connect to an external Iceberg REST Catalog (IRC) as your primary catalog. Currently, you can integrate with AWS Glue Iceberg REST Catalog and Snowflake Open Catalog. Note the following limitations with IRC:
- Catalog events (such as table creation) will not be registered to Onehouse.
- Tables created with IRC cannot run table services.
- Additional limitations are called out in the documentation for each specific catalog.

Open Engines Clusters

Open Engines Trino and Apache Flink Clusters must connect to an external catalog as their primary catalog. We plan to add support for the Onehouse Managed catalog in the future.
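The catalog rules in this section can be expressed as a validation table. This is a sketch under stated assumptions: the enums and `catalog_errors` are hypothetical names, `EXTERNAL_OTHER` is a made-up bucket for external catalogs other than IRC, and the section does not list which external catalogs Trino and Flink accept, so the sketch treats any external catalog as acceptable for them. Managed and Notebook Clusters are not covered by this section, so the sketch reports no errors for them.

```python
from enum import Enum, auto
from typing import Optional


class ClusterType(Enum):
    MANAGED = auto()
    SQL = auto()
    SPARK = auto()
    OPEN_ENGINES_TRINO = auto()
    OPEN_ENGINES_FLINK = auto()
    NOTEBOOK = auto()


class CatalogKind(Enum):
    ONEHOUSE_MANAGED = auto()
    EXTERNAL_IRC = auto()    # e.g. AWS Glue Iceberg REST Catalog, Snowflake Open Catalog
    EXTERNAL_OTHER = auto()  # hypothetical bucket for any other external catalog


# Cluster types this section says require a catalog, and what each accepts.
_ALLOWED_CATALOGS = {
    ClusterType.SQL: {CatalogKind.ONEHOUSE_MANAGED, CatalogKind.EXTERNAL_IRC},
    ClusterType.SPARK: {CatalogKind.ONEHOUSE_MANAGED, CatalogKind.EXTERNAL_IRC},
    # Assumption: any external catalog counts; Onehouse Managed is not yet supported.
    ClusterType.OPEN_ENGINES_TRINO: {CatalogKind.EXTERNAL_IRC, CatalogKind.EXTERNAL_OTHER},
    ClusterType.OPEN_ENGINES_FLINK: {CatalogKind.EXTERNAL_IRC, CatalogKind.EXTERNAL_OTHER},
}


def catalog_errors(cluster_type: ClusterType, catalog: Optional[CatalogKind]) -> list[str]:
    """Return problems with a catalog choice; an empty list means it looks valid."""
    allowed = _ALLOWED_CATALOGS.get(cluster_type)
    if allowed is None:
        # Managed and Notebook Clusters are not covered by this section.
        return []
    if catalog is None:
        return [f"{cluster_type.name} Clusters require a catalog"]
    if catalog not in allowed:
        return [f"{catalog.name} is not supported for {cluster_type.name} Clusters"]
    return []


print(catalog_errors(ClusterType.OPEN_ENGINES_FLINK, CatalogKind.ONEHOUSE_MANAGED))
```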

Attached storage

Currently available in AWS projects only.

Clusters automatically provision attached storage volumes, such as Amazon EBS, to provide additional disk space for spilling in memory-intensive workloads.

The following attached storage volumes will be provisioned based on your Cluster instance types.

Instance Cores	Storage Size	Volume Configuration	IOPS	Throughput
4	250 GB	2 × 125 GB	6000	250 MiB/s
8	500 GB	4 × 125 GB	9000	500 MiB/s
16	1000 GB	4 × 250 GB	12000	750 MiB/s
32	2000 GB	4 × 500 GB	12000	750 MiB/s
48	3000 GB	3 × 1 TB	16000	1000 MiB/s
64	4000 GB	4 × 1 TB	16000	1000 MiB/s
96	6000 GB	3 × 2 TB	16000	1000 MiB/s
192	12000 GB	4 × 3 TB	16000	1000 MiB/s
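For capacity planning, the table above can be encoded as a lookup. The sketch transcribes the rows directly and reads 1 TB as 1000 GB, which is consistent with the Storage Size column. The names `AttachedStorage` and `attached_storage_for` are hypothetical, and the sketch assumes exact core-count matches only, since the table does not say what happens for other core counts.

```python
from typing import NamedTuple


class AttachedStorage(NamedTuple):
    volumes: int
    volume_size_gb: int
    iops: int
    throughput_mib_s: int

    @property
    def total_gb(self) -> int:
        # Storage Size column = number of volumes x size of each volume.
        return self.volumes * self.volume_size_gb


# Rows transcribed from the table above (1 TB read as 1000 GB).
DEFAULT_ATTACHED_STORAGE = {
    4: AttachedStorage(2, 125, 6000, 250),
    8: AttachedStorage(4, 125, 9000, 500),
    16: AttachedStorage(4, 250, 12000, 750),
    32: AttachedStorage(4, 500, 12000, 750),
    48: AttachedStorage(3, 1000, 16000, 1000),
    64: AttachedStorage(4, 1000, 16000, 1000),
    96: AttachedStorage(3, 2000, 16000, 1000),
    192: AttachedStorage(4, 3000, 16000, 1000),
}


def attached_storage_for(cores: int) -> AttachedStorage:
    """Look up the default attached storage for an instance with `cores` cores."""
    try:
        return DEFAULT_ATTACHED_STORAGE[cores]
    except KeyError:
        raise ValueError(f"no documented attached-storage default for {cores} cores") from None


print(attached_storage_for(48).total_gb)  # 3000
```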

You can also specify custom storage sizes via API commands: CREATE CLUSTER and ALTER CLUSTER.

NVMe instance types

Attached storage is not provisioned for NVMe instance types, because their local NVMe storage is faster. NVMe instance types are identified by a d suffix on the instance family name (e.g. m5d.large).
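A literal reading of the naming rule can be written as a one-line check on the instance family (the part before the first dot). The function names are hypothetical, and this is only a heuristic: it follows the rule exactly as stated above and may not match every NVMe-backed family, for example families where other attribute letters follow the d.

```python
def has_nvme_suffix(instance_type: str) -> bool:
    """Heuristic: the instance family (before the first '.') ends with 'd'."""
    family = instance_type.split(".", 1)[0]
    return family.endswith("d")


def provisions_attached_storage(instance_type: str) -> bool:
    """Attached storage is skipped for NVMe instance types (hypothetical helper)."""
    return not has_nvme_suffix(instance_type)


print(provisions_attached_storage("m5d.large"))  # False
```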

Cluster usage notifications

You may set an OCU Limit Notification Threshold to receive a notification when the Cluster has scaled to X% of the Maximum OCU. Usage is calculated as the average over the past hour.

The default notification threshold is 80%, but you can set this to any value, or set 0% to disable notifications.

Notifications are received via the Onehouse UI, email, and (if configured) Slack.
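The threshold rule can be sketched as follows. The function name and input shape are hypothetical: it assumes you have OCU usage samples from the past hour, averages them as described above, and compares the average against the threshold share of Max OCU / Hour. Whether the comparison is inclusive (>=) is an assumption; a threshold of 0% disables notifications.

```python
def should_notify(past_hour_ocu_samples: list[float], max_ocu_per_hour: float,
                  threshold_pct: float = 80.0) -> bool:
    """Decide whether average usage over the past hour crossed the threshold."""
    if threshold_pct == 0:
        return False  # 0% disables notifications
    if not past_hour_ocu_samples:
        return False  # no usage data for the window
    average = sum(past_hour_ocu_samples) / len(past_hour_ocu_samples)
    # Assumption: reaching the threshold exactly counts as crossing it.
    return average >= max_ocu_per_hour * threshold_pct / 100


# Max OCU of 50 with the default 80% threshold notifies at an average of 40 OCU.
print(should_notify([38.0, 42.0], max_ocu_per_hour=50))  # True
```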
