Create a Cluster

Open the Clusters page in the Onehouse console to create a new Cluster. Below we will cover the configurations to set up the Cluster.

tip

You can create multiple Clusters of the same type to isolate different workloads.

Basic configurations

  • Name: The name by which to identify the Cluster.
  • Cluster Type: Select from the following Cluster types. This cannot be changed after creation.
    • Managed: Run Flows to ingest data and Table Services to optimize your tables.
    • SQL: Run SQL workloads on the Quanton engine. Submit queries through a JDBC Endpoint or the Onehouse SQL Editor.
    • Spark: Create and run Jobs to execute Apache Spark code in Python, Java, or Scala on the Quanton engine.
    • Open Engines: Deploy open source compute engines on Onehouse infrastructure with Open Engines.
    • Notebook (beta feature): Deploy a Jupyter notebook on Onehouse infrastructure to run interactive PySpark workloads on the Onehouse Quanton engine.

OCU configurations

Specify OCU limits to constrain the min/max Onehouse Compute Units (OCU) the Cluster will use per hour. This determines how many instances the Cluster can use, based on the hourly OCU cost of your selected instance type(s).

  • Max OCU / Hour: Maximum OCU the Cluster will use per hour. Set this to manage your costs.
  • Min OCU / Hour: Minimum OCU the Cluster will use per hour. Set this higher if you need to keep the Cluster warm.

tip

Setting a Max OCU for your Clusters can help you confidently keep costs under a budget. If your data volumes or workload complexity change and your Cluster usage hits its Max OCU, the Cluster will stop scaling up. This may lead to delays in data processing, so it is important to consider how your workloads grow or fluctuate.
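The interaction between Min and Max OCU / Hour can be sketched as a simple clamp on the autoscaler's demand. This is an illustrative sketch only; `target_ocu` is a hypothetical helper, not part of the Onehouse API:

```python
def target_ocu(demand: float, min_ocu: float, max_ocu: float) -> float:
    """Clamp the desired OCU/hour to the configured Min/Max bounds.

    Hypothetical illustration of the scaling rule described above:
    the Cluster never scales below Min OCU (staying warm) and never
    above Max OCU (capping cost), even if the workload demands more.
    """
    return max(min_ocu, min(demand, max_ocu))


# Demand of 12 OCU/hour is capped at a Max OCU of 8:
target_ocu(12, min_ocu=2, max_ocu=8)  # → 8.0 (work may queue up)
```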

Instance type configurations

Specify the instance types for the Cluster. Learn more about the available instance types, custom instance types, and OCU costs here.

  • Worker Type: Specify the instance type for the Cluster's workers (aka executors).
  • Spot Instances: Optionally enable spot instances for the Cluster's workers.
    • Not enabled by default.
    • Enabling this may help reduce your cloud provider compute costs for workloads that do not require high-availability.
    • Enabling this will not affect OCU pricing.
    • This configuration only enables spot instances for workers. Drivers will always use on-demand instances.
  • Driver Type: Specify the instance type for the Cluster's driver(s).
    • When set to Auto (the default option), drivers will use the same instance type as workers.

Managed Cluster table limit based on driver type

Managed Clusters can write to at most N tables, where N = 75 × (number of driver node cores). This limit applies regardless of how many Flows or Table Services are writing to those tables.
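The table limit is a straightforward function of driver cores. A minimal sketch (the function name is illustrative, not an Onehouse API):

```python
def managed_cluster_table_limit(driver_cores: int) -> int:
    """Maximum number of tables a Managed Cluster can write to:
    75 tables per driver node core, per the formula above."""
    return 75 * driver_cores


# A 4-core driver supports up to 300 tables; a 16-core driver up to 1200.
managed_cluster_table_limit(4)   # → 300
managed_cluster_table_limit(16)  # → 1200
```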

Catalog

The catalog configuration is required for SQL, Spark, and Open Engines (Trino and Flink) Clusters. The specified catalog will be used by the Cluster to read and write data.

SQL and Spark Clusters

SQL and Spark Clusters can connect to the following catalogs:

  • Onehouse Managed (recommended): Use the built-in catalog that integrates seamlessly with all Onehouse services. This option is recommended, unless you explicitly need to connect to an external catalog.
  • External Iceberg REST Catalog (IRC): Connect to an external Iceberg REST Catalog (IRC) as your primary catalog. Currently, you can integrate with AWS Glue Iceberg REST Catalog and Snowflake Open Catalog. Note the following limitations with IRC:
    • Catalog events (such as table creation) will not be registered to Onehouse.
    • Tables created with IRC cannot run table services.
    • Additional limitations are called out in the documentation for each specific catalog.

Open Engines Clusters

Open Engines Trino and Apache Flink Clusters must connect to an external catalog as their primary catalog. We plan to add support for the Onehouse Managed catalog in the future.

Attached storage

Currently available in AWS projects only.

Clusters automatically provision attached storage volumes, such as Amazon EBS, to support additional disk spilling for memory-intensive workloads.

The following attached storage volumes will be provisioned based on your Cluster instance types.

| Instance Cores | Storage Size | Volume Configuration | IOPS | Throughput |
| --- | --- | --- | --- | --- |
| 4 | 250 GB | 2 x 125 GB | 6000 | 250 MiB/s |
| 8 | 500 GB | 4 x 125 GB | 9000 | 500 MiB/s |
| 16 | 1000 GB | 4 x 250 GB | 12000 | 750 MiB/s |
| 32 | 2000 GB | 4 x 500 GB | 12000 | 750 MiB/s |
| 48 | 3000 GB | 3 x 1 TB | 16000 | 1000 MiB/s |
| 64 | 4000 GB | 4 x 1 TB | 16000 | 1000 MiB/s |
| 96 | 6000 GB | 3 x 2 TB | 16000 | 1000 MiB/s |
| 192 | 12000 GB | 4 x 3 TB | 16000 | 1000 MiB/s |

You can also specify custom storage sizes via API commands: CREATE CLUSTER and ALTER CLUSTER.
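The provisioning table above can be read as a lookup keyed by instance core count. A minimal sketch (the mapping and helper name are hypothetical illustrations mirroring the table, not an Onehouse API):

```python
# (total size, volume layout, IOPS, throughput in MiB/s) per instance core count,
# transcribed from the attached-storage table above.
ATTACHED_STORAGE = {
    4:   ("250 GB",   "2 x 125 GB", 6000,  250),
    8:   ("500 GB",   "4 x 125 GB", 9000,  500),
    16:  ("1000 GB",  "4 x 250 GB", 12000, 750),
    32:  ("2000 GB",  "4 x 500 GB", 12000, 750),
    48:  ("3000 GB",  "3 x 1 TB",   16000, 1000),
    64:  ("4000 GB",  "4 x 1 TB",   16000, 1000),
    96:  ("6000 GB",  "3 x 2 TB",   16000, 1000),
    192: ("12000 GB", "4 x 3 TB",   16000, 1000),
}


def attached_storage(instance_cores: int):
    """Return the default attached-storage configuration for a core count."""
    return ATTACHED_STORAGE[instance_cores]


# A 16-core instance gets 1000 GB across 4 x 250 GB volumes:
attached_storage(16)  # → ("1000 GB", "4 x 250 GB", 12000, 750)
```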

Cluster usage notifications

You can set an OCU Limit Notification Threshold to receive a notification when the Cluster has scaled to that percentage of its Max OCU. Usage is calculated as the average over the past hour.

The default notification threshold is 80%, but you can set this to any value, or set 0% to disable notifications.
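The threshold rule can be sketched as follows; `should_notify` is a hypothetical helper illustrating the behavior described above (average usage over the past hour compared against the threshold percentage of Max OCU, with 0% disabling notifications):

```python
def should_notify(hourly_samples: list[float], max_ocu: float,
                  threshold_pct: float) -> bool:
    """Return True when average OCU usage over the past hour has reached
    threshold_pct% of Max OCU. A threshold of 0 disables notifications."""
    if threshold_pct == 0:
        return False
    avg_usage = sum(hourly_samples) / len(hourly_samples)
    return avg_usage >= (threshold_pct / 100) * max_ocu


# With Max OCU = 10 and the default 80% threshold, averaging 8 OCU triggers
# a notification, averaging 5 OCU does not:
should_notify([8, 8, 8, 8], max_ocu=10, threshold_pct=80)  # → True
should_notify([5, 5, 5, 5], max_ocu=10, threshold_pct=80)  # → False
```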

Notifications are received via the Onehouse UI, email, and (if configured) Slack.