CREATE CLUSTER

Description

Creates a new Cluster.

Note: do not end the SQL statement with a semicolon (;).

Syntax

CREATE CLUSTER `<cluster_name>`
TYPE = { 'Managed' | 'SQL' | 'Spark' | 'Open_Engines' }
MAX_OCU = <int>
MIN_OCU = <int>
WITH 'key1' = 'value1', 'key2' = 'value2', ...

Sample response

Examples

Create a Managed Cluster:

CREATE CLUSTER `managed_cluster_retail`
TYPE = 'Managed'
MAX_OCU = 10
MIN_OCU = 1

Create a Spark Cluster with larger, spot workers:

CREATE CLUSTER `jobs_prod`
TYPE = 'Spark'
MAX_OCU = 10
WITH 'worker.type' = 'oh-general-8', 'worker.spot' = 'TRUE'

Create a Trino Open Engines Cluster:

CREATE CLUSTER `trino_cluster`
TYPE = 'Open_Engines'
MAX_OCU = 10
MIN_OCU = 1
WITH 'open_engines.engine' = 'Trino', 'open_engines.catalog' = 'glue_catalog_name'

Create a Ray Open Engines Cluster:

CREATE CLUSTER `ray_cluster`
TYPE = 'Open_Engines'
MAX_OCU = 10
MIN_OCU = 1
WITH 'open_engines.engine' = 'Ray', 'open_engines.ray.max_cpu_units' = '7', 'open_engines.ray.min_cpu_units' = '1', 'open_engines.ray.max_gpu_units' = '3'

Required parameters

  • <cluster_name>: Specify a unique name for the Cluster.
  • TYPE: Specify the type of Cluster. This determines the type of workloads the Cluster can run.
  • MAX_OCU: Specify the maximum OCU the Cluster can scale up to.
  • MIN_OCU: Specify the minimum OCU the Cluster keeps running at all times.

Special parameters

Pass special parameters and advanced configs after WITH as comma-separated 'key' = 'value' pairs. Every key and every value is a quoted String.
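
As an illustration of the statement shape, the following Python sketch assembles a CREATE CLUSTER statement from its parts. The helper name build_create_cluster is hypothetical and is not part of any Onehouse SDK. It assumes all four required parameters are supplied, and because quote escaping is not described here, it simply rejects values that contain a single quote.

```python
ALLOWED_TYPES = {"Managed", "SQL", "Spark", "Open_Engines"}


def build_create_cluster(name, cluster_type, max_ocu, min_ocu, options=None):
    """Render a CREATE CLUSTER statement as text (hypothetical helper)."""
    if cluster_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown cluster type: {cluster_type!r}")
    lines = [
        f"CREATE CLUSTER `{name}`",
        f"TYPE = '{cluster_type}'",
        f"MAX_OCU = {int(max_ocu)}",
        f"MIN_OCU = {int(min_ocu)}",
    ]
    pairs = []
    for key, value in (options or {}).items():
        # Every WITH value is passed as a String, so reject ints, bools, etc.
        if not isinstance(value, str):
            raise TypeError(f"{key} must be a str, got {type(value).__name__}")
        # Assumption: quote escaping is not documented, so refuse embedded quotes.
        if "'" in key or "'" in value:
            raise ValueError(f"single quote not supported in {key}")
        pairs.append(f"'{key}' = '{value}'")
    if pairs:
        lines.append("WITH " + ", ".join(pairs))
    # The statement is returned without a trailing semicolon.
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_create_cluster(
        "trino_cluster", "Open_Engines", 10, 1,
        {"open_engines.engine": "Trino",
         "open_engines.catalog": "glue_catalog_name"},
    ))
```

Passing a non-string value such as 7 instead of '7' raises a TypeError in this sketch, which mirrors the rule that WITH values are Strings.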

Instance type parameters

  • worker.type: Specify the worker instance type as a String. Must be a standard Onehouse instance type (e.g., 'oh-general-4') or a custom instance type that has been enabled for the project (e.g., 'm8g.xlarge'). View more on instance types.
  • worker.spot: Specify 'TRUE' or 'FALSE' to enable/disable spot instances for workers.
  • driver.type: Specify the driver instance type as a String. Default is 'Auto', which uses the same instance type as workers.
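
Since the instance type parameters are all Strings, including the spot flag, a small mapping step can help when the settings start out as native values in a script. The sketch below is a hypothetical helper (instance_options is not an Onehouse function). It assumes omitted parameters should simply be left out of the WITH clause so that the defaults apply.

```python
def instance_options(worker_type=None, worker_spot=None, driver_type=None):
    """Map native values to the String options used after WITH (hypothetical)."""
    opts = {}
    if worker_type is not None:
        opts["worker.type"] = worker_type
    if worker_spot is not None:
        # The flag is written as the String 'TRUE' or 'FALSE', not a boolean.
        opts["worker.spot"] = "TRUE" if worker_spot else "FALSE"
    if driver_type is not None:
        opts["driver.type"] = driver_type
    return opts


if __name__ == "__main__":
    opts = instance_options(worker_type="oh-general-8", worker_spot=True)
    print(", ".join(f"'{k}' = '{v}'" for k, v in opts.items()))
```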

Open Engines parameters

  • open_engines.engine: Specify the compute engine as 'Trino', 'Flink', or 'Ray'.
  • open_engines.catalog: For Flink or Trino Clusters, specify the name of the catalog to use. This field is required for Trino but optional for Flink.
  • open_engines.ray.max_cpu_units: Specify the maximum CPU units the Cluster can scale to. The max CPU units and max GPU units must sum to the MAX_OCU value.
  • open_engines.ray.min_cpu_units: Specify the minimum CPU units the Cluster will always run. This must equal the MIN_OCU value.
  • open_engines.ray.max_gpu_units: Specify the maximum GPU units the Cluster can scale to. The max CPU units and max GPU units must sum to the MAX_OCU value.
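
The Open Engines rules above can be checked before a statement is submitted. The following Python sketch is one possible pre-flight check. The function name check_open_engines is hypothetical, and the check covers only the constraints stated in this list. It assumes that Ray unit values are checked only when they are present, since this page does not say whether they are mandatory.

```python
def check_open_engines(max_ocu, min_ocu, options):
    """Return a list of rule violations; an empty list means none were found."""
    engine = options.get("open_engines.engine")
    if engine not in ("Trino", "Flink", "Ray"):
        return [f"open_engines.engine must be 'Trino', 'Flink', or 'Ray', got {engine!r}"]

    problems = []
    # The catalog is required for Trino and optional for Flink.
    if engine == "Trino" and not options.get("open_engines.catalog"):
        problems.append("open_engines.catalog is required for Trino")

    if engine == "Ray":
        max_cpu = options.get("open_engines.ray.max_cpu_units")
        max_gpu = options.get("open_engines.ray.max_gpu_units")
        min_cpu = options.get("open_engines.ray.min_cpu_units")
        # Option values arrive as Strings, so convert before comparing.
        if max_cpu is not None and max_gpu is not None:
            if int(max_cpu) + int(max_gpu) != max_ocu:
                problems.append("max CPU units + max GPU units must equal MAX_OCU")
        if min_cpu is not None and int(min_cpu) != min_ocu:
            problems.append("min CPU units must equal MIN_OCU")
    return problems


if __name__ == "__main__":
    ray_options = {
        "open_engines.engine": "Ray",
        "open_engines.ray.max_cpu_units": "7",
        "open_engines.ray.min_cpu_units": "1",
        "open_engines.ray.max_gpu_units": "3",
    }
    print(check_open_engines(10, 1, ray_options))
```

With the values from the Ray example (MAX_OCU = 10, MIN_OCU = 1, 7 CPU units plus 3 GPU units), the check finds no violations.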

Attached storage parameters

These parameters apply to AWS projects only.

  • attached_storage.driver_storage_size: Optionally, specify a custom amount of attached storage (in GB) to provision for drivers in the Cluster. Select from any storage size in this table. If you do not include this parameter, the default attached storage values will be provisioned based on the instance type.
  • attached_storage.worker_storage_size: Optionally, specify a custom amount of attached storage (in GB) to provision for workers in the Cluster. Select from any storage size in this table. If you do not include this parameter, the default attached storage values will be provisioned based on the instance type.