Open Engines
Open Engines Clusters allow you to deploy open source compute engines on Onehouse infrastructure. This allows you to easily spin up engines for different use cases, such as analytics queries, stream processing, and machine learning.
The following engines are currently supported:
Engine | Best for |
---|---|
Trino | Fast, read-only SQL queries for analytics |
Apache Flink | Stream processing |
Ray | AI, machine learning, and data science |
Open Engines integrate with the full Onehouse platform, with some initial limitations (see Limitations). You can read from existing Onehouse tables with Open Engines and deploy Onehouse-managed Table Services on tables created with Open Engines.
Pricing
Onehouse is offering Open Engines for free for a limited time. You will not be billed OCU for Open Engines usage, but will still pay your cloud provider for any cloud resource consumption.
Customer Support
While other Onehouse product offerings include options for enterprise-grade customer support, Open Engines customer support is limited to infrastructure-level issues at this time. To debug engine-specific issues, use the open source community channels for the engine.
If you require a solution with full customer support at the engine level, we are happy to connect you with one of our specialized compute engine partners.
Create a Cluster
To use Open Engines, you must first create an Open Engines Cluster in Onehouse. When creating the Cluster, you will select one of the supported engines.
Access the Cluster
Open Engines queries and workloads must be submitted directly to the Cluster (i.e., not through the Onehouse control plane). When you create an Open Engines Cluster, you will get an endpoint that can only be accessed from within the VPC. We suggest connecting through a bastion host or VPN.
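As a rough sketch of the bastion host approach, the Python helper below assembles a standard SSH local port forward (`ssh -N -L`) so that a client on your machine can reach the VPC-only endpoint through `localhost`. Every host name, user name, and port shown here is a hypothetical placeholder, not a value Onehouse provides; substitute the endpoint shown for your own Cluster and the details of your own bastion host.

```python
import shlex


def build_tunnel_command(bastion_host, bastion_user, cluster_endpoint,
                         cluster_port, local_port):
    """Build an ssh argv list that forwards local_port to the Cluster endpoint.

    -N opens the tunnel without running a remote command.
    -L local_port:cluster_endpoint:cluster_port forwards a local port
       to the endpoint as resolved from the bastion host inside the VPC.
    """
    forward = f"{local_port}:{cluster_endpoint}:{cluster_port}"
    return ["ssh", "-N", "-L", forward, f"{bastion_user}@{bastion_host}"]


# Hypothetical placeholder values; replace with your own.
cmd = build_tunnel_command(
    bastion_host="bastion.example.com",
    bastion_user="ec2-user",
    cluster_endpoint="open-engines.internal.example.com",
    cluster_port=8080,
    local_port=8080,
)

# Print a copy-pasteable shell command.
print(shlex.join(cmd))
```

With the tunnel running, an engine client would connect to `localhost` on the chosen local port instead of the private endpoint. A VPN avoids the tunnel entirely, since the endpoint is then directly routable.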
Limitations
- Trino and Ray are read-only for Onehouse tables due to their open source implementations.
- Tables created by Open Engines can only be viewed and managed by Onehouse in the Apache Hudi format, and must be created as External Tables under an Observed Lake. We plan to add support for tables created in Managed Lakes soon.
- Open Engines do not yet integrate with lock providers in Onehouse. When an Open Engines writer writes to a table concurrently with other Onehouse writers, such as Stream Captures or Table Services, you must add your lock provider configurations to the Open Engines writer manually. We plan to integrate with Onehouse lock providers soon.
- Trino and Flink can currently connect to only one external catalog each. Trino supports a Glue or DataProc catalog, and Flink supports a Hive Metastore or DataProc catalog.
- Access control (e.g., CREATE ROLE in Trino) is not yet supported.
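To illustrate the manual lock provider configuration mentioned above, here is a minimal sketch that builds Apache Hudi optimistic concurrency control options and renders them as a Flink SQL `WITH` clause. It assumes a ZooKeeper-based lock provider; this page does not say which lock provider your other Onehouse writers use, and all writers to a table must share the same one, so treat the provider choice, the ZooKeeper address, the lock key, and the base path as hypothetical placeholders. The helper function names are illustrative only. The `hoodie.*` keys are standard Hudi configuration names.

```python
def hudi_zookeeper_lock_options(zk_url, zk_port, lock_key, base_path):
    """Return Hudi write options enabling optimistic concurrency control
    with a ZooKeeper-based lock provider (an assumed choice)."""
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        # Lazy cleaning of failed writes is required for multi-writer setups.
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_url,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": base_path,
    }


def to_flink_with_clause(options):
    """Render options as a Flink SQL WITH (...) clause, escaping single quotes."""
    def quote(value):
        return "'" + str(value).replace("'", "''") + "'"

    lines = [f"  {quote(key)} = {quote(value)}" for key, value in options.items()]
    return "WITH (\n" + ",\n".join(lines) + "\n)"


# Hypothetical placeholder values; replace with your own.
lock_options = hudi_zookeeper_lock_options(
    zk_url="zookeeper.internal.example.com",
    zk_port=2181,
    lock_key="my_table",
    base_path="/hudi/locks",
)
print(to_flink_with_clause(lock_options))
```

The rendered options would be merged with the rest of the table definition in the writer job (for example, the `connector` and `path` options of a Hudi table in Flink SQL). The lock key and base path must match across every writer on the table, otherwise the writers will not contend for the same lock.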