Intro to Onehouse

Onehouse is a fully managed data lakehouse platform deployed in your own cloud environment. It offers lightning-fast data pipelines, ingestion, and lakehouse tables at half the cost of other popular data platforms and with no vendor lock-in.

Common Use Cases

Ingest data for analytics and AI

Flows offer fast, cost-effective, and fully managed ingestion from a variety of data sources. Bring in data from sources such as Postgres, MySQL, Apache Kafka, object storage, and more to your data lakehouse for analytics and AI. Clean and transform data in-flight, apply data quality validations, and monitor your pipelines from the Onehouse console.

Run fast, cost-efficient Apache Spark

Jobs enable you to easily deploy and manage Apache Spark workloads on Onehouse for large-scale data processing and machine learning. Jobs run on the Onehouse Quanton engine, providing industry-leading cost-performance, with complete Apache Spark compatibility.

Transform data with SQL

SQL Clusters enable you to run Spark SQL queries on your data lakehouse tables. Point tools like dbt, Airflow, or DbVisualizer at a Onehouse SQL Cluster to perform DML/DDL operations and run analytical queries on the Onehouse Quanton engine.

Optimize your data lakehouse

Table Services such as cleaning, compaction, and clustering optimize your tables for up to 10x gains in read/write performance. These services run automatically and can be configured for both Onehouse-managed tables and external tables.

Perform data analysis and machine learning

Onehouse stores your data in formats optimized for data analytics and AI/ML workloads. The platform makes it easy to deploy or connect specialized engines:

  • Open Engines Clusters enable you to deploy popular open source engines for data analysis (Trino), streaming (Apache Flink), and machine learning (Ray) on autoscaling compute.
  • The Metadata Sync Service can automatically sync your tables to multiple catalogs. This enables you to read Onehouse tables from any external query engine (such as Snowflake, Databricks, or Amazon Athena) without making copies of the data.

Core Technology

Open Table Formats

Open table formats such as Apache Hudi, Apache Iceberg, and Delta Lake enable you to store data without vendor lock-in. These formats are powerful, supporting warehouse-like capabilities, such as ACID transactions, directly on object storage.

Onehouse offers a highly optimized, managed platform that can store your data in all three of the major open table formats. With OneTable sync, you can read a single copy of data as any open table format, giving you the flexibility to query with different engines optimized for each format.

Quanton Engine

The Onehouse Quanton engine delivers 2-3x better ETL cost-performance than leading engines, including Databricks with Photon, AWS EMR, and Snowflake. Quanton is fully compatible with Apache Spark, so you can migrate existing Spark jobs without rewriting code. The Quanton engine powers SQL Clusters, Jobs, and other compute workloads in Onehouse. Learn more about the Onehouse Quanton Engine here.

Onehouse Compute Runtime

The Onehouse Compute Runtime (OCR) provides fully managed compute infrastructure that scales automatically based on your workload requirements. It operates on compute instances within your cloud account, keeping your data secure. OCR includes functionality such as:

  • Adaptive Workload Optimizer: Runtime features that intelligently react to workloads.
  • Serverless Compute Manager: Compute infrastructure optimized for the most challenging lakehouse workloads.
  • High-Performance Lakehouse I/O: A radical rethinking of some foundational lakehouse operations.

Onehouse Clusters automatically run on top of OCR. Learn more about OCR here.

FAQ

What does Onehouse charge for?

Onehouse uses a Bring-Your-Own-Cloud (BYOC) model and charges based on compute usage measured in Onehouse Compute Units (OCU). You pay for compute instances running in your cloud account, plus support costs. Onehouse does not charge additional volume-based or storage fees; you pay only your cloud provider for storage. Billing is based on the formula: ($ rate per hourly OCU) * (# of hourly OCU consumed). You can view the full documentation on usage and billing here.
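
As a rough illustration of the billing formula above, here is a minimal Python sketch. The function name `estimate_ocu_charge`, the rate of $0.50 per hourly OCU, and the 120 hourly OCU consumed are all hypothetical values chosen for the example, not actual Onehouse pricing or a Onehouse API. The sketch covers only the OCU-based charge; cloud provider costs for compute instances and storage, and any support costs, are billed separately and are not modeled here.

```python
from decimal import Decimal


def estimate_ocu_charge(rate_per_hourly_ocu: Decimal, hourly_ocu_consumed: Decimal) -> Decimal:
    """Apply the formula ($ rate per hourly OCU) * (# of hourly OCU consumed).

    Hypothetical helper for illustration only; it is not part of any Onehouse SDK.
    The result is rounded to whole cents.
    """
    if rate_per_hourly_ocu < 0 or hourly_ocu_consumed < 0:
        raise ValueError("rate and usage must be non-negative")
    return (rate_per_hourly_ocu * hourly_ocu_consumed).quantize(Decimal("0.01"))


# Assumed example inputs: neither number comes from Onehouse pricing.
example_rate = Decimal("0.50")    # dollars per hourly OCU (hypothetical)
example_usage = Decimal("120")    # hourly OCU consumed in the billing period (hypothetical)

print(f"Estimated OCU charge: ${estimate_ocu_charge(example_rate, example_usage)}")
```

`Decimal` is used instead of `float` so that currency amounts are not affected by binary floating-point rounding.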

Where does Onehouse store my data and is it secure?

Onehouse delivers its management services on a data plane inside your cloud account. Unlike the approach taken by many vendors, this ensures that no data ever leaves the trust boundary of your private networks and that sensitive production databases are not exposed externally. You maintain ownership of all your data in your own S3, GCS, or other cloud storage buckets. Onehouse is committed to openness so that your data stays future-proof. Onehouse is SOC 2 Type I and Type II compliant and PCI DSS compliant.