Onehouse Overview
Welcome to the Onehouse docs.
What is Onehouse
Onehouse is a cloud-native, fully managed lakehouse service built on Apache Hudi. Onehouse helps you ingest, store, and manage your data. Through catalog integrations, you can explore and query your Onehouse data using the query engine of your choice.
Core concepts
- Data Warehouse: A central repository designed for storing and analyzing large amounts of data.
- Data Lake: A repository of raw data that can be used for analytics and machine learning. Unlike a data warehouse, a data lake does not require data to be processed or cleaned before it can be stored, making it a more flexible option for storing large amounts of data.
- Data Lakehouse: A modern data platform built on a data lake that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses.
- Apache Hudi: An open-source data lakehouse platform that brings database and data warehouse capabilities to the data lake. Onehouse stores your data in the Hudi format to enable powerful incremental processing. Learn more on the Hudi site.
Onehouse product concepts
- Lake: A managed data lake in cloud storage.
- Database: A directory within the lake to organize tables.
- Stream Capture: A process for ingesting data from an external source into a Onehouse table.
- Table: An Apache Hudi table, stored in your cloud account and managed by Onehouse.
- Catalog: A repository of metadata about your data assets. Connect catalogs to Onehouse to discover and organize your tables.
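To make the relationship between these concepts concrete, the following is a minimal sketch in Python of how a lake, its databases, and their tables nest, and how a catalog can be viewed as metadata about those tables. It is not the Onehouse API: the class names, the `table_path` and `catalog_entries` helpers, the example bucket URI, and the `lake/database/table` path layout are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """An Apache Hudi table stored in your cloud account."""
    name: str


@dataclass
class Database:
    """A directory within the lake that organizes tables."""
    name: str
    tables: dict[str, Table] = field(default_factory=dict)


@dataclass
class Lake:
    """A managed data lake rooted at a cloud storage location."""
    name: str
    root: str  # hypothetical storage URI, not a real Onehouse setting
    databases: dict[str, Database] = field(default_factory=dict)

    def table_path(self, database: str, table: str) -> str:
        """Return an illustrative storage path (the layout is assumed)."""
        if table not in self.databases[database].tables:
            raise KeyError(f"unknown table: {database}.{table}")
        return f"{self.root.rstrip('/')}/{database}/{table}"


def catalog_entries(lake: Lake) -> dict[str, str]:
    """Model a catalog as a mapping from qualified table names to paths."""
    return {
        f"{db.name}.{tbl.name}": lake.table_path(db.name, tbl.name)
        for db in lake.databases.values()
        for tbl in db.tables.values()
    }


# Usage: one lake, one database, one table.
lake = Lake(name="demo", root="s3://example-bucket/lake")
lake.databases["sales"] = Database("sales", {"orders": Table("orders")})
print(catalog_entries(lake))
```

In this sketch a stream capture would be whatever process writes records into the table at `table_path`; it is left out because the ingestion mechanics are not described in this overview.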