Architecture

Control Plane / Data Plane Split
Onehouse uses a split architecture:
- The control plane runs in Onehouse's infrastructure. It handles orchestration, the management API, and the UI.
- The data plane runs entirely within your cloud account and VPC. All data processing happens here.
As a result, Onehouse never needs direct access to your data, which stays in your own cloud storage.
Agent Communication
A lightweight agent runs inside your data plane, within your Kubernetes cluster. It connects to the Onehouse control plane over a persistent bidirectional gRPC stream on HTTPS port 443.
Key properties:
- The agent initiates the connection outbound, so no inbound ports need to be opened on your end
- Connection health is verified via pings every 15 seconds
- All operations (start/stop ingestion jobs, schema updates, cluster management) are sent from the control plane through this stream
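One way to picture the 15-second health check is as a missed-ping counter on the long-lived stream. The sketch below illustrates that idea and is not the agent's actual code. The 15-second interval comes from the text. The three-missed-pings threshold and the `StreamHealth` class are hypothetical. Timestamps are passed in explicitly, so the logic runs without a real gRPC connection.

```python
from dataclasses import dataclass

PING_INTERVAL_S = 15.0      # from the text: health pings every 15 seconds
MISSED_PINGS_ALLOWED = 3    # hypothetical threshold, not documented by Onehouse


@dataclass
class StreamHealth:
    """Tracks liveness of one long-lived agent -> control plane stream."""
    last_pong_at: float = 0.0

    def record_pong(self, now: float) -> None:
        # Called whenever a ping response arrives over the stream.
        self.last_pong_at = now

    def is_healthy(self, now: float) -> bool:
        # Healthy while fewer than MISSED_PINGS_ALLOWED ping intervals
        # have passed without a response.
        return now - self.last_pong_at < PING_INTERVAL_S * MISSED_PINGS_ALLOWED

    def should_reconnect(self, now: float) -> bool:
        # The agent always dials out, so recovery means opening a new
        # outbound connection on port 443, never accepting an inbound one.
        return not self.is_healthy(now)


health = StreamHealth(last_pong_at=0.0)
print(health.is_healthy(30.0))  # True: two intervals elapsed
print(health.is_healthy(45.0))  # False: three intervals with no reply
```

Because the agent is always the dialing side, a broken stream is repaired from inside your network. This is what allows the deployment to keep inbound ports closed.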
Data Residency
| What | Where it lives |
|---|---|
| Your data (files, tables) | Your cloud storage (S3 / GCS / Azure Blob) only |
| Metadata (schema, table names) | Flows to the control plane for orchestration |
| Compute | Provisioned in your account, auto-terminated after use |
| Infrastructure costs | Billed to your cloud account |
Note: Onehouse offers an optional table preview feature that transfers a small data sample to the UI. You can disable it in the Onehouse dashboard.
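The residency rules above reduce to a small decision about what may leave the data plane. The sketch encodes that decision as a policy check. The `Payload` categories and the `may_leave_data_plane` function are hypothetical names introduced for illustration, and the sketch does not describe how Onehouse enforces the rules internally.

```python
from enum import Enum


class Payload(Enum):
    TABLE_DATA = "table_data"          # files and table contents
    METADATA = "metadata"              # schema, table names
    PREVIEW_SAMPLE = "preview_sample"  # small sample for the optional table preview


def may_leave_data_plane(payload: Payload, preview_enabled: bool) -> bool:
    """Hypothetical check mirroring the data residency table.

    Table data never leaves your account. Metadata may flow to the control
    plane for orchestration. A preview sample may leave only while the
    table preview feature is enabled.
    """
    if payload is Payload.METADATA:
        return True
    if payload is Payload.PREVIEW_SAMPLE:
        return preview_enabled
    return False


print(may_leave_data_plane(Payload.PREVIEW_SAMPLE, preview_enabled=False))  # False
```

Disabling table preview in the dashboard corresponds to `preview_enabled=False`. In that state, only metadata crosses the boundary.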
What Gets Deployed in Your Account
Kubernetes Cluster
An EKS (AWS), GKE (GCP), or AKS (Azure) cluster runs in your private subnets and hosts the following components:
| Component | Role |
|---|---|
| Agent service | Maintains the gRPC connection to the Onehouse control plane |
| Spark Operator | Manages Spark job lifecycle |
| Spark job pods | Execute data ingestion and processing |
| Prometheus | In-cluster observability |
| Debugging tools | Diagnostics and support access |
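A Spark Operator typically manages jobs through `SparkApplication` custom resources. When such a resource is created, the operator launches a driver pod, and the driver requests executor pods. The sketch below assumes the Kubeflow Spark Operator's `sparkoperator.k8s.io/v1beta2` API and builds a minimal resource of that kind. The job name, namespace, image, class, JAR path, and sizing are hypothetical placeholders. Onehouse does not document the manifests its jobs use, and whether the agent creates them directly is also an assumption.

```python
import json


def spark_application(name: str, namespace: str, image: str, main_class: str,
                      jar: str, executors: int) -> dict:
    """Build a minimal SparkApplication resource (Kubeflow Spark Operator, v1beta2).

    All values passed in are hypothetical; only the field layout follows
    the operator's v1beta2 API.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",                 # driver runs as a pod in the cluster
            "image": image,
            "mainClass": main_class,
            "mainApplicationFile": jar,
            "sparkVersion": "3.5.0",           # placeholder version
            "restartPolicy": {"type": "Never"},
            "driver": {"cores": 1, "memory": "2g", "serviceAccount": "spark"},
            "executor": {"cores": 2, "instances": executors, "memory": "4g"},
        },
    }


app = spark_application("ingest-orders", "onehouse", "example/spark:3.5.0",
                        "com.example.Ingest", "local:///opt/spark/jars/ingest.jar", 4)
print(json.dumps(app, indent=2))
```

Serialized to YAML or JSON and applied to the cluster, a resource like this is what the operator watches. It then handles submission, retries, and cleanup of the driver and executor pods.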
IAM / Identity
| Cloud | What gets created |
|---|---|
| AWS | core_role, eks_node_role, csi_driver_role, karpenter_controller_role, support_role: IAM roles assumable by the Onehouse control plane using your request ID as the external ID |
| GCP | onehouse-core-sa and onehouse-gke-node-sa: service accounts with scoped custom IAM roles |
| Azure | onehouse-core-role and onehouse-node-role: user-assigned managed identities that use Workload Identity Federation for passwordless authentication from Onehouse's existing AWS/GCP infrastructure |
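On AWS, a role that is "assumable using your request ID as the external ID" corresponds to a role trust policy with an `sts:ExternalId` condition. The sketch builds such a policy document to show the mechanism. The account ID 111122223333 is the placeholder AWS uses in its own documentation, and `example-request-id` stands in for your real request ID. This is not the exact policy Onehouse generates.

```python
import json

# Hypothetical values: the real principal ARN and request ID come from
# Onehouse onboarding, not from this sketch.
ONEHOUSE_PRINCIPAL_ARN = "arn:aws:iam::111122223333:root"
REQUEST_ID = "example-request-id"


def assume_role_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Trust policy allowing a principal in another account to assume the role,
    but only when the caller presents the expected external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


print(json.dumps(assume_role_trust_policy(ONEHOUSE_PRINCIPAL_ARN, REQUEST_ID), indent=2))
```

The caller must pass the same value in the `ExternalId` parameter of the STS `AssumeRole` call, and AWS denies the request otherwise. This check is the standard AWS guard against the confused-deputy problem in cross-account access.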
Supporting Resources
| Cloud | Additional resources |
|---|---|
| AWS | Bastion host (EC2, private cluster access via SSM Session Manager); optional PrivateLink VPC endpoints and a Route 53 private hosted zone |
| GCP | Optional Private Service Connect forwarding rules and a private DNS zone (onehouse.ai) |
| Azure | Resource group (created or existing) |
Cloud Comparison
| | AWS | Google Cloud | Azure |
|---|---|---|---|
| Kubernetes version | 1.33 | Latest stable | 1.33 |
| Kubernetes | EKS (Onehouse-provisioned) | GKE (Onehouse-provisioned) | AKS (customer-provisioned, BYOK) |
| Compute | EC2 m8g.xlarge (Bottlerocket ARM64) | e2-standard-4 | Standard_D4s_v5 |
| Node storage | 256 GB EBS gp3 (job workers), 75 GB (general) | 256 GB Hyperdisk Balanced (job workers) | 256 GB Managed Disk (job workers), 30 GB (system) |
| Node autoscaler | Karpenter | GKE managed | AKS managed |
| CNI | VPC CNI (/20 subnet) | Alias IPs (/20 pod range, /24 service range) | Azure CNI Overlay (/16 pod CIDR, /16 service CIDR) |
| Onehouse control plane | AWS-hosted | GCP-hosted | GCP-hosted |
| Cluster endpoint | Private — Onehouse control plane IPs must be whitelisted | Private — Onehouse control plane IPs must be whitelisted | Private — Onehouse control plane IPs must be whitelisted |
| IAM model | Cross-account assume role with external ID | Service accounts with custom roles | User-assigned managed identities + Workload Identity Federation |
| Private connectivity | AWS PrivateLink | Private Service Connect | — |
| Default availability | Single AZ (multi-AZ on request) | Single region | Single region |
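The CNI prefix lengths in the comparison table bound how far each cluster can grow. The sketch turns them into address and node counts with Python's `ipaddress` module. The base addresses are hypothetical, and only the prefix lengths come from the table. It assumes three documented provider behaviors: AWS reserves five addresses in every subnet, GKE gives each node a /24 pod block at its default of 110 pods per node, and Azure CNI Overlay carves a /24 per node from the pod CIDR.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet
PER_NODE_POD_PREFIX = 24     # per-node pod block (GKE default, Azure CNI Overlay)


def addresses(cidr: str) -> int:
    # Total addresses in the range.
    return ipaddress.ip_network(cidr).num_addresses


def aws_usable_ips(subnet_cidr: str) -> int:
    # With VPC CNI, nodes and pods both take their IPs from this subnet.
    return addresses(subnet_cidr) - AWS_RESERVED_PER_SUBNET


def max_nodes(pod_cidr: str, per_node_prefix: int = PER_NODE_POD_PREFIX) -> int:
    # Number of per-node pod blocks that fit inside the pod range.
    net = ipaddress.ip_network(pod_cidr)
    return 2 ** (per_node_prefix - net.prefixlen)


print(aws_usable_ips("10.0.0.0/20"))  # 4091 IPs shared by nodes and pods (AWS)
print(max_nodes("10.4.0.0/20"))       # 16 nodes from a /20 pod range (GCP)
print(addresses("10.8.0.0/24"))       # 256 service IPs (GCP)
print(max_nodes("10.244.0.0/16"))     # 256 nodes from a /16 pod CIDR (Azure)
```

On GCP, the /20 pod range is the tighter limit: at the default pods-per-node setting it caps the cluster at 16 nodes. On AWS, the ceiling is the subnet's pool of IPs shared by nodes and pods.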