Onehouse Dataplane Architecture

Control Plane / Data Plane Split

Onehouse uses a split architecture:

  • The control plane runs in Onehouse's infrastructure. It handles orchestration, the management API, and the UI.
  • The data plane runs entirely within your cloud account and VPC. All data processing happens here.

Because all processing happens in the data plane, Onehouse does not need direct access to your data. Your files and tables stay in your own cloud storage.

Agent Communication

A lightweight agent runs inside your data plane, within your Kubernetes cluster. It connects to the Onehouse control plane over a persistent bidirectional gRPC stream on HTTPS port 443.

Key properties:

  • The agent initiates the connection outbound — no inbound ports need to be opened on your end
  • Connection health is verified via pings every 15 seconds
  • All operations (start/stop ingestion jobs, schema updates, cluster management) are sent from the control plane through this stream
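The 15-second ping check can be illustrated with a small health tracker. This is a minimal sketch, not the agent's actual implementation: the `StreamHealth` class and the `MAX_MISSED_PINGS` threshold are hypothetical, and only the 15-second interval comes from the text above.

```python
from dataclasses import dataclass

PING_INTERVAL_S = 15.0   # interval stated in the text above
MAX_MISSED_PINGS = 3     # assumption: the source does not state a threshold


@dataclass
class StreamHealth:
    """Tracks the time of the last successful ping on the agent's stream."""
    last_ping_at: float

    def record_ping(self, now: float) -> None:
        self.last_ping_at = now

    def missed_pings(self, now: float) -> int:
        # Number of whole ping intervals that elapsed without a ping.
        return int(max(0.0, now - self.last_ping_at) // PING_INTERVAL_S)

    def is_healthy(self, now: float) -> bool:
        return self.missed_pings(now) < MAX_MISSED_PINGS


health = StreamHealth(last_ping_at=0.0)
health.record_ping(now=15.0)
print(health.is_healthy(now=40.0))  # 25 s since the last ping: 1 missed interval
print(health.is_healthy(now=60.0))  # 45 s since the last ping: 3 missed intervals
```

Passing `now` explicitly keeps the sketch deterministic. A real agent would more likely read a monotonic clock and reconnect outbound once the stream is judged unhealthy.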

Data Residency

| What | Where it lives |
| --- | --- |
| Your data (files, tables) | Your cloud storage (S3 / GCS / Azure Blob) only |
| Metadata (schema, table names) | Flows to control plane for orchestration |
| Compute | Provisioned in your account, auto-terminated after use |
| Infrastructure costs | Billed to your cloud account |
Note: Onehouse offers an optional table preview feature that transfers a small data sample to the UI. This can be disabled in the Onehouse dashboard.
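The residency rules in this section can be restated as a small lookup. This is only an illustrative sketch: the `Location` enum, the `RESIDENCY` mapping, and the `leaves_account` function are hypothetical names and do not correspond to any Onehouse API.

```python
from enum import Enum


class Location(Enum):
    CUSTOMER_ACCOUNT = "customer cloud account"
    CONTROL_PLANE = "Onehouse control plane"


# Mirrors the residency rules described in this section.
RESIDENCY = {
    "data": Location.CUSTOMER_ACCOUNT,
    "metadata": Location.CONTROL_PLANE,
    "compute": Location.CUSTOMER_ACCOUNT,
}


def leaves_account(kind: str, preview_enabled: bool) -> bool:
    """Return True if this kind of information is sent outside the customer account."""
    if kind == "data":
        # Only the optional table preview sends a small data sample to the UI.
        return preview_enabled
    return RESIDENCY[kind] is Location.CONTROL_PLANE


print(leaves_account("data", preview_enabled=False))
print(leaves_account("metadata", preview_enabled=False))
```

`preview_enabled` is a required argument because the source does not say whether table preview is on by default.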

What Gets Deployed in Your Account

Kubernetes Cluster

An EKS (AWS) / GKE (GCP) / AKS (Azure) cluster runs in your private subnets and hosts:

| Component | Role |
| --- | --- |
| Agent service | Maintains the gRPC connection to the Onehouse control plane |
| Spark Operator | Manages Spark job lifecycle |
| Spark job pods | Execute data ingestion and processing |
| Prometheus | In-cluster observability |
| Debugging tools | Diagnostics and support access |

IAM / Identity

| Cloud | What gets created |
| --- | --- |
| AWS | core_role, eks_node_role, csi_driver_role, karpenter_controller_role, support_role: IAM roles assumable by the Onehouse control plane using your request ID as the external ID |
| GCP | onehouse-core-sa and onehouse-gke-node-sa: service accounts with scoped custom IAM roles |
| Azure | onehouse-core-role and onehouse-node-role: user-assigned managed identities with Workload Identity Federation for passwordless auth from Onehouse's existing AWS/GCP infrastructure |
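On AWS, a cross-account role that requires an external ID is normally expressed as an IAM trust policy with an `sts:ExternalId` condition. The sketch below builds such a policy document in Python to show the general shape. It is not the policy Onehouse deploys: the `build_trust_policy` helper, the principal ARN, and the external ID value are placeholders chosen for illustration.

```python
import json


def build_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Return a trust policy allowing sts:AssumeRole only with the given external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Hypothetical placeholder values, not real Onehouse identifiers.
policy = build_trust_policy(
    principal_arn="arn:aws:iam::111122223333:root",
    external_id="example-request-id",
)
print(json.dumps(policy, indent=2))
```

The external ID condition means that a caller who knows only the role ARN cannot assume the role without also presenting the matching ID.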

Supporting Resources

| Cloud | Additional resources |
| --- | --- |
| AWS | Bastion host (EC2, private cluster access via SSM Session Manager); optional PrivateLink VPC endpoints and Route53 private hosted zone |
| GCP | Optional Private Service Connect forwarding rules and private DNS zone (onehouse.ai) |
| Azure | Resource group (created or existing) |

Cloud Comparison

| | AWS | Google Cloud | Azure |
| --- | --- | --- | --- |
| Kubernetes version | 1.33 | Latest stable | 1.33 |
| Kubernetes | EKS (Onehouse-provisioned) | GKE (Onehouse-provisioned) | AKS (customer-provisioned, BYOK) |
| Compute | EC2 m8g.xlarge (Bottlerocket ARM64) | e2-standard-4 | Standard_D4s_v5 |
| Node storage | 256 GB EBS gp3 (job workers), 75 GB (general) | 256 GB Hyperdisk Balanced (job workers) | 256 GB Managed Disk (job workers), 30 GB (system) |
| Node autoscaler | Karpenter | GKE managed | AKS managed |
| CNI | VPC CNI (/20 subnet) | Alias IPs (/20 pod range, /24 service range) | Azure CNI Overlay (/16 pod CIDR, /16 service CIDR) |
| Control plane | AWS-hosted | GCP-hosted | GCP-hosted |
| Cluster endpoint | Private; Onehouse control plane IPs must be whitelisted | Private; Onehouse control plane IPs must be whitelisted | Private; Onehouse control plane IPs must be whitelisted |
| IAM model | Cross-account assume role with external ID | Service accounts with custom roles | User-assigned managed identities + Workload Identity Federation |
| Private connectivity | AWS PrivateLink | Private Service Connect | |
| Default availability | Single AZ (multi-AZ on request) | Single region | Single region |
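To get a feel for the CNI range sizes in the comparison, the total address count for each prefix length can be computed with Python's standard `ipaddress` module. Only the prefix lengths come from the comparison. The base addresses below are illustrative assumptions, and the counts are totals: the number of usable addresses is lower because cloud providers reserve some addresses in each range.

```python
import ipaddress

# Prefix lengths are from the comparison; base addresses are illustrative assumptions.
ranges = {
    "AWS VPC CNI subnet": "10.0.0.0/20",
    "GKE pod range": "10.4.0.0/20",
    "GKE service range": "10.8.0.0/24",
    "AKS pod CIDR": "10.244.0.0/16",
    "AKS service CIDR": "10.0.0.0/16",
}

# Total addresses in each range: 2 ** (32 - prefix_length) for IPv4.
sizes = {
    name: ipaddress.ip_network(cidr).num_addresses
    for name, cidr in ranges.items()
}

for name, count in sizes.items():
    print(f"{name}: {count} addresses")
```

A /20 holds 4,096 addresses, a /24 holds 256, and a /16 holds 65,536, which gives a rough upper bound on how many pods or services each range can address.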