Review Architecture

Overview

Onehouse has a split architecture: the control plane runs in Onehouse’s AWS account, and the data-processing systems run within your GCP account and VPC.

The product architecture is designed to keep you in control of your data: the data stays in the cloud storage buckets owned by your GCP account. When you use Onehouse for data management, the compute resources for those activities are also provisioned in your account, with autoscaling and auto-termination. As a result, you will incur costs for the cloud infrastructure that Onehouse spins up in your GCP account and projects.

GKE cluster

Onehouse operates a GKE cluster inside a separate GCP project in the customer’s GCP account.

  • The cluster reads data from the customer’s sources and writes it to the customer’s GCS buckets.
  • The cluster nodes run in private subnets provided by the customer.
  • The nodes need only outbound internet access.

Service accounts

Onehouse uses two service accounts with the permissions necessary to manage the cluster.

  1. Core Service Account
    • Used by Onehouse’s management-layer service to programmatically access the setup in order to provision resources.
    • Used by Onehouse on-call engineers for manual access when needed for issues, upgrades, patches, monitoring, or scaling.
    • The permissions of this service account are scoped specifically to managing resources in the shared GCP project.
  2. Node Service Account
    • Used by the nodes of the Kubernetes cluster.
    • Customers can selectively authorize this service account, limiting its access to the specific databases, Kafka clusters, or GCS buckets it pulls data from.
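
As an illustration of selective authorization for the Node Service Account, the sketch below adds a read-only role binding to a bucket IAM policy represented as a plain dictionary. It is a minimal sketch under stated assumptions, not Onehouse's setup procedure: the service account email is a hypothetical placeholder, `grant_role` is a helper written for this example, and `roles/storage.objectViewer` is simply one standard read-only GCS role that might fit a source bucket.

```python
from copy import deepcopy

# Hypothetical placeholder; use the Node Service Account email from your own setup.
NODE_SERVICE_ACCOUNT = "serviceAccount:onehouse-node@example-project.iam.gserviceaccount.com"

# Standard GCS role that allows reading objects but not writing or deleting them.
READ_ONLY_ROLE = "roles/storage.objectViewer"


def grant_role(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy with `member` bound to `role`.

    The input policy is not modified. Conditional bindings are left untouched.
    """
    updated = deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    # No unconditional binding for this role exists yet, so create one.
    bindings.append({"role": role, "members": [member]})
    return updated


if __name__ == "__main__":
    source_bucket_policy = {"bindings": []}
    print(grant_role(source_bucket_policy, READ_ONLY_ROLE, NODE_SERVICE_ACCOUNT))
```

Granting the role only on the buckets Onehouse should read from, rather than at the project level, keeps the node identity limited to the sources you choose.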

Network permissions

The Onehouse setup does not require any modifications to network permissions.

  • We don’t open any ports. There are no incoming connections.
  • The pods have the following outbound connections:
    • The agent makes an outbound connection to the hostname gwc.onehouse.ai.
    • All pod logs are forwarded to the hostname telemetry.onehouse.ai and stored in Onehouse’s control plane.

All communications with the Onehouse control plane use HTTPS (gRPC) on port 443.
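
To confirm that a private subnet allows the outbound connections listed above, a check along the following lines could be run from a host in that subnet. This is a minimal sketch, not an Onehouse tool: it only attempts a TLS handshake with each hostname on port 443, it does not exercise gRPC itself, and the function names `tls_reachable` and `check_endpoints` are made up for this example.

```python
import socket
import ssl
from typing import Callable, Dict, Iterable

# Hostnames and port taken from the network requirements above.
ONEHOUSE_ENDPOINTS = ("gwc.onehouse.ai", "telemetry.onehouse.ai")
PORT = 443


def tls_reachable(host: str, port: int = PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection and TLS handshake with host:port succeed."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers DNS failures, timeouts, refused connections, and TLS errors.
        return False


def check_endpoints(
    hosts: Iterable[str] = ONEHOUSE_ENDPOINTS,
    probe: Callable[[str], bool] = tls_reachable,
) -> Dict[str, bool]:
    """Probe each host and return a mapping of hostname to reachability."""
    return {host: probe(host) for host in hosts}


if __name__ == "__main__":
    for host, ok in check_endpoints().items():
        print(f"{host}:{PORT} {'reachable' if ok else 'UNREACHABLE'}")
```

A failed probe can also mean that an egress proxy intercepts TLS and presents a certificate the default trust store rejects, so a failure here is a prompt to inspect the egress path, not proof that the firewall blocks the traffic.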

Compute and Storage Resources

Onehouse provisions e2-standard-4 Compute Engine VM instances within your GCP account to process data. These instances are deployed and managed via the GKE cluster in your GCP account.

For each Compute Engine VM instance, Onehouse provisions 256 GB of storage via Hyperdisk Balanced Block Storage volumes. This low-cost storage lets each instance process more data, which speeds up processing and keeps performance reliable during data spikes.
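
Because these resources are billed to your GCP account, it can help to estimate the footprint of a cluster at a given size. The sketch below multiplies the per-node figures by a node count. The 256 GB disk size comes from the text above; the 4 vCPU and 16 GB memory figures are the published shape of the e2-standard-4 machine type; the node counts are hypothetical inputs, since the actual number of nodes depends on autoscaling.

```python
from typing import Dict

# Per-node resources: e2-standard-4 machine shape plus the 256 GB volume described above.
VCPUS_PER_NODE = 4
MEMORY_GB_PER_NODE = 16
DISK_GB_PER_NODE = 256


def cluster_footprint(node_count: int) -> Dict[str, int]:
    """Return total vCPUs, memory, and disk for a cluster of `node_count` nodes."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return {
        "vcpus": node_count * VCPUS_PER_NODE,
        "memory_gb": node_count * MEMORY_GB_PER_NODE,
        "disk_gb": node_count * DISK_GB_PER_NODE,
    }


if __name__ == "__main__":
    # Hypothetical node counts, e.g. a quiet period versus a data spike.
    for nodes in (2, 10):
        print(nodes, cluster_footprint(nodes))
```

Multiplying these totals by the current GCP prices for your region gives a rough upper bound for a given cluster size; auto-termination means the real cost depends on how long nodes actually run.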