Amazon Web Services

Ensure all of the following are configured in your AWS account before linking it to Onehouse.

Networking

VPC

Create a VPC with a /16 CIDR block (e.g. 10.0.0.0/16).

A /16 gives you 65,536 IP addresses across the VPC. This headroom is important: the EKS cluster creates multiple ENIs per node, and Kubernetes pods consume IP addresses directly from the subnet. Undersized VPCs are the most common cause of IP exhaustion as the cluster scales.
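The address counts above follow directly from the prefix length (2^(32 − prefix)). A tiny helper like this (hypothetical, not part of the Onehouse tooling) makes the sizing trade-off concrete:

```shell
# Hypothetical helper: number of addresses in a CIDR block of the given prefix length.
cidr_capacity() {
  echo $((2 ** (32 - $1)))
}

cidr_capacity 16   # /16 VPC            -> 65536
cidr_capacity 20   # /20 private subnet -> 4096
cidr_capacity 24   # /24 public subnet  -> 256
```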

Subnets

  • Private: at least 2 subnets, /20 each (4,096 IPs). EKS nodes and pods run here. Kubernetes assigns one IP per pod, so a /20 supports hundreds of concurrent pods per subnet with room to scale. Spread across two AZs for high availability.
  • Public: at least 2 subnets, /24 each (256 IPs). Only the NAT Gateway and load balancers are placed here, so no workload IPs are required. A /24 is sufficient.
caution

EKS subnets must span at least two Availability Zones. Once the cluster is provisioned, you cannot add subnets in a new AZ — only within the AZs selected at onboarding time.
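One way to create this layout with the AWS CLI; the VPC ID, CIDR offsets, and AZ names are placeholders you must substitute for your environment:

```shell
# Two private /20 subnets and two public /24 subnets across two AZs
# (IDs, CIDRs, and AZ names are placeholders).
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 \
  --cidr-block 10.0.0.0/20  --availability-zone us-west-1a
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 \
  --cidr-block 10.0.16.0/20 --availability-zone us-west-1b
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 \
  --cidr-block 10.0.32.0/24 --availability-zone us-west-1a
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 \
  --cidr-block 10.0.33.0/24 --availability-zone us-west-1b
```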

Internet Gateway

Attach an Internet Gateway to your VPC. The NAT Gateway in your public subnets sends outbound traffic to the internet through this gateway.

NAT Gateway

Deploy a NAT Gateway in a public subnet. For most deployments a single NAT Gateway is sufficient. For higher fault tolerance, deploy one per AZ.

All outbound traffic from EKS nodes routes through the NAT Gateway to reach the Onehouse control plane over port 443. No inbound ports need to be opened.

See AWS docs.
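A sketch of the NAT Gateway setup with the AWS CLI, assuming the subnets and route tables above already exist (all IDs are placeholders):

```shell
# Allocate an Elastic IP and create the NAT Gateway in a public subnet
# (subnet and allocation IDs are placeholders).
aws ec2 allocate-address --domain vpc
aws ec2 create-nat-gateway --subnet-id subnet-0aaa0000000000000 \
  --allocation-id eipalloc-0bbb0000000000000

# Point the private route table's default route at the NAT Gateway so
# EKS nodes can reach the Onehouse control plane over 443.
aws ec2 create-route --route-table-id rtb-0ccc0000000000000 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0ddd0000000000000
```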

S3 VPC Gateway Endpoint

Create an S3 VPC Gateway Endpoint so that all EKS-to-S3 traffic stays inside the AWS network and does not route through the NAT Gateway.

This is required for two reasons:

  • Cost: NAT Gateway charges per GB. Large data volumes without an S3 endpoint will generate significant NAT costs.
  • Performance: S3 traffic over the private endpoint bypasses internet routing entirely.

See AWS docs.
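Gateway endpoints attach to route tables rather than subnets. A minimal creation sketch (IDs and region are placeholders); associate the private route table(s) so EKS-to-S3 traffic bypasses the NAT Gateway:

```shell
# Create the S3 Gateway Endpoint and attach it to the private route table
# (VPC ID, region, and route table ID are placeholders).
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-west-1.s3 \
  --route-table-ids rtb-0ccc0000000000000
```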

S3 Gateway Endpoint Policy

Add this policy to the endpoint to allow access to container image registries (Docker, Quay, k8s registry) and your Onehouse/lakehouse buckets:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowContainerImageRegistries",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "*",
      "Resource": [
        "arn:aws:s3:::docker-images-prod/*",
        "arn:aws:s3:::prod-<AWS_REGION>-starport-layer-bucket/*",
        "arn:aws:s3:::quayio-production-s3/*",
        "arn:aws:s3:::prod-registry-k8s-io*"
      ]
    },
    {
      "Sid": "AllowOnehouseAndLakehouseBuckets",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::onehouse-customer-bucket-XXXX",
        "arn:aws:s3:::onehouse-customer-bucket-XXXX/*",
        "arn:aws:s3:::<lake-bucket>",
        "arn:aws:s3:::<lake-bucket>/*",
        "arn:aws:s3:::s3-datasource-metadata-<ONEHOUSE_REQUEST_ID>",
        "arn:aws:s3:::s3-datasource-metadata-<ONEHOUSE_REQUEST_ID>/*"
      ]
    }
  ]
}
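To apply the policy to an existing endpoint from the CLI (the endpoint ID is a placeholder; save the JSON above as endpoint-policy.json first):

```shell
# Replace the endpoint's policy document with the one above
# (endpoint ID is a placeholder).
aws ec2 modify-vpc-endpoint \
  --vpc-endpoint-id vpce-0eee0000000000000 \
  --policy-document file://endpoint-policy.json
```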

EKS Cluster Endpoint Access

The EKS cluster API endpoint is private. The Onehouse control plane connects to it from its NAT IPs — you must ensure these are not blocked by your security groups or NACLs:

  • 54.153.81.1/32
  • 184.169.135.156/32

You cannot run kubectl against the cluster from outside your VPC. Cluster access for support and diagnostics is provided through the bastion host, which is deployed into your private subnet and reachable via AWS SSM Session Manager — which is why the bastion host is mandatory.
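Connecting to the bastion looks like this (the instance ID is a placeholder; the AWS CLI Session Manager plugin must be installed locally):

```shell
# Open a shell on the bastion host via Session Manager
# (instance ID is a placeholder).
aws ssm start-session --target i-0fff000000000000
```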

Egress Mode

Choose one of the following egress configurations:

Public/Private (default)

Outbound traffic from the EKS cluster routes through the NAT Gateway over the public internet to reach the Onehouse control plane. Egress is restricted to approved Onehouse IP ranges. This is the standard configuration for most deployments.

Private Only (PrivateLink)

For environments with strict compliance requirements (no public internet traversal), Onehouse supports AWS PrivateLink. In this mode, all control plane traffic stays within the AWS network — no internet routing is involved.

To enable PrivateLink, set privateLink: true in the customer stack configuration. You must also specify PrivateLink subnets. Onehouse will create:

  • Two Interface VPC Endpoints (for control plane ingress and external ingress)
  • A Route53 Private Hosted Zone for onehouse.ai DNS resolution within your VPC
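A sketch of the relevant customer stack settings; privateLink comes from the text above, but the subnet key name and values here are illustrative, so confirm the exact schema with Onehouse:

```yaml
# Customer stack configuration (illustrative; confirm key names with Onehouse).
privateLink: true
privateLinkSubnets:          # illustrative key name
  - subnet-0aaa0000000000000
  - subnet-0bbb0000000000000
```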

VPC Peering

If your data sources (Kafka clusters, databases) live in a separate VPC, configure VPC peering between that VPC and your Onehouse VPC so the EKS cluster can reach them. See AWS docs.
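A minimal peering sketch with the AWS CLI (all IDs and the peer CIDR are placeholders); remember that routes must be added in both VPCs' route tables for traffic to flow each way:

```shell
# Request and accept a peering connection between the Onehouse VPC and a
# data-source VPC (IDs are placeholders).
aws ec2 create-vpc-peering-connection \
  --vpc-id vpc-0123456789abcdef0 --peer-vpc-id vpc-0456000000000000
aws ec2 accept-vpc-peering-connection \
  --vpc-peering-connection-id pcx-0789000000000000

# Route the peer VPC's CIDR through the peering connection.
aws ec2 create-route --route-table-id rtb-0ccc0000000000000 \
  --destination-cidr-block 10.1.0.0/16 \
  --vpc-peering-connection-id pcx-0789000000000000
```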

Domain Allowlist

Required only if your environment has an egress firewall:

Domains to allowlist
  • .amazonaws.com
  • .docker.io
  • .onehouse.ai
  • .ecr.aws
  • .gcr.io
  • production.cloudflare.docker.com
  • d5l0dvt14r5h8.cloudfront.net
  • .k8s.io
  • .pkg.dev
  • registry.terraform.io
  • releases.hashicorp.com
  • .confluent.cloud
  • .github.com
  • .githubusercontent.com
  • .pagerduty.com
  • docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com
  • get.helm.sh
  • auth.docker.io.cdn.cloudflare.net
  • docker-registry-production.d24a988e385e0074d717b6bdaea58f0d.r2.cloudflarestorage.com
  • .github.io
  • .strimzi.io
  • .jupyter.org
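A quick way to spot-check egress from inside the VPC is to probe a few of the allowlisted domains over HTTPS; the hosts below stand in for the wildcard entries and are illustrative, not exhaustive:

```shell
# Probe representative allowlisted hosts over HTTPS. curl exits non-zero
# only if no HTTP response arrives (e.g. connection blocked or timed out).
for host in sts.amazonaws.com registry-1.docker.io get.helm.sh; do
  if curl -s -o /dev/null --connect-timeout 5 "https://${host}"; then
    echo "${host}: reachable"
  else
    echo "${host}: blocked or unreachable, check egress rules"
  fi
done
```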
IAM Service Roles

Ensure these AWS-managed service-linked roles exist in your account before deploying. These are standard AWS roles required to create and manage EKS clusters; they are not Onehouse-specific.

  • AWSServiceRoleForAmazonEKS: allows EKS to manage cluster-level AWS resources
  • AWSServiceRoleForAmazonEKSNodegroup: allows EKS to manage node groups and EC2 instances
  • AWSServiceRoleForAutoScaling: allows EC2 Auto Scaling to manage instances within the cluster

How to create these roles if they don't exist
  • EKS role: IAM → Create role → EKS service → EKS - Service
  • EKS Node Group role: IAM → Create role → EKS service → EKS - Nodegroup
  • EC2 Auto Scaling role: IAM → Create role → EC2 Auto Scaling service → EC2 Auto Scaling
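The same roles can be created from the CLI; each command returns an error you can ignore if the role already exists:

```shell
# Create the three service-linked roles if they are missing.
aws iam create-service-linked-role --aws-service-name eks.amazonaws.com
aws iam create-service-linked-role --aws-service-name eks-nodegroup.amazonaws.com
aws iam create-service-linked-role --aws-service-name autoscaling.amazonaws.com
```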

Storage

Terraform State Bucket

Create an S3 bucket named onehouse-customer-bucket-<RequestIdPrefix> in the same region as your deployment. This bucket stores Terraform state and Onehouse configuration artifacts.

Lakehouse Bucket

Create an S3 bucket for your data lakehouse in the same region as your deployment. Cross-region buckets will incur data transfer costs.
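A creation sketch for both buckets (names and region are placeholders; for regions other than us-east-1, a LocationConstraint is required):

```shell
# Terraform state bucket and lakehouse bucket, both in the deployment region
# (bucket names and region are placeholders).
aws s3api create-bucket --bucket onehouse-customer-bucket-abcd1234 \
  --region us-west-1 \
  --create-bucket-configuration LocationConstraint=us-west-1
aws s3api create-bucket --bucket my-lakehouse-bucket \
  --region us-west-1 \
  --create-bucket-configuration LocationConstraint=us-west-1
```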

Encryption

By default, S3 server-side encryption (SSE-S3) is applied automatically and requires no additional setup.

If your organisation requires KMS customer managed keys:

KMS setup (complete after onboarding)
1. Create a KMS key in AWS KMS.
2. Add onehouse-customer-eks-node-role-XXXX and onehouse-customer-core-role-XXXX as key users. These roles are created during the customer stack deployment.
3. Update the customer stack: set s3KmsKeys to the KMS key ARN(s) and re-apply Terraform.
4. In your S3 bucket → Default encryption → select SSE-KMS → choose your key → enable bucket key.

tip

Enabling the S3 bucket key reduces KMS API request costs by up to 99%.
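Step 4 can also be done from the CLI (bucket name and key ARN are placeholders):

```shell
# Default the bucket to SSE-KMS with the bucket key enabled
# (bucket name and KMS key ARN are placeholders).
aws s3api put-bucket-encryption --bucket my-lakehouse-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:us-west-1:111122223333:key/EXAMPLE"
      },
      "BucketKeyEnabled": true
    }]
  }'
```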

Service Control Policies (SCPs)

If your AWS organisation enforces Service Control Policies, ensure the following actions are not denied for the Onehouse IAM roles (onehouse-customer-*):

  • ec2:* (describe, manage instances, security groups, VPCs — scoped to tagged resources)
  • eks:* (create, describe, manage clusters and node groups)
  • iam:CreateRole, iam:AttachRolePolicy, iam:PassRole
  • s3:GetObject, s3:PutObject, s3:ListBucket (on Onehouse and lakehouse buckets)
  • cloudwatch:PutMetricData, logs:CreateLogGroup, logs:PutLogEvents
  • sts:AssumeRole (for cross-account access from Onehouse control plane)

If your organisation uses a permissions boundary, set the permissions_boundary variable in the customer stack to the ARN of your boundary policy. All Onehouse IAM roles will be created with this boundary applied.
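The variable looks like this in the customer stack; the ARN is a placeholder for your own boundary policy:

```hcl
# Customer stack variable (policy ARN is a placeholder).
permissions_boundary = "arn:aws:iam::111122223333:policy/my-permissions-boundary"
```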

Custom Tags

If your organisation requires all AWS resources to carry specific tags (for cost allocation, compliance, or access control), set the customTags variable in the customer stack:

customTags = {
  "CostCenter"  = "data-platform"
  "Environment" = "production"
  "Owner"       = "data-eng"
}

These tags are applied to all resources created by the Onehouse Terraform module, including IAM roles, EC2 instances, and EKS resources.

Validate Your AWS Setup

Before deploying the customer stack, run the validation scripts below to catch any networking or permissions issues early.

Download: aws_permissions_check.sh and uris.txt — place both files in the same directory.

chmod +x aws_permissions_check.sh
./aws_permissions_check.sh <VPC-ID> <AWS_REGION> <ONEHOUSE_REQUEST_ID>

The script validates VPC configuration, subnet layout, IAM service roles, S3 endpoint access, and domain reachability. Fix any reported issues before proceeding to deployment.