Onboarding Prerequisites

Prerequisites

You'll need a few things before you get started. Make sure each of these is set up correctly before you link your cloud provider in Onehouse.

Virtual Private Cloud Setup (VPC)

To set up your AWS environment, ensure the following VPC requirements are met:

1. Virtual Private Cloud (VPC)

  • CIDR Range:
    • Allocate a VPC with a CIDR block of /16 or /20 to isolate your resources.

2. Subnets

  • Private Subnets:
    • Create at least two private subnets (recommended: one per Availability Zone).
    • Each should have a subnet mask between /17 and /20.
      • Large private subnet CIDR blocks allow for scalability of the Onehouse platform.
  • Public Subnets:
    • Create at least two public subnets (recommended: one per Availability Zone).
    • Public subnets can use a smaller range than the private subnets.
    • Only the NAT Gateway will be deployed in the public subnets.
  • Subnet Configuration:
    • AWS documentation for subnet configuration can be found here.
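
The CIDR guidance above can be checked before you create any resources. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `check_subnet_plan` and the example ranges (a `10.0.0.0/16` VPC with four subnets) are illustrative assumptions, not values that Onehouse prescribes.

```python
import ipaddress


def check_subnet_plan(vpc_cidr, private_cidrs, public_cidrs):
    """Return a list of problems found in a proposed subnet plan (empty if none)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    private = [ipaddress.ip_network(c) for c in private_cidrs]
    public = [ipaddress.ip_network(c) for c in public_cidrs]
    problems = []

    if len(private) < 2:
        problems.append("need at least two private subnets")
    if len(public) < 2:
        problems.append("need at least two public subnets")

    for net in private:
        # Private subnets should have a mask between /17 and /20.
        if not 17 <= net.prefixlen <= 20:
            problems.append(f"private subnet {net} is outside /17-/20")

    all_subnets = private + public
    for net in all_subnets:
        # Every subnet must sit inside the VPC range.
        if not net.subnet_of(vpc):
            problems.append(f"{net} is not inside VPC {vpc}")

    # No two subnets in the same VPC may overlap.
    for i, a in enumerate(all_subnets):
        for b in all_subnets[i + 1:]:
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b}")
    return problems


# Hypothetical plan: a /16 VPC, two /18 private subnets, two /24 public subnets.
example_problems = check_subnet_plan(
    "10.0.0.0/16",
    private_cidrs=["10.0.0.0/18", "10.0.64.0/18"],
    public_cidrs=["10.0.128.0/24", "10.0.129.0/24"],
)
print(example_problems)  # an empty list means the plan passed every check
```

An empty result only means the plan is consistent with the ranges listed above. It does not confirm that the ranges are free in your AWS account.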

3. Availability Zones (AZs)

  • Deploy your subnets across a minimum of two Availability Zones to ensure high availability.
  • The EKS cluster will be placed in the private subnets.

4. S3 VPC Endpoint

  • If you don’t already have one, set up an S3 VPC endpoint so that EKS and S3 communicate within the VPC. This keeps S3 traffic from being routed through the NAT Gateway.

5. Egress Connection

Onehouse can be deployed in either a public/private egress setup or a truly private setup using PrivateLink. Choose the approach that best aligns with your organization's security and compliance requirements.

  • Public/Private deployment:

    By default, Onehouse deploys in public/private mode. In this setup, traffic requiring egress access flows over the internet via a NAT Gateway or Transit Gateway (for Hub/Spoke account networking models). Egress traffic from the Onehouse EKS cluster is restricted to approved Onehouse IPs.

    Instructions for NAT Gateway setup can be found here. The NAT gateway should be deployed in a public subnet.

    Note: For basic deployments, a single NAT Gateway in one Availability Zone is sufficient. For greater redundancy and fault tolerance, deploy additional NAT Gateways in each AZ.

  • Private Only deployment:

    Onehouse also supports a Private Only deployment using AWS PrivateLink, where no traffic traverses the public internet between the Onehouse control plane and your EKS cluster. To enable Private Only mode, set the PrivateLink config to privateLink:true in your Onehouse Customer Stack.

IAM Service Roles for Amazon EKS

Make sure that AWSServiceRoleForAmazonEKSNodegroup, AWSServiceRoleForAmazonEKS, and AWSServiceRoleForAutoScaling all exist in your account.

Step-by-step instructions to create IAM service roles
  • For the EKS role: Go to IAM roles, select the "Create role" option, choose "EKS" as the service, then "EKS - Service" and create the role.
  • For the EKS Node Group role: Go to IAM roles, select the "Create role" option, choose "EKS" as the service, then "EKS - Nodegroup" and create the role.
  • For the EC2 Auto Scaling role: Go to IAM roles, select the "Create role" option, choose "EC2 Auto Scaling" as the service, then "EC2 Auto Scaling" and create the role.
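
If you prefer to verify the roles from a script rather than the console, the check can be sketched as follows. This is a minimal sketch: the helper names are hypothetical, and `list_service_linked_role_names` assumes that `boto3` is installed and that AWS credentials with permission to list IAM roles are configured. The comparison itself runs offline.

```python
REQUIRED_SERVICE_ROLES = (
    "AWSServiceRoleForAmazonEKS",
    "AWSServiceRoleForAmazonEKSNodegroup",
    "AWSServiceRoleForAutoScaling",
)


def missing_service_roles(existing_role_names):
    """Return the required role names that are absent from the given names."""
    existing = set(existing_role_names)
    return [name for name in REQUIRED_SERVICE_ROLES if name not in existing]


def list_service_linked_role_names():
    """Fetch service-linked role names from IAM (assumes boto3 and credentials)."""
    import boto3  # imported lazily so the offline check works without boto3

    iam = boto3.client("iam")
    names = []
    # Service-linked roles live under the /aws-service-role/ path.
    paginator = iam.get_paginator("list_roles")
    for page in paginator.paginate(PathPrefix="/aws-service-role/"):
        names.extend(role["RoleName"] for role in page["Roles"])
    return names


# Offline example: a hypothetical account that lacks the Auto Scaling role.
print(missing_service_roles([
    "AWSServiceRoleForAmazonEKS",
    "AWSServiceRoleForAmazonEKSNodegroup",
]))
# → ['AWSServiceRoleForAutoScaling']
```

Against a real account, `missing_service_roles(list_service_linked_role_names())` returns the roles that still need to be created.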

Gateway Endpoints for Amazon S3 - Access to Image Registries

Add the following buckets, which back registries such as Docker, the K8s registry, and Quay.io, to your S3 gateway endpoint policy. Onehouse uses these registries to manage and pull images for the microservices deployed in your account, so it needs access to their S3 CDN buckets.

S3 Gateway Endpoint Policy for Container Image Registries
{
  "Version": "2012-10-17",
  "Id": "...",
  "Statement": [
    {
      "Sid": "stmtAllowDockerImages",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "*",
      "Resource": [
        "arn:aws:s3:::docker-images-prod/*",
        "arn:aws:s3:::prod-<AWS_REGION>-starport-layer-bucket/*",
        "arn:aws:s3:::quayio-production-s3/*",
        "arn:aws:s3:::prod-registry-k8s-io*"
      ]
    }
  ]
}
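
The `<AWS_REGION>` placeholder in the policy must be replaced with the region of your deployment. One way to render the policy for a given region is sketched below. The function name `registry_endpoint_policy` and the example region `us-west-2` are assumptions for illustration. The sketch uses `2012-10-17`, the current IAM policy language version, and omits the optional `Id` field because the sample elides its value.

```python
import json


def registry_endpoint_policy(aws_region):
    """Build the image-registry endpoint policy for one AWS region."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "stmtAllowDockerImages",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "*",
                "Resource": [
                    "arn:aws:s3:::docker-images-prod/*",
                    # The only region-specific bucket in the list.
                    f"arn:aws:s3:::prod-{aws_region}-starport-layer-bucket/*",
                    "arn:aws:s3:::quayio-production-s3/*",
                    "arn:aws:s3:::prod-registry-k8s-io*",
                ],
            }
        ],
    }


# Example region only; use the region of your own deployment.
policy_json = json.dumps(registry_endpoint_policy("us-west-2"), indent=2)
print(policy_json)
```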

Storage

S3 Bucket Access

Onehouse uses Amazon S3 as the storage layer for your data lakehouse on AWS. To configure Onehouse's access to your S3 storage, complete the following steps.

1. Create your Lakehouse Bucket

Create an S3 bucket (or reuse an existing bucket) in the same region as your Onehouse deployment for Onehouse to use as your data lake. Using a bucket in a different region will incur cross-region data transfer costs.

2. Configure Bucket Encryption/Security

Default Encryption: AWS S3 SSE

By default, S3 buckets use AWS's SSE encryption to protect data at rest. The Onehouse dataplane automatically has the permissions to decrypt and process data that is encrypted using AWS S3 SSE.

Optional: KMS Encryption with Customer Managed Keys

You may want to leverage the AWS KMS service using customer managed keys for custom encryption requirements in your organization. Onehouse will require permissions to the keys used to encrypt your S3 buckets to operate the platform. Listed below are detailed steps for creating these keys and granting Onehouse the needed permissions:

Onehouse KMS SSE permissions
The below steps should only be done after successful onboarding of the Onehouse platform.

Create an AWS KMS Key:

  • Use the AWS Key Management Service (KMS) to create a new key.
  • Ensure the IAM roles onehouse-customer-eks-node-role-XXXX and onehouse-customer-core-role-XXXX are added as KMS key users. If these roles do not exist, create a standard AWS S3 bucket, complete the onboarding process, and then repeat these steps.

Configure Customer Managed Encryption KMS Key in Onehouse onboarding:

  • Terraform: Update the s3KmsKeys variable with the ARN(s) (Amazon Resource Name) of your KMS key(s) and apply the Terraform.
  • AWS CloudFormation: In the "KMS keys configuration," specify the ARN of your KMS key and run the wizard again.
  • Use of Customer Managed Encryption does not require you to change your current Onehouse Secret Manager setting in Onehouse onboarding.

Modify the S3 Bucket to add Customer Managed Encryption:

  • Navigate to the "Default encryption" settings of your previously created bucket.
  • Select "Server-side encryption with AWS Key Management Service keys (SSE-KMS)".
  • Choose the KMS key you created earlier to encrypt the bucket, select enable for "bucket key" and save the changes. This will automatically encrypt all new files uploaded to the bucket.
  • To encrypt existing files, select all files in the bucket.
  • Go to "Actions" and click "Edit server-side encryption".
  • Select SSE-KMS, enter the ARN of your encryption key, select enable for "bucket key" and click "Save Changes". This will apply SSE-KMS encryption to all your current files.

Tip: Enabling the bucket key for SSE-KMS can reduce AWS KMS request costs by up to 99 percent by decreasing the request traffic from Amazon S3 to AWS KMS.
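
The console steps for default encryption can also be scripted. The following is a minimal sketch of the equivalent settings, assuming `boto3` and configured AWS credentials for the `apply_bucket_encryption` helper. Both helper names and the example key ARN are hypothetical. The sketch covers only the default encryption for new objects; existing objects still need to be re-encrypted as described in the steps above.

```python
def sse_kms_bucket_encryption(kms_key_arn):
    """Build default-encryption settings: SSE-KMS with the bucket key enabled."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # The bucket key reduces request traffic from S3 to KMS.
                "BucketKeyEnabled": True,
            }
        ]
    }


def apply_bucket_encryption(bucket_name, kms_key_arn):
    """Apply the settings to a bucket (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("s3").put_bucket_encryption(
        Bucket=bucket_name,
        ServerSideEncryptionConfiguration=sse_kms_bucket_encryption(kms_key_arn),
    )


# Hypothetical key ARN, for illustration only.
example_config = sse_kms_bucket_encryption(
    "arn:aws:kms:us-west-2:111122223333:key/EXAMPLE-KEY-ID"
)
print(example_config)
```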

3. Gateway Endpoints for Amazon S3 - Access to Onehouse Buckets

Onehouse reads and writes data in S3 buckets, so make sure there is a gateway endpoint between your VPC and S3 to keep this traffic from flowing through the NAT gateway. AWS docs for setting this up can be found here.

If you have also replaced the default VPC endpoint policy, which allows access to all S3 buckets, with a custom policy, add to or edit that policy to allow the following bucket, which Onehouse uses to store configs, logs, etc. in your AWS account. Also add all of your lakehouse buckets to this policy so that data transfer uses the S3 endpoint.

Sample Policy for S3 Gateway Endpoint Access
{
  "Version": "...",
  "Id": "...",
  "Statement": [
    {
      "Sid": "Allow-access-to-onehouse-bucket-and-lakes",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::onehouse-customer-bucket-XXXX",
        "arn:aws:s3:::onehouse-customer-bucket-XXXX/*",
        "arn:aws:s3:::<lake-bucket>",
        "arn:aws:s3:::<lake-bucket>/*",
        "arn:aws:s3:::s3-datasource-metadata-<ONEHOUSE_REQUEST_ID>",
        "arn:aws:s3:::s3-datasource-metadata-<ONEHOUSE_REQUEST_ID>/*"
      ]
    }
  ]
}
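
With several lakehouse buckets, the resource list grows quickly because each bucket needs both a bucket ARN and an object ARN. The sketch below generates the policy from a list of bucket names. The function name, its parameters, and the example values are hypothetical. The `Version` value `2012-10-17` is an assumption (the sample above elides it), and the optional `Id` field is omitted for the same reason.

```python
import json


def onehouse_endpoint_policy(bucket_suffix, request_id, lake_buckets):
    """Build an endpoint policy covering the Onehouse and lakehouse buckets."""
    bucket_names = [
        f"onehouse-customer-bucket-{bucket_suffix}",
        *lake_buckets,
        f"s3-datasource-metadata-{request_id}",
    ]
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself
        resources.append(f"arn:aws:s3:::{name}/*")  # the objects in the bucket
    return {
        "Version": "2012-10-17",  # assumed; elided in the sample policy
        "Statement": [
            {
                "Sid": "Allow-access-to-onehouse-bucket-and-lakes",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
            }
        ],
    }


# Placeholder values; substitute the identifiers from your own account.
example_policy = onehouse_endpoint_policy(
    "XXXX", "<ONEHOUSE_REQUEST_ID>", ["<lake-bucket>"]
)
print(json.dumps(example_policy, indent=2))
```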

VPC peering

The Onehouse stack will need network access to read from your data sources (e.g. Kafka clusters or databases). Make sure that VPC peering is set up properly if data sources are outside your Onehouse VPC. Additional AWS docs on how to do this can be found here.
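
AWS does not allow a peering connection between VPCs whose IPv4 CIDR blocks overlap, so it can help to compare ranges before requesting one. The following is a minimal sketch using Python's standard `ipaddress` module. The function name and the example ranges are illustrative assumptions.

```python
import ipaddress


def overlapping_cidrs(onehouse_vpc_cidr, source_vpc_cidrs):
    """Return the data-source VPC ranges that overlap the Onehouse VPC range."""
    onehouse = ipaddress.ip_network(onehouse_vpc_cidr)
    return [
        cidr
        for cidr in source_vpc_cidrs
        if onehouse.overlaps(ipaddress.ip_network(cidr))
    ]


# Hypothetical ranges: the second source VPC sits inside the Onehouse VPC range.
print(overlapping_cidrs("10.0.0.0/16", ["10.1.0.0/16", "10.0.128.0/20"]))
# → ['10.0.128.0/20']
```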

List of allowed domains

If your company operates in a highly-regulated industry, and your AWS environment is protected by an egress firewall, then the following list of domains needs to be allowlisted in order for the Onehouse data plane to function:

Onehouse domains that require allowlisting
  • ".amazonaws.com"
  • ".docker.io"
  • ".cloudfront.net"
  • ".onehouse.ai"
  • ".ecr.aws"
  • ".gcr.io"
  • ".quay.io"
  • ".k8s.io"
  • ".pkg.dev"
  • ".terraform.io"
  • ".hashicorp.com"
  • ".confluent.cloud"
  • ".github.com"
  • ".githubusercontent.com"
  • ".pagerduty.com"
  • ".cloudflare.docker.com"
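
To check whether a hostname seen in your firewall logs is covered by this list, a suffix match can be sketched as follows. This is an illustration only: the function name is hypothetical, and treating each entry as matching both its subdomains and the bare domain is an assumption. Your firewall may interpret leading-dot entries differently.

```python
ALLOWED_SUFFIXES = (
    ".amazonaws.com", ".docker.io", ".cloudfront.net", ".onehouse.ai",
    ".ecr.aws", ".gcr.io", ".quay.io", ".k8s.io", ".pkg.dev",
    ".terraform.io", ".hashicorp.com", ".confluent.cloud", ".github.com",
    ".githubusercontent.com", ".pagerduty.com", ".cloudflare.docker.com",
)


def is_allowlisted(hostname):
    """Return True if the hostname falls under one of the listed domains."""
    host = hostname.lower().rstrip(".")
    # Match any subdomain of an entry, or the bare domain itself (assumption).
    return any(
        host.endswith(suffix) or host == suffix[1:]
        for suffix in ALLOWED_SUFFIXES
    )


print(is_allowlisted("s3.us-west-2.amazonaws.com"))  # → True
print(is_allowlisted("example.com"))                 # → False
```

Matching on the leading dot keeps look-alike names such as `evilonehouse.ai` from passing the check.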