
Integrate Trino with your Onehouse Lakehouse

This guide shows how to integrate Trino with your Onehouse Managed Lakehouse so that you can run serverless analytics at scale on the data in your Lakehouse.

Deploying Trino and Connecting to Onehouse

For AWS

If you are on AWS, AWS provides several guides that help you plug Trino into your environment.

EMR: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-glue.html

EKS: https://awslabs.github.io/data-on-eks/docs/blueprints/distributed-databases/trino and https://trino.io/docs/current/installation/kubernetes.html

Trino on EMR is easier to set up, but its cost grows as your workloads get larger. An EKS deployment is cheaper but may involve some initial infrastructure work.

For a local machine or a virtual machine

If you are installing Trino on your local machine or on a virtual machine, you need to tell Trino how to connect to your metastore (Hive Metastore or AWS Glue) and to S3. Below is an example catalog/hudi.properties configuration that connects to S3 and AWS Glue.

connector.name=hudi
# HMS is the default metastore; to use it instead of Glue, just supply its URI
# hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.aws-access-key=XXXXX
hive.s3.aws-secret-key=YYYYY
hive.s3.path-style-access=true
hive.s3.region=us-west-2
# AWS S3 is the default; if you are using MinIO, supply the MinIO endpoint and remove the AWS region
# hive.s3.endpoint=http://minio:9000
hive.metastore=glue
hive.metastore.glue.region=us-west-2
hive.metastore.glue.aws-access-key=XXXXX
hive.metastore.glue.aws-secret-key=YYYYY
# AWS Glue Catalog ID is the AWS account ID
hive.metastore.glue.catalogid=582558643208
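Rather than typing access keys into the file by hand, you may prefer to generate it from environment variables. The following is a minimal sketch in Python, not an official Onehouse or Trino tool. It emits the same properties shown in the example above. The function name `render_hudi_catalog` and the `GLUE_CATALOG_ID` variable are hypothetical names chosen for illustration; `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the conventional AWS variable names.

```python
from typing import Mapping


def render_hudi_catalog(env: Mapping[str, str], region: str = "us-west-2") -> str:
    """Return the text of a catalog/hudi.properties file for S3 and AWS Glue.

    `env` is any mapping that holds the credentials, for example os.environ.
    GLUE_CATALOG_ID is a hypothetical, optional variable name.
    """
    access_key = env["AWS_ACCESS_KEY_ID"]      # raises KeyError if missing
    secret_key = env["AWS_SECRET_ACCESS_KEY"]  # raises KeyError if missing

    lines = [
        "connector.name=hudi",
        # S3 connectivity
        f"hive.s3.aws-access-key={access_key}",
        f"hive.s3.aws-secret-key={secret_key}",
        "hive.s3.path-style-access=true",
        f"hive.s3.region={region}",
        # AWS Glue as the metastore
        "hive.metastore=glue",
        f"hive.metastore.glue.region={region}",
        f"hive.metastore.glue.aws-access-key={access_key}",
        f"hive.metastore.glue.aws-secret-key={secret_key}",
    ]

    # Only write the catalog ID when one was provided.
    catalog_id = env.get("GLUE_CATALOG_ID")
    if catalog_id:
        lines.append(f"hive.metastore.glue.catalogid={catalog_id}")

    return "\n".join(lines) + "\n"
```

To use it, call `render_hudi_catalog(os.environ)` (after `import os`) and write the returned string to catalog/hudi.properties in your Trino configuration directory. This keeps the secret values out of version control, although they still end up in plain text in the generated file.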