Integrate StarRocks with your Onehouse Lakehouse

This guide shows how to integrate StarRocks with your Onehouse Managed Lakehouse so that you can run serverless analytics at scale on the data in your lakehouse.

Deploying StarRocks and Connecting to Onehouse in the Public Cloud

There are two main steps: first, deploy StarRocks on Kubernetes; second, create Hudi catalogs in StarRocks to access the external data stored in Onehouse.

Example Configs

The following example commands, run from the MySQL CLI connected to StarRocks, create a Hudi catalog backed by the AWS Glue Data Catalog and query Apache Hudi tables stored in Amazon S3.

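-- Remove any earlier definition so the catalog can be recreated.
-- IF EXISTS avoids an error on the first run; drop it if your StarRocks version rejects it.
DROP CATALOG IF EXISTS hudi_catalog_glue;

-- Hudi catalog that uses AWS Glue as the metastore and static keys for Glue and S3.
CREATE EXTERNAL CATALOG hudi_catalog_glue PROPERTIES (
    "type" = "hudi",
    "hive.metastore.type" = "glue",
    "aws.glue.use_instance_profile" = "false",
    "aws.glue.region" = "us-west-2",
    "aws.glue.access_key" = "XXXX",
    "aws.glue.secret_key" = "YYYYY",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.region" = "us-west-2",
    "aws.s3.access_key" = "XXXXXXX",
    "aws.s3.secret_key" = "YYYYYY"
);

-- Confirm the catalog exists and list its databases.
SHOW CATALOGS;
SHOW DATABASES FROM hudi_catalog_glue;

-- Switch to the catalog and database, then query a table.
SET CATALOG hudi_catalog_glue;
USE nyctaxi_onehouse;
SHOW TABLES;
SELECT * FROM taxi_green_rt;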
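
When StarRocks runs on EC2 instances or EKS nodes whose IAM role can already read the Glue Data Catalog and the S3 bucket, the static keys can likely be left out. The following is a minimal sketch under that assumption; the catalog name hudi_catalog_glue_ip is hypothetical, and the region should match your deployment.

```sql
-- Assumption: the nodes running StarRocks have an IAM role (instance profile)
-- with read access to the Glue Data Catalog and the S3 bucket.
CREATE EXTERNAL CATALOG hudi_catalog_glue_ip PROPERTIES (
    "type" = "hudi",
    "hive.metastore.type" = "glue",
    "aws.glue.use_instance_profile" = "true",
    "aws.glue.region" = "us-west-2",
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.region" = "us-west-2"
);

-- Query with a fully qualified name (catalog.database.table),
-- which works without SET CATALOG or USE.
SELECT * FROM hudi_catalog_glue_ip.nyctaxi_onehouse.taxi_green_rt LIMIT 10;
```

The LIMIT clause keeps the first check cheap on large tables.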
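
To use a Hive Metastore instead of AWS Glue, for example with S3-compatible storage such as MinIO, create the catalog with the metastore URI and a custom S3 endpoint:

-- Hudi catalog that uses a Hive Metastore and an S3-compatible endpoint (MinIO).
CREATE EXTERNAL CATALOG hudi_catalog_hms PROPERTIES (
    "type" = "hudi",
    "hive.metastore.type" = "hive",
    "hive.metastore.uris" = "thrift://hive-metastore:9083",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "admin",
    "aws.s3.secret_key" = "password",
    "aws.s3.region" = "us-east-1",
    -- MinIO is reached over plain HTTP with path-style bucket addressing.
    "aws.s3.enable_ssl" = "false",
    "aws.s3.enable_path_style_access" = "true",
    "aws.s3.endpoint" = "http://minio:9000"
);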
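
The Hive Metastore catalog can be checked in the same way as the Glue catalog. The statements below are a sketch only: my_database and my_table are hypothetical placeholders for a database and table registered in your metastore.

```sql
-- List the databases the metastore exposes through this catalog.
SHOW DATABASES FROM hudi_catalog_hms;

-- Switch to the catalog and a database, then list its tables.
-- my_database is a placeholder, not a name from this guide.
SET CATALOG hudi_catalog_hms;
USE my_database;
SHOW TABLES;

-- Sample a few rows; my_table is also a placeholder.
SELECT * FROM hudi_catalog_hms.my_database.my_table LIMIT 10;

-- Return to the built-in internal catalog when finished.
SET CATALOG default_catalog;
```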