Integrate StarRocks with your Onehouse Lakehouse
This guide shows how to integrate StarRocks with your Onehouse Managed Lakehouse so that you can run serverless analytics at scale on the data in your Lakehouse.
Deploying StarRocks and Connecting to Onehouse in the Public Cloud
There are two main steps to achieve this:
- StarRocks Deployment: Use Kubernetes to automate the deployment. The StarRocks documentation provides instructions for this approach: https://docs.starrocks.io/docs/deployment/sr_operator/
- Hudi Catalog Integration: Connect StarRocks to Onehouse through a Hudi external catalog. The StarRocks documentation at https://docs.starrocks.io/docs/data_source/catalog/hudi_catalog/ focuses on connecting StarRocks to Apache Hudi, and the same Hudi external catalog feature is used to connect to Onehouse. It enables StarRocks to access data stored in Hudi tables within your cloud storage.
In simpler terms, the first step guides you through deploying StarRocks on Kubernetes, while the second explains how to use Hudi catalogs for accessing external data stored in Onehouse through StarRocks.
Example Configs
Here are example commands to run from the MySQL CLI. They create a catalog backed by the AWS Glue Data Catalog, with Apache Hudi files stored in Amazon S3.
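-- Drop the catalog if it was created earlier (skip this on a first run, when the catalog does not exist yet).
DROP CATALOG hudi_catalog_glue;

-- Create a Hudi catalog that uses AWS Glue as the metastore and reads Hudi files from S3.
-- Replace the placeholder access keys and the region with your own values.
CREATE EXTERNAL CATALOG hudi_catalog_glue PROPERTIES (
    "type" = "hudi",
    "hive.metastore.type" = "glue",
    "aws.glue.use_instance_profile" = "false",
    "aws.glue.region" = "us-west-2",
    "aws.glue.access_key" = "XXXX",
    "aws.glue.secret_key" = "YYYYY",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.region" = "us-west-2",
    "aws.s3.access_key" = "XXXXXXX",
    "aws.s3.secret_key" = "YYYYYY"
);

-- Confirm that the catalog exists and list its databases.
SHOW CATALOGS;
SHOW DATABASES FROM hudi_catalog_glue;

-- Switch to the catalog and database, then query a table.
SET CATALOG hudi_catalog_glue;
USE nyctaxi_onehouse;
SHOW TABLES;
SELECT * FROM taxi_green_rt;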
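The Glue example above embeds static access keys. Where the StarRocks nodes run with an IAM role that can read both Glue and the S3 bucket, the StarRocks Hudi catalog also accepts instance-profile authentication, which keeps keys out of the statement. A minimal sketch, assuming such a role is already attached; the catalog name hudi_catalog_glue_profile is hypothetical, and the region is carried over from the example above:

```sql
-- Assumption: the nodes running StarRocks have an IAM role with read access to Glue and S3.
-- hudi_catalog_glue_profile is a hypothetical name chosen for this sketch.
CREATE EXTERNAL CATALOG hudi_catalog_glue_profile PROPERTIES (
    "type" = "hudi",
    "hive.metastore.type" = "glue",
    "aws.glue.use_instance_profile" = "true",
    "aws.glue.region" = "us-west-2",
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.region" = "us-west-2"
);
```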
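The following example connects to a Hive Metastore instead, with Hudi files stored in S3-compatible storage (MinIO in this case).
-- Create a Hudi catalog that uses a Hive Metastore and reads Hudi files from a MinIO endpoint.
-- Replace the metastore URI, credentials, region, and endpoint with your own values.
CREATE EXTERNAL CATALOG hudi_catalog_hms PROPERTIES (
    "type" = "hudi",
    "hive.metastore.type" = "hive",
    "hive.metastore.uris" = "thrift://hive-metastore:9083",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "admin",
    "aws.s3.secret_key" = "password",
    "aws.s3.region" = "us-east-1",
    "aws.s3.enable_ssl" = "false",
    "aws.s3.enable_path_style_access" = "true",
    "aws.s3.endpoint" = "http://minio:9000"
);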
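Once a catalog exists, tables can also be queried by their fully qualified name without switching the session catalog. A short sketch; example_db and example_table are hypothetical names standing in for a database and table registered in your metastore:

```sql
-- List the databases visible through the Hive Metastore catalog.
SHOW DATABASES FROM hudi_catalog_hms;

-- Query with a fully qualified name: catalog.database.table.
-- example_db and example_table are hypothetical placeholders.
SELECT * FROM hudi_catalog_hms.example_db.example_table LIMIT 10;
```

This form is convenient when a single session needs to read from more than one catalog.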