Connect dbt core

dbt is a framework that makes it easy to transform data, with official support for Onehouse documented at: https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-onehouse.

You can use dbt Core within your VPC to run pipelines on the Onehouse SQL Cluster. To set up dbt Cloud, follow this solution guide.

Connect to your VPC

By default, SQL Clusters only accept traffic from within your VPC. To submit queries with SQL clients outside the VPC, such as your local machine, follow these steps to connect through a VPN or bastion host.

Set up dbt Core

On the machine connected to your VPC, install dbt Core and the dbt-spark[PyHive] adapter
Clone a dbt project or use the dbt init command line to create a new dbt project

Create or update the profiles.yml file to specify connection details for the target where tables will be written.

Port: 10000 (or your local forwarded port)
Host:
1. If connecting through a VPN, use the SQL endpoint URL from the Onehouse console
2. If connecting through a bastion host, use localhost

Template:

dbt_project:
    outputs:
        dev:
            host: <SQL-CLUSTER-ENDPOINT>
            method: thrift
            port: 10000
            schema: <DATABASE>
            threads: 1
            type: spark
    target: dev

Create your model under the models/ directory. Set configs at the top of the model file to specify how the table will be written. Here are some example configs:

{{
    config(
        materialized='incremental',
        file_format='hudi',
        incremental_strategy='merge',
        unique_key='store_id',
        partition_by='sale_date',
        location_root='s3://bucket/lake/database/',
        options={
        'type': 'mor',
        'primaryKey': 'sale_id',
        'precombineKey': 'event_timestamp'
        }
    )
}}

In your query, you can reference an existing Onehouse table created by a Flow with the following syntax: from database_name.table_name.
Including location_root is required for table materialization mode, and optional for incremental materialization mode.

Run dbt run in command line to execute the model

Connect to your VPC​

Set up dbt Core​

Connect to your VPC

Set up dbt Core