Skip to main content

Integrate DuckDB with your Onehouse Lakehouse

This guide shows how to seamlessly integrate DuckDB with your Onehouse Managed Lakehouse. This will allow you to power serverless analytics at scale on top of the data in your Lakehouse.

How does the DuckDB integration work

As of July 2024, DuckDB only supports reading Apache Iceberg and Delta Lake open table formats. Onehouse Lakehouse has the ability to generate Apache Iceberg and Delta Lake tables using the OneTable Catalog feature. To work, you will need:

At this point, when you ingest data through the Stream Capture, it will update the Data Catalog and also generate the Apache Iceberg and/or Delta Lake tables.

Example Commands

Here are example commands to run on the terminal to connect DuckDB to Onehouse Lakehouse data stored in AWS S3.

export AWS_S3_REGION=us-west-2
export AWS_ACCESS_KEY_ID="XXXXX"
export AWS_SECRET_ACCESS_KEY="YYYYY"
export AWS_SESSION_TOKEN="ZZZZZ"
duckdb
v1.0.0 1f98600c2c
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D install iceberg;load iceberg;
D CREATE SECRET <secret_name> (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ true │
└─────────┘
D SELECT count(*)
FROM iceberg_scan('s3://path/to/onehouse/table/v*');
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 2044422 │
└──────────────┘
D

Here is an example to connect DuckDB on MotherDuck to Onehouse Lakehouse data stored in AWS S3.

CREATE SECRET IN MOTHERDUCK (
TYPE S3,
KEY_ID 's3_access_key',
SECRET 's3_secret_key',
REGION 'us-east-1'
);
SELECT count(*) FROM iceberg_scan('s3://path/to/onehouse/table/v*');