Apache Iceberg Quickstart

This quickstart shows how to create Apache Iceberg tables in Python and Java using an external Apache Iceberg REST catalog.

Apache Iceberg Limitations

The following limitations apply when writing to tables in the Apache Iceberg table format:

  • Tables created directly in the Apache Iceberg format will not appear in the Onehouse console.
  • You cannot run Onehouse-managed table services on tables created directly in the Apache Iceberg format.
  • You cannot write to a table in multiple formats; all writers must use the same table format. However, you can use OneTable catalog sync to enable read compatibility across formats.
  • Also review the additional limitations that apply to your Iceberg REST Catalog.
Tip: You can avoid these limitations by writing data in Apache Hudi and syncing the table to Apache Iceberg with OneTable catalog sync. See the Apache Hudi Jobs quickstart.

Prerequisites

  • Add an Iceberg REST Catalog (IRC), such as AWS Glue IRC or Snowflake Open Catalog, into your Onehouse project.
    • Using one of these catalogs will automatically install Apache Iceberg 1.10.0 on your Cluster.

Python Quickstart

  1. In the Onehouse console, open Clusters and click Create Cluster.
  2. Configure your Cluster:
    1. Select Spark as the Cluster type.
    2. For the catalog, select the Iceberg REST catalog you added.
    3. Create the Cluster with any other configurations you'd like.
  3. Download the Iceberg quickstart files: iceberg_pyspark_quickstart.zip.
  4. Follow the remaining steps from the Python Job instructions to upload your executable and packages, then create and run the Job.
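
To give a sense of what a Python Job like this might contain, here is a minimal sketch of a PySpark script that creates an Iceberg table and reads it back. It is not the content of iceberg_pyspark_quickstart.zip. The catalog, namespace, and table names (my_catalog, quickstart_db, trips) and the table schema are hypothetical placeholders, and the sketch assumes the Cluster already exposes your Iceberg REST catalog to Spark under the catalog name you pass in.

```python
"""Illustrative sketch only; not the contents of the quickstart download."""

# Hypothetical names: replace with the catalog configured on your Cluster
# and a namespace and table of your choosing.
CATALOG = "my_catalog"
NAMESPACE = "quickstart_db"
TABLE = "trips"


def build_statements(catalog: str, namespace: str, table: str) -> list[str]:
    """Return the Spark SQL statements the sketch runs, in order."""
    fq_table = f"{catalog}.{namespace}.{table}"
    return [
        f"CREATE NAMESPACE IF NOT EXISTS {catalog}.{namespace}",
        f"CREATE TABLE IF NOT EXISTS {fq_table} "
        "(id BIGINT, city STRING, fare DOUBLE) USING iceberg",
        f"INSERT INTO {fq_table} VALUES (1, 'Austin', 12.5), (2, 'Denver', 8.0)",
        f"SELECT * FROM {fq_table} ORDER BY id",
    ]


def run(spark, catalog: str = CATALOG, namespace: str = NAMESPACE,
        table: str = TABLE) -> None:
    """Execute each statement on the given SparkSession; show SELECT results."""
    for statement in build_statements(catalog, namespace, table):
        result = spark.sql(statement)
        if statement.startswith("SELECT"):
            result.show()


# Entry point for a real job (left commented so the file loads without Spark):
# from pyspark.sql import SparkSession
# run(SparkSession.builder.appName("iceberg-quickstart-sketch").getOrCreate())
```

Keeping the SQL in build_statements separate from run makes it possible to inspect the statements locally before submitting the Job. Because all writers to a table must use the same format, every statement here targets the table through the Iceberg catalog only.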

Java Quickstart

  1. In the Onehouse console, open Clusters and click Create Cluster.
  2. Configure your Cluster:
    1. Select Spark as the Cluster type.
    2. For the catalog, select the Iceberg REST catalog you added.
    3. Create the Cluster with any other configurations you'd like.
  3. Download the Iceberg quickstart files: iceberg_java_quickstart.zip.
  4. Follow the remaining steps from the JAR Job instructions to upload your JAR file, then create and run the Job.
    1. For the sample JAR, the main class is org.example.SparkIceberg.