Google Cloud Storage

Ingest data from any Google Cloud Storage (GCS) bucket into Onehouse tables.

Click Sources > Add New Source > Google Cloud Storage. Then, follow the setup guide within the Onehouse console to configure your source.

Cloud Provider Support

  • AWS: Not supported
  • GCP: ✅ Supported

Prerequisites

  1. Ensure that you granted Onehouse permission to access the bucket in the Terraform or CloudFormation configuration you used when connecting your cloud account.
  2. Your GCS bucket must not have any notification rules configured on it.

Schema

By default, Onehouse will infer the source schema by reading a sample of the files to be ingested. Alternatively, you can provide a schema for the incoming records with a schema registry.

Supported File Formats

Onehouse supports the following file formats for ingestion from GCS:

  • Parquet
  • Avro
  • JSON
    • Single-object JSON files
    • JSON files containing a key with a list as its value
    • NDJSON files
    • JSON files with lists of objects (use the Explode Array transformation in your Flow)
  • JSONL
  • CSV
  • ORC
  • XML
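
To make the JSON shapes above concrete, the following is a minimal Python sketch that takes the text of a JSON file holding a list of objects and produces one record per element, serialized as NDJSON. It only illustrates the difference between the two shapes; it is not Onehouse's Explode Array transformation, and the helper names `explode_array` and `to_ndjson` are hypothetical.

```python
import json


def explode_array(document):
    """Return one record per element if the document is a list.

    A single JSON object is wrapped in a one-element list so that
    callers always receive a list of records.
    (Hypothetical helper, for illustration only.)
    """
    if isinstance(document, list):
        return list(document)
    return [document]


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


# A file whose top-level value is a list of objects.
array_file = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'

records = explode_array(json.loads(array_file))
ndjson_text = to_ndjson(records)
print(ndjson_text)
```

Each line of the printed text is a standalone JSON object, which is the layout NDJSON and JSONL files use.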

Usage Notes

  • Each file is processed exactly once.
  • If an existing object is overwritten with new content, Onehouse may or may not process the updated content, depending on when the object is consumed.
    • To ensure data correctness and completeness, create a new file instead of overwriting an existing object.
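
One way to follow the recommendation above is to have the producer write every upload under a fresh object name, so an existing object is never overwritten. The sketch below shows one possible naming scheme using only the Python standard library. The function name `versioned_object_name`, the example prefix, and the timestamp-plus-random-suffix layout are assumptions for illustration, not an Onehouse requirement.

```python
import uuid
from datetime import datetime, timezone


def versioned_object_name(prefix, filename, now=None):
    """Build an object name that is unique for every upload.

    The name combines the original file stem, a UTC timestamp, and a
    short random suffix, so repeated uploads of the same file never
    reuse an existing object key.
    (Hypothetical helper; the naming layout is an assumption.)
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    suffix = uuid.uuid4().hex[:8]

    # Split off the extension, if any, so it stays at the end of the name.
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        stem, ext = filename, ""
    extension = f".{ext}" if ext else ""

    return f"{prefix.rstrip('/')}/{stem}-{stamp}-{suffix}{extension}"


# Example: two uploads of the same logical file get different object names.
fixed_time = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
first = versioned_object_name("landing/orders", "orders.parquet", fixed_time)
second = versioned_object_name("landing/orders", "orders.parquet", fixed_time)
print(first)
print(second)
```

The resulting name would then be passed as the destination object name to whatever upload tool or client library the producer already uses.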