Google Cloud Storage
Ingest data from any Google Cloud Storage (GCS) bucket into Onehouse tables.
Click Sources > Add New Source > Google Cloud Storage. Then, follow the setup guide within the Onehouse console to configure your source.
Cloud Provider Support
- AWS: Not supported
- GCP: ✅ Supported
Prerequisites
- Ensure that you granted Onehouse permission to access the bucket in the Terraform or CloudFormation configuration you used when you connected your cloud account.
- Your GCS bucket should not have any notification rules configured on it.
Schema
By default, Onehouse will infer the source schema by reading a sample of the files to be ingested. Alternatively, you can provide a schema for the incoming records with a schema registry.
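To make the idea of sample-based inference concrete, here is a minimal Python sketch. It is not Onehouse's implementation; the function name `infer_schema` and the sample records are hypothetical, and the "schema" is simply the set of Python type names observed for each field.

```python
def infer_schema(sample_records):
    """Map each field name to the sorted type names seen in the sample.

    Illustrative only: a real engine would map values to Avro/Parquet
    types and handle nesting; this sketch only looks at top-level fields.
    """
    seen = {}
    for record in sample_records:
        for field, value in record.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


# Hypothetical sample standing in for records read from a few files.
sample = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": None, "score": 0.5},
]
schema = infer_schema(sample)
```

In this sketch, a field that never appears in the sampled records is absent from the result, which is one reason you might prefer to supply an explicit schema through a schema registry.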
Supported File Formats
Onehouse supports the following file formats for ingestion from GCS:
- Parquet
- Avro
- JSON
  - Single-object JSON files
  - JSON file with a key and list as a value
  - NDJSON files
  - JSON files with lists of objects (use the Explode Array transformation in your Flow)
- JSONL
- CSV
- ORC
- XML
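The JSON layouts listed above differ only in how records are arranged within a file. The following Python sketch shows one small example of each shape, plus a helper that rewrites a list of objects as newline-delimited JSON. The field names (`id`, `event`, `events`) and the helper names are made up for illustration.

```python
import json

# One example of each JSON layout (contents are hypothetical).
single_object = '{"id": 1, "event": "click"}'
keyed_list = '{"events": [{"id": 1}, {"id": 2}]}'
ndjson = '{"id": 1}\n{"id": 2}\n'
list_of_objects = '[{"id": 1}, {"id": 2}]'


def to_ndjson(records):
    """Serialize an iterable of dicts as one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def parse_ndjson(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Convert the list-of-objects layout into NDJSON before upload.
converted = to_ndjson(json.loads(list_of_objects))
```

Converting to NDJSON at write time is one alternative to the Explode Array transformation when you control the process that produces the files.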
Usage Notes
- Each file is processed exactly once.
- If an existing object is overwritten with new content, Onehouse may or may not process the updated content, depending on when the object is consumed.
- To ensure data correctness and completeness, write new or changed data to a new file instead of modifying the content of an existing object.
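One way to follow the last recommendation is to give every upload a unique object name so that a writer never overwrites an existing key. The sketch below is a minimal example under that assumption; the function name `new_object_name` and the naming scheme (UTC timestamp plus a random suffix) are hypothetical choices, not an Onehouse requirement.

```python
import uuid
from datetime import datetime, timezone


def new_object_name(prefix, extension):
    """Build an object name that is unique per call.

    The timestamp keeps names roughly sortable by write time; the random
    UUID suffix avoids collisions between writers in the same second.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{stamp}-{uuid.uuid4().hex}.{extension}"


# Each call yields a different name under the same prefix.
first = new_object_name("events", "parquet")
second = new_object_name("events", "parquet")
```

Pass the generated name to whichever upload call you use. If you upload with the `google-cloud-storage` Python client, the `if_generation_match=0` precondition on the upload can additionally reject a write when an object with that name already exists.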