Amazon S3
Ingest data from any Amazon S3 bucket into Onehouse tables.
Click Sources > Add New Source > S3. Then, follow the setup guide within the Onehouse console to configure your source.
Cloud Provider Support
- AWS: ✅ Supported
- GCP: Not supported
Prerequisites
- Ensure that you granted Onehouse permission to access the bucket in the Terraform or CloudFormation configuration you used when connecting your cloud account.
- Your S3 bucket must not have any event notification rules configured on it.
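One way to check the notification prerequisite before adding the source is to inspect the bucket's notification configuration. The following is a minimal sketch, assuming `boto3` is installed and AWS credentials with `s3:GetBucketNotification` permission are available; the helper names and the bucket name are hypothetical and not part of Onehouse.

```python
# Keys under which S3 reports notification rules in the response of
# get_bucket_notification_configuration.
LIST_KEYS = (
    "TopicConfigurations",
    "QueueConfigurations",
    "LambdaFunctionConfigurations",
)
EVENTBRIDGE_KEY = "EventBridgeConfiguration"


def configured_notifications(response: dict) -> list[str]:
    """Return the notification types that are configured in the response."""
    found = [key for key in LIST_KEYS if response.get(key)]
    # EventBridge delivery is reported as an empty dict, so test for presence.
    if EVENTBRIDGE_KEY in response:
        found.append(EVENTBRIDGE_KEY)
    return found


def check_bucket(bucket_name: str) -> list[str]:
    """Fetch the bucket's notification configuration and list what is set."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3")
    response = s3.get_bucket_notification_configuration(Bucket=bucket_name)
    return configured_notifications(response)
```

Calling `check_bucket("my-ingest-bucket")` (a placeholder bucket name) returns an empty list when no notification rules are present.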
Schema
By default, Onehouse will infer the source schema by reading a sample of the files to be ingested. Alternatively, you can provide a schema for the incoming records with a schema registry.
Supported File Formats
Onehouse supports the following file formats for ingestion from S3:
- Parquet
- Avro
- JSON
  - Single-object JSON files
  - JSON files with a key and a list as its value
  - NDJSON files
  - JSON files with lists of objects (use the Explode Array transformation in your Flow)
- JSONL
- CSV
- ORC
- XML
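The JSON variants listed above differ only in how records are laid out in the file. The sketch below illustrates each layout using Python's standard `json` module; the sample records and the `records` key name are made up for illustration.

```python
import json

# Two sample records (hypothetical data).
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# Single-object JSON file: the whole file is one JSON object.
single_object = json.dumps(records[0])

# JSON file with a key and a list as its value.
keyed_list = json.dumps({"records": records})

# NDJSON / JSONL: one JSON object per line.
ndjson = "\n".join(json.dumps(r) for r in records) + "\n"

# JSON file with a list of objects at the top level
# (the layout that needs the Explode Array transformation).
list_of_objects = json.dumps(records)
```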
Usage Notes
- Each file is processed exactly once.
- If the content of an object key is modified by overwriting it, Onehouse may or may not process the updated content, depending on when the object is consumed.
- To ensure data correctness and completeness, write changed data to a new object instead of overwriting the content of an existing one.
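A producer can avoid overwrites by giving every upload a fresh object key. The following is a minimal sketch of one possible naming scheme, combining a UTC timestamp with a random UUID; the function name, prefix, and key layout are assumptions for illustration, not an Onehouse requirement.

```python
import uuid
from datetime import datetime, timezone


def new_object_key(prefix: str, extension: str) -> str:
    """Build an object key that is unique for every call."""
    # The timestamp keeps keys roughly sortable; the UUID guarantees uniqueness.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix.rstrip('/')}/{timestamp}-{uuid.uuid4().hex}.{extension}"
```

Each corrected or updated batch is then uploaded under a key such as `new_object_key("events", "parquet")`, leaving previously ingested objects untouched.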