Amazon S3

Ingest data from any Amazon S3 bucket into Onehouse tables.

Click Sources > Add New Source > S3. Then, follow the setup guide within the Onehouse console to configure your source.

Cloud Provider Support

AWS: ✅ Supported
GCP: Not supported

Prerequisites

Ensure that the Terraform or CloudFormation configuration you applied when connecting your cloud account grants Onehouse permission to access the bucket.
Your S3 bucket should not have any notification rules configured.
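
The notification prerequisite can be checked before adding the source. The following is a minimal sketch, assuming the `boto3` SDK and AWS credentials with permission to read the bucket's notification configuration. The helper names are hypothetical, and treating an EventBridge configuration as a "notification rule" is an assumption rather than something Onehouse documents here.

```python
# Keys in the S3 GetBucketNotificationConfiguration response that hold lists of rules.
LIST_RULE_KEYS = (
    "TopicConfigurations",
    "QueueConfigurations",
    "LambdaFunctionConfigurations",
)


def has_notification_rules(config: dict) -> bool:
    """Return True if a notification-configuration response contains any rule."""
    if any(config.get(key) for key in LIST_RULE_KEYS):
        return True
    # EventBridge delivery is reported as an (often empty) dict when enabled.
    # Counting it as a rule is an assumption made for this sketch.
    return "EventBridgeConfiguration" in config


def bucket_has_notification_rules(bucket: str) -> bool:
    """Fetch the bucket's notification configuration and inspect it."""
    import boto3  # imported lazily so the pure helper above works without the SDK

    s3 = boto3.client("s3")
    response = s3.get_bucket_notification_configuration(Bucket=bucket)
    return has_notification_rules(response)
```

Calling `bucket_has_notification_rules("my-ingest-bucket")` (a placeholder bucket name) returns `False` when the bucket has no rules of the kinds checked above.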

Schema

By default, Onehouse will infer the source schema by reading a sample of the files to be ingested. Alternatively, you can provide a schema for the incoming records with a schema registry.

Supported File Formats

Onehouse supports the following file formats for ingestion from S3:

Parquet
Avro
JSON
- Single-object JSON files
- JSON files with a key whose value is a list
- NDJSON files
- JSON files with lists of objects (use the Explode Array transformation in your Flow)
JSONL
CSV
ORC
XML
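
To make the four JSON layouts listed above concrete, the sketch below shows a small sample of each and a hypothetical helper, `json_shape`, that tells them apart. It is only an illustration of the file layouts, not how Onehouse detects them, and it assumes that "a key and list as a value" means a top-level object with exactly one key.

```python
import json

# Illustrative samples of each layout (contents are made up).
SINGLE_OBJECT = '{"id": 1, "name": "a"}'
KEYED_LIST = '{"records": [{"id": 1}, {"id": 2}]}'
NDJSON = '{"id": 1}\n{"id": 2}\n'
LIST_OF_OBJECTS = '[{"id": 1}, {"id": 2}]'


def json_shape(text: str) -> str:
    """Classify a JSON payload into one of the layouts listed above."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: check whether each non-empty line is an object.
        lines = [line for line in text.splitlines() if line.strip()]
        if lines and all(isinstance(json.loads(line), dict) for line in lines):
            return "ndjson"
        raise
    if isinstance(doc, list):
        return "list_of_objects"
    if isinstance(doc, dict):
        # Assumption: the "key and list" layout has exactly one top-level key.
        if len(doc) == 1 and isinstance(next(iter(doc.values())), list):
            return "keyed_list"
        return "single_object"
    raise ValueError("payload is not a JSON object or list")
```

Note that an NDJSON file containing a single line is indistinguishable from a single-object file, so the helper reports it as `single_object`.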

Usage Notes

Each file is processed exactly once.
If an existing object is overwritten, Onehouse may or may not process the updated content, depending on when the object is consumed.
- To ensure data correctness and completeness, write updated data to a new file instead of modifying an existing object.
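
One way to follow this recommendation is for the producer to give every upload a fresh object key. The sketch below is an assumption about how a producer might do that; the function name `new_object_key` and the key layout (timestamp plus a short random suffix) are hypothetical and not part of Onehouse.

```python
import posixpath
import uuid
from datetime import datetime, timezone
from typing import Optional


def new_object_key(prefix: str, filename: str, now: Optional[datetime] = None) -> str:
    """Build an object key that has not been used before, so nothing is overwritten."""
    # Normalize to UTC so the trailing "Z" in the timestamp is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    stem, ext = posixpath.splitext(filename)
    # The random suffix keeps keys distinct even within the same second.
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix.rstrip('/')}/{stem}-{stamp}-{suffix}{ext}"
```

For example, `new_object_key("raw/orders/", "orders.parquet")` yields a key of the form `raw/orders/orders-<timestamp>-<random>.parquet`, which can then be passed as the `Key` argument when uploading the file.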
