Amazon S3

Ingest data from any Amazon S3 bucket into Onehouse tables.

Click Sources > Add New Source > S3. Then, follow the setup guide within the Onehouse console to configure your source.

Cloud Provider Support

  • AWS: ✅ Supported
  • GCP: ❌ Not supported

Prerequisites

  1. Ensure that you granted permission to the bucket in the Terraform or CloudFormation configuration you used when connecting your cloud account.
  2. Your S3 bucket should not have any notification rules configured on it.
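One way to check the second prerequisite before adding the source is to inspect the bucket's notification configuration. The sketch below is not part of the Onehouse setup guide. It assumes boto3 is installed and AWS credentials are available, and it treats EventBridge delivery as a notification rule, which is an assumption about what Onehouse considers a conflict. The helper names are hypothetical.

```python
# List-valued keys that boto3's get_bucket_notification_configuration
# response uses for notification targets.
LIST_KEYS = (
    "TopicConfigurations",
    "QueueConfigurations",
    "LambdaFunctionConfigurations",
)


def has_notification_rules(config: dict) -> bool:
    """Return True if the notification configuration contains any rule."""
    if any(config.get(key) for key in LIST_KEYS):
        return True
    # EventBridge delivery is signalled by the key being present, even as {}.
    # Counting it as a rule is an assumption, not documented Onehouse behavior.
    return "EventBridgeConfiguration" in config


def bucket_has_notification_rules(bucket: str) -> bool:
    """Fetch the configuration for `bucket` (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so has_notification_rules works without it

    s3 = boto3.client("s3")
    response = s3.get_bucket_notification_configuration(Bucket=bucket)
    return has_notification_rules(response)
```

Calling `bucket_has_notification_rules("my-bucket")` with a placeholder bucket name should return `False` for a bucket that is ready to be used as a source.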

Schema

By default, Onehouse will infer the source schema by reading a sample of the files to be ingested. Alternatively, you can provide a schema for the incoming records with a schema registry.
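To make the idea of sample-based inference concrete, here is a minimal sketch that reads a sample of NDJSON records and collects the value types seen for each field. It illustrates the general technique only and is not Onehouse's actual inference logic. The function name and `sample_size` parameter are hypothetical.

```python
import json
from itertools import islice


def infer_schema(lines, sample_size=100):
    """Map each field name to the sorted list of Python value types in the sample."""
    seen = {}
    # Only the first `sample_size` lines are inspected, not the whole input.
    for line in islice(lines, sample_size):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for field, value in record.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


sample = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": null, "score": 0.5}',
]
schema = infer_schema(sample)
print(schema)
```

A field that appears only in records outside the sample would be missed by this kind of inference, which is one reason to supply an explicit schema instead.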

Supported File Formats

Onehouse supports the following file formats for ingestion from S3:

  • Parquet
  • Avro
  • JSON
    • Single-object JSON files
    • JSON files with a key whose value is a list
    • NDJSON files
    • JSON files with lists of objects (use the Explode Array transformation in your Flow)
  • JSONL
  • CSV
  • ORC
  • XML
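The JSON layouts listed above can be easy to confuse, so the sketch below builds one small example of each using Python's standard `json` module. The field names and the `records` key are made up for illustration, and the `explode` helper is only a conceptual stand-in for what an array explode does (one row per element), not the Onehouse transformation itself.

```python
import json

records = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]

# Single-object JSON file: the whole file is one object.
single_object = json.dumps(records[0])

# JSON file with a key whose value is a list ("records" is a hypothetical key).
keyed_list = json.dumps({"records": records})

# NDJSON / JSONL: one JSON object per line.
ndjson = "\n".join(json.dumps(r) for r in records)

# JSON file with a list of objects at the top level.
object_list = json.dumps(records)


def explode(array_json: str) -> list:
    """Conceptual stand-in for exploding an array: one row per element."""
    return list(json.loads(array_json))


for row in explode(object_list):
    print(row)
```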

Usage Notes

  • Each file is processed exactly once.
  • If an existing object is overwritten with new content, Onehouse may or may not process the updated content, depending on when the object is consumed.
    • To ensure data correctness and completeness, create a new file instead of modifying an existing object.
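A producer can follow the recommendation above by giving every upload its own object key. The sketch below is one possible naming scheme, assuming a UTC timestamp plus a short random suffix is acceptable in your key layout. The function name, prefix, and filename are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def new_object_key(prefix: str, filename: str) -> str:
    """Build a key that is unique per upload so existing objects are not overwritten."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # filename has no extension
        stem, ext = filename, ""
    suffix = f".{ext}" if ext else ""
    # The random part keeps keys distinct even within the same second.
    return f"{prefix.rstrip('/')}/{stem}-{stamp}-{uuid.uuid4().hex[:8]}{suffix}"


print(new_object_key("landing/orders/", "events.parquet"))
```

Each call yields a different key under the same prefix, so corrected data arrives as a new file rather than as a modification of one that may already have been consumed.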