
Flows

Flows ingest data from a Source into a new Onehouse table. Flows are fully managed, with built-in orchestration, scaling, and alerting.

Common Use Cases

Key Capabilities

  • Ingest: Incrementally ingest data from a wide variety of sources.
  • Transform: Perform low-code transformations on data in-flight.
  • Validate: Define data validations and quarantine bad data without breaking the ingestion pipeline.
  • Monitor: View and search logs, monitor dashboards and metrics, and receive alerts when Flows are broken or delayed.

Statuses

Flows can have the following statuses:

  • Running: The Flow is active. This state does not specify whether an active sync is in progress.
  • Delayed: The Flow's in-progress or most recent sync took longer than the delay threshold. The threshold is set with the flow.delayThreshold.numSyncIntervals advanced configuration.
  • Paused: The Flow is paused and will not perform syncs.
  • Failed: The Flow encountered errors after three attempts to sync data. While in the Failed status, the Flow continues to retry on an interval that increases with each failure. If a sync succeeds during a retry, the Flow returns to the Running status.
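
The status rules above can be sketched as a small state model. This is an illustration only, not Onehouse's implementation: the FlowState class, the function names, and the precedence between statuses are hypothetical. It also assumes that the delay threshold equals the sync interval multiplied by flow.delayThreshold.numSyncIntervals, and that the retry interval doubles after each failure (the text only says the interval increases).

```python
from dataclasses import dataclass

# From the text above: a Flow is marked Failed after three failed sync attempts.
ATTEMPTS_BEFORE_FAILED = 3


@dataclass
class FlowState:
    """Hypothetical bookkeeping for one Flow; not an Onehouse data structure."""
    paused: bool = False
    consecutive_failures: int = 0
    last_sync_seconds: float = 0.0  # duration of the in-progress or most recent sync


def record_sync(state: FlowState, succeeded: bool, duration_seconds: float) -> None:
    """Update the state after a sync attempt; a success clears the failure count."""
    state.last_sync_seconds = duration_seconds
    state.consecutive_failures = 0 if succeeded else state.consecutive_failures + 1


def flow_status(state: FlowState, sync_interval_seconds: float,
                num_sync_intervals: int) -> str:
    """Derive a status. The precedence order used here is an assumption."""
    if state.paused:
        return "Paused"
    if state.consecutive_failures >= ATTEMPTS_BEFORE_FAILED:
        return "Failed"
    # Assumption: the delay threshold is num_sync_intervals sync intervals long.
    if state.last_sync_seconds > sync_interval_seconds * num_sync_intervals:
        return "Delayed"
    return "Running"


def retry_interval_seconds(consecutive_failures: int,
                           base_seconds: float = 60.0) -> float:
    """Assumed doubling backoff; the real schedule is not documented here."""
    extra_failures = max(0, consecutive_failures - ATTEMPTS_BEFORE_FAILED)
    return base_seconds * 2 ** extra_failures


if __name__ == "__main__":
    state = FlowState()
    for _ in range(3):
        record_sync(state, succeeded=False, duration_seconds=30)
    print(flow_status(state, sync_interval_seconds=300, num_sync_intervals=2))
    record_sync(state, succeeded=True, duration_seconds=30)
    print(flow_status(state, sync_interval_seconds=300, num_sync_intervals=2))
```

In this sketch, a single successful sync resets the failure count, which mirrors the documented move from Failed back to Running.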

Logs

You can view logs for Flows directly in the Onehouse console. Open the Flow, then click the "Logs" tab.

Info: Flow logs are retained for 7 days.

Usage Guidelines

  • Flows run on Managed Clusters.
  • Flows will fail if the data volume of shuffle operations exceeds the available storage (disk space) for the project.
    • If you encounter situations that require additional storage (e.g. exploding an array with many elements), you can increase the Cluster's OCU Limit.
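
To see why exploding an array can exhaust storage, a rough back-of-the-envelope estimate helps. The sketch below is not an Onehouse tool; the function names and the sizing formula are assumptions for illustration. Actual shuffle volume also depends on factors such as compression and serialization.

```python
def estimate_exploded_bytes(input_rows: int, avg_array_length: float,
                            bytes_per_output_row: int) -> int:
    """Rough size of the data after each array element becomes its own row."""
    return int(input_rows * avg_array_length * bytes_per_output_row)


def exceeds_storage(estimated_bytes: int, available_bytes: int) -> bool:
    """True when the estimate is larger than the available disk space."""
    return estimated_bytes > available_bytes


if __name__ == "__main__":
    # Hypothetical workload: 1 million rows, 500 elements per array,
    # and about 200 bytes per exploded row.
    estimate = estimate_exploded_bytes(1_000_000, 500, 200)
    print(estimate, exceeds_storage(estimate, available_bytes=50 * 10**9))
```

If an estimate like this approaches the available disk space, increasing the Cluster's OCU Limit before running the Flow is the documented remedy.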