Actions and Editing
Flow Actions
Clone
Cloning a Flow will open the Flow creation page with the same configurations pre-filled. This will not automatically create the new Flow.
Pause
Pausing a Flow immediately stops data processing, and any in-progress sync stops writing to the table. The paused Flow retains a checkpoint of the last data processed from the source, so you can later resume from the same position.
When a Flow is paused, all table services running on the destination table will automatically be paused. When you resume the Flow, the table services will resume.
Resume
Resuming a paused Flow begins processing data from the last checkpoint recorded for the source. The Flow will fail if it attempts to process data that is no longer available in the source (for example, because the Kafka retention period has elapsed).
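To illustrate that failure mode, here is a minimal sketch (not Onehouse internals; it assumes the kafka-python client, and the topic, consumer group, and broker address are hypothetical) that detects when a committed offset has already aged out of a topic's retention window:

```python
from kafka import KafkaConsumer, TopicPartition

# Connect with the group whose committed offsets act as the "checkpoint".
consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",  # hypothetical broker
    group_id="example-flow",             # hypothetical consumer group
    enable_auto_commit=False,
)

topic = "example-topic"                  # hypothetical topic
for partition in consumer.partitions_for_topic(topic) or set():
    tp = TopicPartition(topic, partition)
    committed = consumer.committed(tp)               # last checkpointed offset
    earliest = consumer.beginning_offsets([tp])[tp]  # oldest offset still retained
    if committed is not None and committed < earliest:
        # Messages between `committed` and `earliest` were deleted by
        # retention, so resuming from the checkpoint would fail or skip data.
        print(f"{tp}: checkpoint {committed} predates earliest offset {earliest}")
```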
Delete
Deleting a Flow will remove the Flow from your project. The Flow will immediately stop processing data, and any in-progress sync will stop writing to the table.
The destination table will NOT be automatically deleted from storage or the Onehouse console. All table services running on the destination table will automatically be paused.
Clean & Restart
Performing a Clean & Restart on a Flow will archive the destination table and restart processing from the earliest available data in the source.
Table versioning
To ensure you do not lose historical data or break downstream queries, Onehouse versions every table by storing its data in a versioned subfolder of the form:
s3://<tablePath>/<tableName>/v<versionNumber>/
The Onehouse console displays the latest version of each table as the DFS Path on the table details page. When you Clean & Restart a table, the Flow creates a new version with an empty table and stops writing data to the current version. If the table is synced to catalog(s), you can continue querying the current table version while the Clean & Restart operation is in progress. As soon as the first commit is made to the new table version, Onehouse updates the catalog(s) to point to the new version, so subsequent queries read from it.
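As an illustration of this layout (not an Onehouse API; it assumes boto3, and the bucket and table prefix are hypothetical), the following sketch resolves the latest version folder for a table by listing its v<versionNumber>/ prefixes:

```python
import re
import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"       # hypothetical
TABLE_PREFIX = "lake/orders/"   # hypothetical <tablePath>/<tableName>/

# With Delimiter="/", S3 returns only the immediate child "folders",
# e.g. "lake/orders/v1/", "lake/orders/v2/", ...
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=TABLE_PREFIX, Delimiter="/")

versions = []
for cp in resp.get("CommonPrefixes", []):
    match = re.search(r"/v(\d+)/$", cp["Prefix"])
    if match:
        versions.append(int(match.group(1)))

if versions:
    latest = max(versions)
    print(f"Latest DFS path: s3://{BUCKET}/{TABLE_PREFIX}v{latest}/")
```

Listing with Delimiter="/" keeps the call cheap: S3 returns only the top-level version prefixes rather than every object in the table.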
Editing Flows
After creating a Flow, you can edit the following configurations:
- Name
- Data Source (with limitations described below)
- Sync Frequency
- Pipeline Quarantine
- Transformations
- Data Quality Validations
- Catalogs
Change the Data Source
Currently, you can change the data source of an existing Flow to any Kafka source.
This enables use cases such as bootstrapping historical data from an object storage bucket, then continuing ingestion from an active Kafka stream.
Note that if you Clean & Restart the Flow after editing the source, the re-created Flow will only capture data from its current source.
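For intuition, here is a minimal sketch of the bootstrap-then-stream pattern described above (not Onehouse internals; it assumes boto3 and kafka-python, and the bucket, prefix, topic, and broker address are hypothetical):

```python
import json
import boto3
from kafka import KafkaConsumer

s3 = boto3.client("s3")

def replay_history(bucket: str, prefix: str):
    """Yield historical records previously landed as JSON lines in object storage."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            for line in body.iter_lines():
                yield json.loads(line)

def stream_live(topic: str):
    """Continue ingestion from the active Kafka stream once history is loaded."""
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers="localhost:9092",  # hypothetical broker
        group_id="example-flow",             # hypothetical consumer group
        value_deserializer=json.loads,
    )
    for record in consumer:
        yield record.value

# First drain the historical backlog, then switch to the live stream.
for event in replay_history("example-bucket", "events/"):
    print(event)  # stand-in for the real ingestion step
for event in stream_live("example-topic"):
    print(event)
```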