Actions and Editing
Flow Actions
Clone
Cloning a Flow will open the Flow creation page with the same configurations pre-filled. This will not automatically create the new Flow.
Pause
Pausing a Flow immediately stops data processing, and any in-progress sync stops writing to the table. The paused Flow retains a checkpoint of the last data processed from the source, so you can later resume from the same position.
When a Flow is paused, all table services running on the destination table will automatically be paused. When you resume the Flow, the table services will resume.
Resume
Resuming a paused Flow begins processing data from the last checkpoint recorded for the source. The Flow will fail if it attempts to process data that is no longer available in the source (for example, because the Kafka retention period has elapsed).
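To illustrate that failure mode, here is a minimal sketch (not Onehouse internals; it assumes the kafka-python client, and the topic, consumer group, and broker address are hypothetical) that detects when a committed offset has already aged out of a topic's retention window:

```python
from kafka import KafkaConsumer, TopicPartition

# Connect with the group whose committed offsets act as the "checkpoint".
consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",  # hypothetical broker
    group_id="example-flow",             # hypothetical consumer group
    enable_auto_commit=False,
)

topic = "example-topic"                  # hypothetical topic
for partition in consumer.partitions_for_topic(topic) or set():
    tp = TopicPartition(topic, partition)
    committed = consumer.committed(tp)               # last checkpointed offset
    earliest = consumer.beginning_offsets([tp])[tp]  # oldest offset still retained
    if committed is not None and committed < earliest:
        # Messages between `committed` and `earliest` were deleted by
        # retention, so resuming from the checkpoint would fail or skip data.
        print(f"{tp}: checkpoint {committed} predates earliest offset {earliest}")
```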
Delete
Deleting a Flow will remove the Flow from your project. The Flow will immediately stop processing data, and any in-progress sync will stop writing to the table.
The destination table will NOT be automatically deleted from storage or the Onehouse console. All table services running on the destination table will automatically be paused.
Clean & Restart
Performing a Clean & Restart on a Flow will archive the destination table and restart processing from the earliest available data in the source.
Table versioning
To ensure you do not lose historical data or break downstream queries, Onehouse versions every table by storing its data in a versioned subfolder of the form:
s3://<tablePath>/<tableName>/v<versionNumber>/
The Onehouse console displays the latest version of each table as the DFS Path on the table details page. When you Clean & Restart a table, the Flow creates a new version with an empty table and stops writing data to the current version. If the table is synced to catalog(s), you can continue querying the current table version while the Clean & Restart operation is in progress. As soon as the first commit is made to the new table version, Onehouse updates the catalog(s) to point to the new version, so subsequent queries read from it.
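As an illustration of this layout (not an Onehouse API; it assumes boto3, and the bucket and table prefix are hypothetical), the following sketch resolves the latest version folder for a table by listing its v<versionNumber>/ prefixes:

```python
import re
import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"       # hypothetical
TABLE_PREFIX = "lake/orders/"   # hypothetical <tablePath>/<tableName>/

# With Delimiter="/", S3 returns only the immediate child "folders",
# e.g. "lake/orders/v1/", "lake/orders/v2/", ...
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=TABLE_PREFIX, Delimiter="/")

versions = []
for cp in resp.get("CommonPrefixes", []):
    match = re.search(r"/v(\d+)/$", cp["Prefix"])
    if match:
        versions.append(int(match.group(1)))

if versions:
    latest = max(versions)
    print(f"Latest DFS path: s3://{BUCKET}/{TABLE_PREFIX}v{latest}/")
```

Listing with Delimiter="/" keeps the call cheap: S3 returns only the top-level version prefixes rather than every object in the table.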
Editing Flows
After creating a Flow, you can edit the following configurations:
- Name
- Data Source (with limitations described below)
- Sync Frequency
- Pipeline Quarantine
- Transformations
- Data Quality Validations
- Catalogs
Change the Data Source
Currently, you can change the data source of an existing Flow to any Kafka source.
This enables use cases such as bootstrapping historical data from an object storage bucket, then continuing ingestion from an active Kafka stream.
Note that if you Clean & Restart the Flow after editing the source, the re-created Flow will only capture data from its current source.
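For intuition, here is a minimal sketch of the bootstrap-then-stream pattern described above (not Onehouse internals; it assumes boto3 and kafka-python, and the bucket, prefix, topic, and broker address are hypothetical):

```python
import json
import boto3
from kafka import KafkaConsumer

s3 = boto3.client("s3")

def replay_history(bucket: str, prefix: str):
    """Yield historical records previously landed as JSON lines in object storage."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            for line in body.iter_lines():
                yield json.loads(line)

def stream_live(topic: str):
    """Continue ingestion from the active Kafka stream once history is loaded."""
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers="localhost:9092",  # hypothetical broker
        group_id="example-flow",             # hypothetical consumer group
        value_deserializer=json.loads,
    )
    for record in consumer:
        yield record.value

# First drain the historical backlog, then switch to the live stream.
for event in replay_history("example-bucket", "events/"):
    print(event)  # stand-in for the real ingestion step
for event in stream_live("example-topic"):
    print(event)
```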