onehouse_flow

Configures a flow — an ingestion pipeline that reads from an onehouse_source and writes to a destination Onehouse table identified by (lake, database, table_name).

Canonical reference

This page documents Terraform-specific behavior (HCL syntax, types, mutability, drift, import). For full parameter semantics, valid values, and defaults, see CREATE FLOW, ALTER FLOW, and DELETE FLOW.

The <source-path> {} sub-block (one of s3 {}, gcs {}, kafka {}, onehouse_table {}, postgres {}, mysql {}) must match the type of the referenced source. The server validates the pairing at CREATE time.

Example Usage

Minimal S3 → Onehouse-table flow

resource "onehouse_flow" "events_s3" {
name = "raw-events-flow"
source = onehouse_source.events_s3.name
lake = onehouse_lake.warehouse.name
database = onehouse_database.events.name
table_name = "raw_events"
write_mode = "MUTABLE"
cluster = onehouse_cluster.ingest.name

record_key_fields = ["id"]

s3 {
folder_uri = "s3://my-raw-events-bucket/2024/"
file_format = "PARQUET"
file_extension = ".parquet"
}
}

Kafka flow with schema registry

resource "onehouse_flow" "events_kafka" {
name = "kafka-events-flow"
source = onehouse_source.kafka.name
lake = onehouse_lake.warehouse.name
database = onehouse_database.events.name
table_name = "kafka_events"
write_mode = "MUTABLE"
cluster = onehouse_cluster.ingest.name

kafka {
topic_name = "events.v1"
starting_offsets = "latest"
}

schema_registry {
type = "confluent"
confluent {
servers = "https://psrc-xxxxx.us-west-2.aws.confluent.cloud"
subject_name = "events-value"
key = "SR_KEY"
secret = "SR_SECRET"
}
}
}

Flow with partitioning, transformations, and validations

resource "onehouse_flow" "events_partitioned" {
name = "partitioned-events"
source = onehouse_source.events_s3.name
lake = onehouse_lake.warehouse.name
database = onehouse_database.events.name
table_name = "events_by_date"
write_mode = "MUTABLE"
cluster = onehouse_cluster.ingest.name

record_key_fields = ["id"]
precombine_key_field = "updated_at"
sorting_key_fields = ["ts"]

performance_profile = "BALANCED"
min_sync_frequency_mins = 5
quarantine_enabled = true
table_type = "MERGE_ON_READ"

catalogs = ["prod-glue"]
transformations = ["mask_pii"]
validations = ["schema_check"]

partition_key_fields = [
{
field = "event_date"
partition_type = "DATE_STRING"
input_format = "yyyy-MM-dd"
output_format = "yyyy-MM-dd"
},
{
field = "tenant_id"
# Non-timestamp partitions: leave optional inner fields unset.
},
]

s3 {
folder_uri = "s3://my-raw-events-bucket/"
file_format = "PARQUET"
file_extension = ".parquet"
}
}

Pause, resume, and clean-restart a flow

The state attribute lets you pause and resume a flow declaratively. The clean_and_restart_trigger is a fire-once trigger — any change to its value drops and rebuilds the destination table from scratch.

resource "onehouse_flow" "events" {
# ... (other fields)

state = "PAUSED" # change to "RUNNING" to resume

# Bump this value to fire a CLEAN_AND_RESTART. Use a timestamp, sha256, or
# any unique string. Removing the line is a no-op (won't undo the clean).
clean_and_restart_trigger = "2026-05-11T00:00:00Z"
}
warning

CLEAN_AND_RESTART is destructive — it drops the destination table and rebuilds it from the source. Use only when needed.

Argument Reference

Top-level

| Argument | Type | Mutability | Description |
|---|---|---|---|
| name | string | Immutable | Flow name. → details |
| source | string | Immutable | Name of the onehouse_source. → details |
| lake | string | Immutable | Destination lake name. |
| database | string | Immutable | Destination database name. |
| table_name | string | Immutable | Destination table name (created by the flow). |
| write_mode | string | Immutable | MUTABLE or IMMUTABLE. → details below |
| cluster | string | Mutable | Compute cluster name that runs the flow. Changes issue ALTER FLOW SET CLUSTER. → details |
| performance_profile | string | Mutable | BALANCED, FASTEST_READ, or FASTEST_WRITE. → details |
| min_sync_frequency_mins | number | Mutable | Minimum minutes between flow triggers. Server default 1. → details |
| transformations | list(string) | Immutable | Names of transformations to apply. |
| validations | list(string) | Immutable | Names of validations to apply. |
| quarantine_enabled | boolean | Mutable | If true, invalid records are quarantined instead of failing the flow. Server default true. → details |
| state | string | Mutable | RUNNING or PAUSED. Changes issue ALTER FLOW SET STATE = PAUSE/RESUME. → details below |
| clean_and_restart_trigger | string | Mutable | Fire-once trigger. Any change fires ALTER FLOW SET STATE = CLEAN_AND_RESTART. → details below |
| catalogs | list(string) | Immutable | Names of catalogs to sync the destination table to. |
| record_key_fields | list(string) | Mutable | Record-key columns used for dedup/update/delete. Required for S3 and Onehouse-table sources. Changes issue ALTER FLOW. → details |
| precombine_key_field | string | Immutable | When two records share a record key, the larger precombine value wins. → details |
| sorting_key_fields | list(string) | Immutable | Sorting-key columns applied at ingest time. → details |
| partition_key_fields | list(object) | Immutable | Destination-table partition definition. → details below |
| table_type | string | Immutable | COPY_ON_WRITE or MERGE_ON_READ. Server default MERGE_ON_READ. → details |
| delay_threshold_num_sync_intervals | number | Immutable | Multiplied by the sync frequency to determine when the flow is considered delayed. 0 disables. → details |
| deduplication_policy | string | Immutable | none (default) or drop (for append-only flows). → details |
| table_configured_base_path | string | Immutable | Custom storage location for the destination table. → details |
| table_partition_style | string | Immutable | default or hive. → details |

write_mode: MUTABLE vs IMMUTABLE

MUTABLE tables support upserts and deletes — record keys identify rows that can be updated. IMMUTABLE tables are append-only and don't carry per-record keys. Pick MUTABLE for CDC and updatable datasets, IMMUTABLE for event/log ingestion where every record is a new fact. Changing this attribute forces destroy + recreate of the flow.
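
For contrast with the MUTABLE examples above, a minimal sketch of an append-only flow; the source, lake, and cluster names are illustrative:

resource "onehouse_flow" "clickstream" {
  name       = "clickstream-flow"
  source     = onehouse_source.clicks_s3.name
  lake       = onehouse_lake.warehouse.name
  database   = onehouse_database.events.name
  table_name = "raw_clicks"
  write_mode = "IMMUTABLE" # append-only: no record_key_fields
  cluster    = onehouse_cluster.ingest.name

  s3 {
    folder_uri     = "s3://my-clickstream-bucket/"
    file_format    = "JSON"
    file_extension = ".json"
  }
}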

partition_key_fields

Each list entry is an object with four fields. All four are present in the API contract; non-timestamp partitions use empty strings for the timestamp-specific ones, which is what the provider sends when you leave them unset in HCL.

| Field | Description |
|---|---|
| field | Column name. |
| partition_type | One of DATE_STRING, EPOCH_MILLIS, EPOCH_MICROS, or empty for non-timestamp partitions. |
| input_format | Source data format (e.g., yyyy-MM-dd). Empty for non-timestamp partitions. |
| output_format | Partition output format (e.g., yyyy-MM-dd, yyyyMMddHH). Empty for non-timestamp partitions. |
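
Equivalently, a non-timestamp entry can spell out the empty strings explicitly; per the contract above, this fragment (inside a flow resource) should match leaving the fields unset:

partition_key_fields = [
  {
    field          = "tenant_id"
    partition_type = "" # non-timestamp partition
    input_format   = ""
    output_format  = ""
  },
]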

state — drift detection

state participates in normal Terraform drift detection. If someone pauses a flow in the Onehouse console while your Terraform config says state = "RUNNING", the next terraform plan shows the drift and terraform apply reconciles it.

| Step | What happens |
|---|---|
| 1. terraform apply with state = "RUNNING" | Flow runs. |
| 2. Operator clicks Pause in the Onehouse console | Server state is now PAUSED. |
| 3. terraform plan | Provider reads SHOW FLOWS, sees PAUSED, refreshes Terraform state. Plan shows ~ state = "PAUSED" -> "RUNNING". |
| 4. terraform apply | Provider dispatches ALTER FLOW SET STATE = RESUME. Flow runs again. |

To opt out of Terraform-enforced state (so your ops team can pause/resume without Terraform reverting them), omit state from your HCL entirely. The field is Optional + Computed, so Terraform reads and surfaces the current value without enforcing it.

| Pattern | What you write | Drift behavior |
|---|---|---|
| Declared | state = "RUNNING" | Plan shows drift; apply reconciles. |
| Observed-only | (omit state) | Field tracks the server value; no enforcement. |
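
A minimal sketch of the observed-only pattern, surfacing the server-reported value without enforcing it (resource names are illustrative):

resource "onehouse_flow" "events" {
  # ... other fields; state intentionally omitted so operators can
  # pause/resume in the console without Terraform reverting them.
}

output "events_flow_state" {
  value = onehouse_flow.events.state # read-only: tracks whatever the server reports
}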

clean_and_restart_trigger

A fire-once trigger. Any change to its value drops the destination table and rebuilds it from the source via ALTER FLOW SET STATE = CLEAN_AND_RESTART. Use any unique string — a timestamp, a SHA, an incrementing counter. Removing the line is a no-op (it does not undo the clean).

warning

CLEAN_AND_RESTART is destructive — it drops the destination table and rebuilds from the source. Use only when intentional.

Source-path sub-blocks

Exactly one of these must be set, matching the source's type. Each maps to the same source-side keys as the corresponding onehouse_source block.

| Sub-block | Required fields | Use when source type is | SQL ref |
|---|---|---|---|
| s3 {} | folder_uri, file_format, file_extension | S3 | S3 source |
| gcs {} | folder_uri, file_format, file_extension | GCS | GCS source |
| kafka {} | topic_name, starting_offsets | APACHE_KAFKA / MSK_KAFKA / CONFLUENT_KAFKA | Kafka source |
| onehouse_table {} | lake, database, name | ONEHOUSE_TABLE | Onehouse source |
| postgres {} | table_name, schema_name | POSTGRES | Postgres source |
| mysql {} | table_name | MY_SQL | MySQL source |
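
As an illustration of the pairing rule, a hedged sketch of a Postgres-backed flow; the source, database, and key names are assumptions:

resource "onehouse_flow" "orders_pg" {
  name       = "orders-cdc-flow"
  source     = onehouse_source.orders_pg.name # must reference a POSTGRES source
  lake       = onehouse_lake.warehouse.name
  database   = onehouse_database.sales.name
  table_name = "orders"
  write_mode = "MUTABLE"
  cluster    = onehouse_cluster.ingest.name

  record_key_fields = ["order_id"]

  postgres {
    table_name  = "orders" # source-side table
    schema_name = "public"
  }
}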

s3 {} / gcs {} block fields

| Argument | Type | Description |
|---|---|---|
| folder_uri | string | Source folder URI (e.g. s3://bucket/path/). |
| file_format | string | One of AVRO, JSON, CSV, ORC, PARQUET. |
| file_extension | string | File extension filter (e.g. .parquet, .json, .gz). |
| source_bootstrap | string | TRUE or FALSE: whether to backfill existing files. |
| if_infer_fields_from_source_path | string | TRUE or FALSE: extract partition fields from the source path. |
| fields_to_infer | string | Comma-separated field names to extract from the source path. |
| file_path_pattern | string | Optional path pattern filter. |
| csv_header | string | TRUE or FALSE: whether CSV files have a header row. Only applies when file_format = "CSV". |
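
Putting the optional fields together, a sketch of an s3 {} block that backfills existing files and infers a partition field from the path; the bucket layout is hypothetical:

s3 {
  folder_uri                       = "s3://my-raw-events-bucket/"
  file_format                      = "CSV"
  file_extension                   = ".csv"
  csv_header                       = "TRUE" # first row is a header
  source_bootstrap                 = "TRUE" # also ingest files already in the bucket
  if_infer_fields_from_source_path = "TRUE"
  fields_to_infer                  = "event_date" # e.g. .../event_date=2024-01-01/file.csv
}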

The optional top-level schema_registry {} block (same shape as in onehouse_source) applies across all source-path types. → Schema registry

Attribute Reference

| Attribute | Type | Description |
|---|---|---|
| id | string | Flow UUID. |
| created_at | string | Creation time in RFC3339. |
| created_by | string | Identity that created the flow. |
| state | string | Runtime state (RUNNING, PAUSED, or whatever the server reports). |

Import

terraform import onehouse_flow.events events-kafka

After import, only the top-level summary fields are recovered. Most attributes (source, lake, database, source-path sub-block contents, etc.) must be re-supplied in the .tf file to avoid a forced replacement.
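
If you prefer config-driven import, standard Terraform import blocks (Terraform 1.5+, not provider-specific) are equivalent:

import {
  to = onehouse_flow.events
  id = "events-kafka"
}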

Data Source

data "onehouse_flow" "lookup" {
name = "events-kafka"
}

output "flow_state" {
value = data.onehouse_flow.lookup.state
}

Limitations

  • Identity and write-mode changes are destructive. Changing source, lake, database, table_name, or write_mode forces destroy + recreate (see the lifecycle guard sketch after this list).
  • Advanced configs are immutable in place. delay_threshold_num_sync_intervals, deduplication_policy, table_configured_base_path, table_partition_style cannot be changed via ALTER FLOW yet — SET ADVANCED_CONFIGS support is a follow-up.
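
Because several of these attributes force replacement, a standard Terraform lifecycle guard can protect production flows from an accidental destroy; a minimal sketch:

resource "onehouse_flow" "events" {
  # ... other fields

  lifecycle {
    prevent_destroy = true # plan errors instead of destroying and recreating the flow
  }
}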