onehouse_transformation

Defines a reusable, named transformation that flows can apply to data as it is ingested into Onehouse tables. A transformation is created independently and then referenced by name from one or more flows.

Canonical reference

This page documents Terraform-specific behavior (HCL syntax, types, mutability, drift, import). For full parameter semantics, valid values, and defaults, see CREATE TRANSFORMATION and DELETE TRANSFORMATION.

Names are lowercase

Transformation names are lowercased by the backend. Use a lowercase name (e.g. filter_clicks, not FilterClicks); the provider rejects mixed-case names at plan time to avoid drift.
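
When the name comes from a module input, the lowercase rule can also be enforced before the value reaches the provider. The following is a minimal sketch, assuming a hypothetical input variable transformation_name; validation blocks and lower() are standard Terraform features, not part of this provider.

```hcl
# Hypothetical input variable; the variable name is illustrative only.
variable "transformation_name" {
  type        = string
  description = "Name of the Onehouse transformation. Must be lowercase."

  validation {
    # lower() is a Terraform built-in; the comparison fails if any character is uppercase.
    condition     = var.transformation_name == lower(var.transformation_name)
    error_message = "Transformation names must be lowercase."
  }
}

resource "onehouse_transformation" "from_variable" {
  name = var.transformation_name
  type = "ROW_FILTERING"

  row_filtering {
    field        = "event"
    sql_operator = "EQ"
    value        = "click"
    value_type   = "string"
  }
}
```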

Example Usage

Row filtering

Keep only rows where a column matches a value.

resource "onehouse_transformation" "filter_clicks" {
  name = "filter_clicks"
  type = "ROW_FILTERING"

  row_filtering {
    field        = "event"
    sql_operator = "EQ"
    value        = "click"
    value_type   = "string"
  }
}

Data masking

Hash or truncate sensitive columns.

resource "onehouse_transformation" "mask_pii" {
  name = "mask_pii"
  type = "DATA_MASKING"

  data_masking {
    masking_type = "HASHING"
    fields       = ["ssn", "email"]
  }
}

Add current timestamp

Add an ingestion-time timestamp column.

resource "onehouse_transformation" "ingested_at" {
  name = "ingested_at"
  type = "ADD_CURRENT_TIMESTAMP"

  add_current_timestamp {
    new_field     = "ingested_at"
    output_format = "yyyy-MM-dd"
  }
}

Derived date

Derive a new date field from an existing field.

resource "onehouse_transformation" "event_day" {
  name = "event_day"
  type = "DERIVED_DATE"

  derived_date {
    source_field        = "event_ts"
    source_input_format = "EPOCH_SECONDS"
    new_field           = "event_day"
    output_format       = "yyyy-MM-dd"
  }
}

Convert CDC

Convert change-data-capture records into rows. target_schema is required for MongoDB.

resource "onehouse_transformation" "cdc_mongo" {
  name = "cdc_mongo"
  type = "CONVERT_CDC"

  convert_cdc {
    cdc_format    = "mongodb"
    target_schema = "my_schema"
  }
}

Parse JSON

Parse JSON string columns into structured fields.

resource "onehouse_transformation" "parse_payload" {
  name = "parse_payload"
  type = "L_PARSE_JSON"

  parse_json {
    fields        = ["payload", "meta"]
    field_schema  = "my_schema"
    nested_fields = ["payload.id", "meta.ts"]
  }
}

Flatten struct

Flatten nested structs. recursive (default) flattens everything; selective flattens only selected_column.

resource "onehouse_transformation" "flatten_home" {
  name = "flatten_home"
  type = "FLATTEN_STRUCT"

  flatten_struct {
    operation_mode  = "selective"
    selected_column = "home.address"
  }
}

Explode array

Explode array columns into rows.

resource "onehouse_transformation" "explode_scores" {
  name = "explode_scores"
  type = "EXPLODE_ARRAY"

  explode_array {
    operation_mode = "recursive"
  }
}

User-provided (custom)

Run a custom transformer class from an uploaded onehouse_transformer_jar.

resource "onehouse_transformation" "custom" {
  name = "custom_redactor"
  type = "USER_PROVIDED"

  user_provided {
    class_name = "com.example.MyTransformer"
    properties = {
      mode = "fast"
    }
  }
}

Argument Reference

Top-level

| Argument | Type | Required | Mutability | Description |
|---|---|---|---|---|
| name | string | | Immutable | Transformation name. Must be lowercase. SQL lookup key. |
| type | string | | Immutable | One of ROW_FILTERING, DATA_MASKING, ADD_CURRENT_TIMESTAMP, CONVERT_CDC, DERIVED_DATE, L_PARSE_JSON, FLATTEN_STRUCT, EXPLODE_ARRAY, USER_PROVIDED. See the per-value guidance below. |

Exactly one type-specific sub-block must be set, matching the type value.

type — when to pick each value

| Value | Use when | Block |
|---|---|---|
| ROW_FILTERING | You want to keep only rows matching a condition. | row_filtering {} |
| DATA_MASKING | You want to hash or truncate sensitive columns. | data_masking {} |
| ADD_CURRENT_TIMESTAMP | You want to add an ingestion-time timestamp column. | add_current_timestamp {} |
| CONVERT_CDC | You ingest change-data-capture records (Postgres, MySQL, SQL Server, MongoDB). | convert_cdc {} |
| DERIVED_DATE | You want to derive a date field from another field. | derived_date {} |
| L_PARSE_JSON | You want to parse JSON string columns into fields. | parse_json {} |
| FLATTEN_STRUCT | You want to flatten nested structs into top-level columns. | flatten_struct {} |
| EXPLODE_ARRAY | You want to explode array columns into rows. | explode_array {} |
| USER_PROVIDED | You want to run a custom transformer class from an uploaded JAR. | user_provided {} |

row_filtering {} block

| Argument | Type | Required | Description |
|---|---|---|---|
| field | string | | Column name to filter on. |
| sql_operator | string | | Comparison operator. One of EQ, NEQ, GT, GEQ, LT, LEQ. |
| value | string | | Value to compare against. |
| value_type | string | | Type of value: string or number. |
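
Because value is a string argument, a numeric comparison still passes the number as a quoted string and sets value_type to number. A minimal sketch, assuming a hypothetical amount column and resource name:

```hcl
# Hypothetical example: keep only rows whose amount is greater than 100.
resource "onehouse_transformation" "large_orders" {
  name = "large_orders"
  type = "ROW_FILTERING"

  row_filtering {
    field        = "amount" # illustrative column name
    sql_operator = "GT"
    value        = "100"    # string-typed argument, even for numeric comparisons
    value_type   = "number"
  }
}
```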

data_masking {} block

| Argument | Type | Required | Description |
|---|---|---|---|
| masking_type | string | | Masking strategy: HASHING or TRUNCATION. |
| fields | list(string) | | Field names to mask. |
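
The other masking strategy is configured the same way. A minimal sketch using TRUNCATION, with hypothetical field and resource names:

```hcl
# Hypothetical example: truncate instead of hash.
resource "onehouse_transformation" "truncate_contact" {
  name = "truncate_contact"
  type = "DATA_MASKING"

  data_masking {
    masking_type = "TRUNCATION"
    fields       = ["phone", "postal_code"] # illustrative column names
  }
}
```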

add_current_timestamp {} block

| Argument | Type | Required | Description |
|---|---|---|---|
| new_field | string | | Name of the new timestamp field to add. |
| output_format | string | | Output date format. One of yyyy, yyyy-MM, yyyy-MM-dd. |

convert_cdc {} block

| Argument | Type | Required | Description |
|---|---|---|---|
| cdc_format | string | | CDC source format: postgresql, mysql, sqlserver, or mongodb. |
| target_schema | string | when cdc_format = "mongodb" | Target schema name. |
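
For a relational source, the block can be as small as the format alone. A minimal sketch, assuming target_schema may be omitted when cdc_format is not mongodb (which is what the Required column suggests); the resource name is hypothetical:

```hcl
# Hypothetical example: CDC conversion for a PostgreSQL source.
resource "onehouse_transformation" "cdc_postgres" {
  name = "cdc_postgres"
  type = "CONVERT_CDC"

  convert_cdc {
    cdc_format = "postgresql"
    # target_schema is left out here on the assumption that it is
    # only required when cdc_format = "mongodb".
  }
}
```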

derived_date {} block

| Argument | Type | Required | Description |
|---|---|---|---|
| source_field | string | | Source field to derive the date from. |
| source_input_format | string | | Input date/time format of the source field (e.g. yyyy-MM-dd, EPOCH_SECONDS). See supported input formats. |
| new_field | string | | Name of the new derived field. |
| output_format | string | | Output date format. One of yyyy, yyyy-MM, yyyy-MM-dd. |
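
The input and output formats are independent, so a coarser output can be derived from a finer input. A minimal sketch that derives a month value from a date string, assuming a hypothetical order_date column already formatted as yyyy-MM-dd:

```hcl
# Hypothetical example: derive a yyyy-MM month field from a yyyy-MM-dd date field.
resource "onehouse_transformation" "order_month" {
  name = "order_month"
  type = "DERIVED_DATE"

  derived_date {
    source_field        = "order_date" # illustrative column name
    source_input_format = "yyyy-MM-dd"
    new_field           = "order_month"
    output_format       = "yyyy-MM"
  }
}
```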

parse_json {} block

| Argument | Type | Required | Description |
|---|---|---|---|
| fields | list(string) | | Top-level JSON fields to parse. |
| field_schema | string | | Schema name to use for the parsed fields. |
| nested_fields | list(string) | | Nested JSON fields to parse (e.g. parent.child). |

flatten_struct {} block

| Argument | Type | Required | Description |
|---|---|---|---|
| operation_mode | string | | recursive (default) or selective. selective requires selected_column. |
| selected_column | string | when operation_mode = "selective" | Dot-separated path to the struct to flatten. |
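
For comparison with the selective example under Example Usage, a recursive flatten needs no selected_column. A minimal sketch with a hypothetical resource name; operation_mode is set explicitly rather than relying on the default:

```hcl
# Hypothetical example: flatten every nested struct.
resource "onehouse_transformation" "flatten_all" {
  name = "flatten_all"
  type = "FLATTEN_STRUCT"

  flatten_struct {
    operation_mode = "recursive" # stated explicitly; also the documented default
  }
}
```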

explode_array {} block

| Argument | Type | Required | Description |
|---|---|---|---|
| operation_mode | string | | recursive (default) or selective. selective requires selected_column. |
| selected_column | string | when operation_mode = "selective" | Dot-separated path to the array to explode. |
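
The selective mode targets a single array by its dot-separated path. A minimal sketch, assuming a hypothetical order.items array column:

```hcl
# Hypothetical example: explode only one array column.
resource "onehouse_transformation" "explode_items" {
  name = "explode_items"
  type = "EXPLODE_ARRAY"

  explode_array {
    operation_mode  = "selective"
    selected_column = "order.items" # illustrative dot-separated path
  }
}
```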

user_provided {} block

| Argument | Type | Required | Description |
|---|---|---|---|
| class_name | string | | Fully-qualified transformer class name from an uploaded onehouse_transformer_jar. |
| properties | map(string) | | Optional key/value properties passed to the transformer. |

Attribute Reference

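| Attribute | Type | Description |
|---|---|---|
| id | string | Transformation identifier. Equal to name; the backend does not expose a separate UUID. |
| created_at | string | Creation time in RFC3339. |
| created_by | string | Identity that created the transformation. May be empty. |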
created_bystringIdentity that created the transformation. May be empty.

Import

terraform import onehouse_transformation.filter_clicks filter_clicks

Import is by name. The provider repopulates the full configuration (type and the matching config block, including user_provided.properties) from the server.
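
On Terraform 1.5 or later, the same import can be declared in configuration instead of run from the CLI. A minimal sketch using the core import block (a Terraform feature, not something specific to this provider), with the transformation name as the import ID:

```hcl
# Declarative equivalent of the terraform import command above (Terraform 1.5+).
import {
  to = onehouse_transformation.filter_clicks
  id = "filter_clicks" # import is by name
}
```

The target resource block must still exist in the configuration, or be generated with terraform plan -generate-config-out=<file>.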

Data Source

data "onehouse_transformation" "lookup" {
  name = "filter_clicks"
}

output "transformation_type" {
  value = data.onehouse_transformation.lookup.type
}

The data source returns identifying metadata (id, type, created_at, created_by); per-type configuration is not exposed.
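
Since type is among the returned attributes, the data source can back a simple assertion that an existing transformation is of the expected kind. A minimal sketch using a Terraform 1.5+ check block; the data source label existing and the check name are hypothetical:

```hcl
# Hypothetical lookup of a transformation managed elsewhere.
data "onehouse_transformation" "existing" {
  name = "filter_clicks"
}

# check blocks report a warning when the assertion fails; they do not block apply.
check "filter_clicks_is_row_filter" {
  assert {
    condition     = data.onehouse_transformation.existing.type == "ROW_FILTERING"
    error_message = "Expected filter_clicks to be a ROW_FILTERING transformation."
  }
}
```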

Limitations

  • No update. The API has no ALTER TRANSFORMATION, so changing any argument forces Terraform to destroy and recreate the resource.
  • Lowercase names. Names are lowercased by the backend; the provider rejects mixed-case name values.
  • One block per resource. Set exactly one type-specific sub-block, matching type.
  • Deleting a transformation does not affect running flows. Flows already referencing it keep working with the captured config. See DELETE TRANSFORMATION.
  • Unsupported types. VECTOR_EMBEDDING and the oracle CDC format are not currently supported by this resource (the backend does not return them from DESCRIBE, which would break Read).
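
Because every change is a replacement, one way to surface accidental edits is Terraform's standard prevent_destroy lifecycle setting, which makes any plan that would destroy the resource (including a replacement) fail with an error. This is a sketch of one option, not a provider recommendation:

```hcl
resource "onehouse_transformation" "mask_pii" {
  name = "mask_pii"
  type = "DATA_MASKING"

  data_masking {
    masking_type = "HASHING"
    fields       = ["ssn", "email"]
  }

  lifecycle {
    # Core Terraform setting: plans that would destroy or replace this
    # resource are rejected until the setting is removed.
    prevent_destroy = true
  }
}
```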