onehouse_transformation
Defines a reusable, named transformation that flows can apply to data as it is ingested into Onehouse tables. A transformation is created independently and then referenced by name from one or more flows.
This page documents Terraform-specific behavior (HCL syntax, types, mutability, drift, import). For full parameter semantics, valid values, and defaults, see CREATE TRANSFORMATION and DELETE TRANSFORMATION.
Transformation names are lowercased by the backend. Use a lowercase name (e.g. filter_clicks, not FilterClicks); the provider rejects mixed-case names at plan time to avoid drift.
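When the name comes from a variable, one way to stay within the lowercase rule is to normalize it with Terraform's built-in lower() function before it reaches the provider. The sketch below assumes a hypothetical input variable named transformation_name; it is an illustration, not something the provider requires.

```hcl
# Hypothetical input variable; its name and default are assumptions for illustration.
variable "transformation_name" {
  type    = string
  default = "Filter_Clicks"
}

resource "onehouse_transformation" "normalized" {
  # lower() is a built-in Terraform function. The result is known at plan time,
  # so the value handed to the provider here is "filter_clicks".
  name = lower(var.transformation_name)
  type = "ROW_FILTERING"

  row_filtering {
    field        = "event"
    sql_operator = "EQ"
    value        = "click"
    value_type   = "string"
  }
}
```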
Example Usage
Row filtering
Keep only rows where a column matches a value.
resource "onehouse_transformation" "filter_clicks" {
  name = "filter_clicks"
  type = "ROW_FILTERING"
  row_filtering {
    field        = "event"
    sql_operator = "EQ"
    value        = "click"
    value_type   = "string"
  }
}
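The same block can express a numeric comparison. The sketch below assumes a hypothetical numeric column named amount; the operator and value_type values come from the row_filtering argument table, and value stays a quoted string in HCL because the argument is typed as string.

```hcl
resource "onehouse_transformation" "large_orders" {
  name = "large_orders"
  type = "ROW_FILTERING"

  row_filtering {
    field        = "amount" # hypothetical column name
    sql_operator = "GEQ"    # assumed to mean "greater than or equal to"
    value        = "100"    # always a string in HCL; value_type declares how it is interpreted
    value_type   = "number"
  }
}
```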
Data masking
Hash or truncate sensitive columns.
resource "onehouse_transformation" "mask_pii" {
  name = "mask_pii"
  type = "DATA_MASKING"
  data_masking {
    masking_type = "HASHING"
    fields       = ["ssn", "email"]
  }
}
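For comparison, a truncation variant might look like the following. This is a minimal sketch: TRUNCATION is the other documented masking_type, while the card_number column is a hypothetical name chosen for illustration.

```hcl
resource "onehouse_transformation" "truncate_card" {
  name = "truncate_card"
  type = "DATA_MASKING"

  data_masking {
    masking_type = "TRUNCATION"
    fields       = ["card_number"] # hypothetical column name
  }
}
```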
Add current timestamp
Add an ingestion-time timestamp column.
resource "onehouse_transformation" "ingested_at" {
  name = "ingested_at"
  type = "ADD_CURRENT_TIMESTAMP"
  add_current_timestamp {
    new_field     = "ingested_at"
    output_format = "yyyy-MM-dd"
  }
}
Derived date
Derive a new date field from an existing field.
resource "onehouse_transformation" "event_day" {
  name = "event_day"
  type = "DERIVED_DATE"
  derived_date {
    source_field        = "event_ts"
    source_input_format = "EPOCH_SECONDS"
    new_field           = "event_day"
    output_format       = "yyyy-MM-dd"
  }
}
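The source field does not have to be an epoch value. As a sketch, the variant below reads a string date and derives a month-level field; the signup_date column is a hypothetical name, and both format values are taken from the derived_date argument table.

```hcl
resource "onehouse_transformation" "signup_month" {
  name = "signup_month"
  type = "DERIVED_DATE"

  derived_date {
    source_field        = "signup_date" # hypothetical column holding values such as 2024-03-15
    source_input_format = "yyyy-MM-dd"
    new_field           = "signup_month"
    output_format       = "yyyy-MM" # month granularity
  }
}
```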
Convert CDC
Convert change-data-capture records into rows. target_schema is required when cdc_format is mongodb.
resource "onehouse_transformation" "cdc_mongo" {
  name = "cdc_mongo"
  type = "CONVERT_CDC"
  convert_cdc {
    cdc_format    = "mongodb"
    target_schema = "my_schema"
  }
}
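For the other CDC formats, target_schema is not listed as required, so a minimal configuration can omit it. The sketch below uses postgresql; the resource label and name are illustrative.

```hcl
resource "onehouse_transformation" "cdc_postgres" {
  name = "cdc_postgres"
  type = "CONVERT_CDC"

  convert_cdc {
    cdc_format = "postgresql"
    # target_schema is omitted here; it is documented as required only for mongodb.
  }
}
```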
Parse JSON
Parse JSON string columns into structured fields.
resource "onehouse_transformation" "parse_payload" {
  name = "parse_payload"
  type = "L_PARSE_JSON"
  parse_json {
    fields        = ["payload", "meta"]
    field_schema  = "my_schema"
    nested_fields = ["payload.id", "meta.ts"]
  }
}
Flatten struct
Flatten nested structs. In recursive mode (the default) every nested struct is flattened; in selective mode only the struct at selected_column is flattened.
resource "onehouse_transformation" "flatten_home" {
  name = "flatten_home"
  type = "FLATTEN_STRUCT"
  flatten_struct {
    operation_mode  = "selective"
    selected_column = "home.address"
  }
}
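The recursive counterpart needs no selected_column. A minimal sketch, with an illustrative resource label and name:

```hcl
resource "onehouse_transformation" "flatten_all" {
  name = "flatten_all"
  type = "FLATTEN_STRUCT"

  flatten_struct {
    # recursive is the documented default; it is spelled out here for readability.
    operation_mode = "recursive"
  }
}
```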
Explode array
Explode array columns into rows.
resource "onehouse_transformation" "explode_scores" {
  name = "explode_scores"
  type = "EXPLODE_ARRAY"
  explode_array {
    operation_mode = "recursive"
  }
}
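To explode a single array rather than all of them, the selective mode can be combined with selected_column. The sketch below assumes a hypothetical array column at the path results.scores.

```hcl
resource "onehouse_transformation" "explode_selected_scores" {
  name = "explode_selected_scores"
  type = "EXPLODE_ARRAY"

  explode_array {
    operation_mode  = "selective"
    selected_column = "results.scores" # hypothetical dot-separated path to the array
  }
}
```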
User-provided (custom)
Run a custom transformer class from an uploaded onehouse_transformer_jar.
resource "onehouse_transformation" "custom" {
  name = "custom_redactor"
  type = "USER_PROVIDED"
  user_provided {
    class_name = "com.example.MyTransformer"
    properties = {
      mode = "fast"
    }
  }
}
Argument Reference
Top-level
| Argument | Type | Required | Mutability | Description |
|---|---|---|---|---|
| name | string | ✅ | Immutable | Transformation name. Must be lowercase. Used as the lookup key in SQL statements. |
| type | string | ✅ | Immutable | One of ROW_FILTERING, DATA_MASKING, ADD_CURRENT_TIMESTAMP, CONVERT_CDC, DERIVED_DATE, L_PARSE_JSON, FLATTEN_STRUCT, EXPLODE_ARRAY, USER_PROVIDED. See the selection table below. |
Exactly one type-specific sub-block must be set, matching the type value.
type — when to pick each value
| Value | Use when | Block |
|---|---|---|
| ROW_FILTERING | You want to keep only rows matching a condition. | row_filtering {} |
| DATA_MASKING | You want to hash or truncate sensitive columns. | data_masking {} |
| ADD_CURRENT_TIMESTAMP | You want to add an ingestion-time timestamp column. | add_current_timestamp {} |
| CONVERT_CDC | You ingest change-data-capture records (Postgres, MySQL, SQL Server, MongoDB). | convert_cdc {} |
| DERIVED_DATE | You want to derive a date field from another field. | derived_date {} |
| L_PARSE_JSON | You want to parse JSON string columns into fields. | parse_json {} |
| FLATTEN_STRUCT | You want to flatten nested structs into top-level columns. | flatten_struct {} |
| EXPLODE_ARRAY | You want to explode array columns into rows. | explode_array {} |
| USER_PROVIDED | You want to run a custom transformer class from an uploaded JAR. | user_provided {} |
row_filtering {} block
| Argument | Type | Required | Description |
|---|---|---|---|
| field | string | ✅ | Column name to filter on. |
| sql_operator | string | ✅ | Comparison operator. One of EQ, NEQ, GT, GEQ, LT, LEQ. |
| value | string | ✅ | Value to compare against. |
| value_type | string | ✅ | Type of value: string or number. |
data_masking {} block
| Argument | Type | Required | Description |
|---|---|---|---|
| masking_type | string | ✅ | Masking strategy: HASHING or TRUNCATION. |
| fields | list(string) | ✅ | Field names to mask. |
add_current_timestamp {} block
| Argument | Type | Required | Description |
|---|---|---|---|
| new_field | string | ✅ | Name of the new timestamp field to add. |
| output_format | string | ✅ | Output date format. One of yyyy, yyyy-MM, yyyy-MM-dd. |
convert_cdc {} block
| Argument | Type | Required | Description |
|---|---|---|---|
| cdc_format | string | ✅ | CDC source format: postgresql, mysql, sqlserver, or mongodb. |
| target_schema | string | when cdc_format = "mongodb" | Target schema name. |
derived_date {} block
| Argument | Type | Required | Description |
|---|---|---|---|
| source_field | string | ✅ | Source field to derive the date from. |
| source_input_format | string | ✅ | Input date/time format of the source field (e.g. yyyy-MM-dd, EPOCH_SECONDS). See supported input formats. |
| new_field | string | ✅ | Name of the new derived field. |
| output_format | string | ✅ | Output date format. One of yyyy, yyyy-MM, yyyy-MM-dd. |
parse_json {} block
| Argument | Type | Required | Description |
|---|---|---|---|
| fields | list(string) | ✅ | Top-level JSON fields to parse. |
| field_schema | string | ✅ | Schema name to use for the parsed fields. |
| nested_fields | list(string) | | Nested JSON fields to parse (e.g. parent.child). |
flatten_struct {} block
| Argument | Type | Required | Description |
|---|---|---|---|
| operation_mode | string | | recursive (default) or selective. selective requires selected_column. |
| selected_column | string | when operation_mode = "selective" | Dot-separated path to the struct to flatten. |
explode_array {} block
| Argument | Type | Required | Description |
|---|---|---|---|
| operation_mode | string | | recursive (default) or selective. selective requires selected_column. |
| selected_column | string | when operation_mode = "selective" | Dot-separated path to the array to explode. |
user_provided {} block
| Argument | Type | Required | Description |
|---|---|---|---|
| class_name | string | ✅ | Fully-qualified transformer class name from an uploaded onehouse_transformer_jar. |
| properties | map(string) | | Optional key/value properties passed to the transformer. |
Attribute Reference
| Attribute | Type | Description |
|---|---|---|
| id | string | Transformation identifier. Equal to name; the backend does not expose a separate UUID. |
| created_at | string | Creation time in RFC3339. |
| created_by | string | Identity that created the transformation. May be empty. |
Import
terraform import onehouse_transformation.filter_clicks filter_clicks
Import is by name. The provider repopulates the full configuration (type and the matching config block, including user_provided.properties) from the server.
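Terraform 1.5 and later also accept a declarative import block as an alternative to the CLI command. The sketch below assumes this resource behaves the same way through an import block as it does through terraform import, which is how Terraform normally treats provider import support; the import ID is the transformation name.

```hcl
# Requires Terraform 1.5 or later.
import {
  to = onehouse_transformation.filter_clicks
  id = "filter_clicks" # import is by name
}

# The matching resource block must exist in configuration for the import to bind to.
resource "onehouse_transformation" "filter_clicks" {
  name = "filter_clicks"
  type = "ROW_FILTERING"

  row_filtering {
    field        = "event"
    sql_operator = "EQ"
    value        = "click"
    value_type   = "string"
  }
}
```

Running terraform plan with this configuration should show the resource as an import rather than a create, provided the configuration matches what the server returns.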
Data Source
data "onehouse_transformation" "lookup" {
  name = "filter_clicks"
}
output "transformation_type" {
  value = data.onehouse_transformation.lookup.type
}
The data source returns identifying metadata (id, type, created_at, created_by); per-type configuration is not exposed.
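The remaining metadata attributes can be read the same way. A minimal sketch that surfaces created_at and created_by as outputs; the data source label and output names are illustrative.

```hcl
data "onehouse_transformation" "audit" {
  name = "filter_clicks"
}

output "transformation_created_at" {
  # RFC3339 timestamp string, per the attribute reference.
  value = data.onehouse_transformation.audit.created_at
}

output "transformation_created_by" {
  # May be an empty string if the backend does not report a creator.
  value = data.onehouse_transformation.audit.created_by
}
```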
Limitations
- No Update. The API has no ALTER TRANSFORMATION; any field change forces destroy + recreate.
- Lowercase names. Names are lowercased by the backend; the provider rejects mixed-case values for name.
- One block per resource. Set exactly one type-specific sub-block, matching type.
- Deleting a transformation does not affect running flows. Flows already referencing it keep working with the captured config. See DELETE TRANSFORMATION.
- Unsupported types. VECTOR_EMBEDDING and the oracle CDC format are not currently supported by this resource (the backend does not return them from DESCRIBE, which would break Read).
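Because every field change results in destroy + recreate, it may be worth guarding important transformations against accidental replacement. One option is Terraform's standard prevent_destroy lifecycle setting, sketched below on the row-filtering example; this is a general Terraform mechanism, not something specific to this provider.

```hcl
resource "onehouse_transformation" "filter_clicks" {
  name = "filter_clicks"
  type = "ROW_FILTERING"

  row_filtering {
    field        = "event"
    sql_operator = "EQ"
    value        = "click"
    value_type   = "string"
  }

  lifecycle {
    # Terraform rejects any plan that would destroy this resource, which includes
    # a replacement triggered by editing one of the fields above.
    prevent_destroy = true
  }
}
```

With this in place, an edit that would force a replacement fails at plan time with an error, so the change has to be made deliberately by removing the guard first.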