
CREATE TRANSFORMATION

Description

Creates a new Transformation to be used in Flows.

Note that the SQL statement must not end with a semicolon (;).

Syntax

CREATE TRANSFORMATION `<name>`
TYPE = {'CONVERT_CDC' | 'ROW_FILTERING' | 'DERIVED_DATE'
| 'ADD_CURRENT_TIMESTAMP' | 'USER_PROVIDED' | 'DATA_MASKING' | 'L_PARSE_JSON'
| 'FLATTEN_STRUCT' | 'EXPLODE_ARRAY' | 'VECTOR_EMBEDDING' }
WITH 'key1' = 'value1', 'key2' = 'value2', ...
Note about USER_PROVIDED transformation type

The USER_PROVIDED transformation type is used to specify a custom transformation class.

The transformation type name is case-insensitive.

Example

ROW FILTERING TRANSFORMATION

CREATE TRANSFORMATION `<name>`
TYPE = 'ROW_FILTERING'
WITH 'row.filtering.transformation.field' = 'event',
'row.filtering.transformation.sql.operator' = 'EQ',
'row.filtering.transformation.value' = 'click',
'row.filtering.transformation.value.type' = 'string'

CONVERT CDC TRANSFORMATION (MONGODB)

CREATE TRANSFORMATION `<name>`
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'mongodb', 'convert_cdc.target_schema' = '<schema_name>'

CONVERT CDC TRANSFORMATION (POSTGRESQL)

CREATE TRANSFORMATION `<name>`
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'postgresql'

DATA MASKING TRANSFORMATION

CREATE TRANSFORMATION `<name>`
TYPE = 'DATA_MASKING'
WITH 'data_masking.type' = {'HASHING'|'TRUNCATION'}, 'data_masking.fields' = 'key1, key2, key3'

PARSE JSON TRANSFORMATION

CREATE TRANSFORMATION `<name>`
TYPE = 'L_PARSE_JSON'
WITH 'l_parse_json.fields' = 'parent_key1, parent_key2, parent_key3',
'l_parse_json.field_schema' = '<schema_name>',
'l_parse_json.nested_fields' = 'parent_key1.nested_key1, parent_key2.nested_key2, parent_key2.nested_key3'

FLATTEN STRUCT TRANSFORMATION

Recursive mode (default) — flatten all nested Structs in the entire schema:

CREATE TRANSFORMATION `<name>`
TYPE = 'FLATTEN_STRUCT'

Recursive mode with a specific Struct path — flatten all nested Structs under the specified path:

CREATE TRANSFORMATION `<name>`
TYPE = 'FLATTEN_STRUCT'
WITH 'flatten_struct.operation_mode' = 'recursive',
'flatten_struct.selected_column' = 'home'

Selective mode — flatten only the direct children of the specified Struct one level up into its parent scope:

CREATE TRANSFORMATION `<name>`
TYPE = 'FLATTEN_STRUCT'
WITH 'flatten_struct.operation_mode' = 'selective',
'flatten_struct.selected_column' = 'home.address'

EXPLODE ARRAY TRANSFORMATION

Recursive mode (default) — explode every Array reachable from the top-level fields. When multiple independent Arrays exist, the result is the Cartesian product of their elements:

CREATE TRANSFORMATION `<name>`
TYPE = 'EXPLODE_ARRAY'

Recursive mode with a specific Array path — explode the specified Array. When the Array's elements are Structs, Arrays inside those Structs are also exploded recursively:

CREATE TRANSFORMATION `<name>`
TYPE = 'EXPLODE_ARRAY'
WITH 'explode_array.operation_mode' = 'recursive',
'explode_array.selected_column' = 'scores'

Selective mode — explode only the specified Array, one level:

CREATE TRANSFORMATION `<name>`
TYPE = 'EXPLODE_ARRAY'
WITH 'explode_array.operation_mode' = 'selective',
'explode_array.selected_column' = 'scores'

GENERATE VECTOR EMBEDDINGS

CREATE TRANSFORMATION `<name>`
TYPE = 'VECTOR_EMBEDDING'
WITH 'vector.embedding.transformation.model.name' = 'test-model',
'vector.embedding.transformation.selected.column' = 'column1',
'vector.embedding.transformation.new.embedding.column' = 'column2',
'vector.embedding.transformation.api.key' = 'api-key'

Required parameters

  • name: Identifier for the Transformation
    • Must be unique for your project
    • Must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "Flattening Transformation")
    • Case-sensitive
  • TYPE: Specifies the type of Transformation to create

Special parameters

Specify special parameters after WITH; all keys and values are strings.

Row Filtering

  • row.filtering.transformation.field: Field name to filter on.
  • row.filtering.transformation.sql.operator: Filter operator. Valid values are 'EQ', 'NEQ', 'GT', 'GEQ', 'LT', 'LEQ'.
    • EQ corresponds to the Spark SQL operator =
    • NEQ corresponds to the Spark SQL operator !=
    • GT corresponds to the Spark SQL operator >
    • GEQ corresponds to the Spark SQL operator >=
    • LT corresponds to the Spark SQL operator <
    • LEQ corresponds to the Spark SQL operator <=
  • row.filtering.transformation.value: Value to filter by.
  • row.filtering.transformation.value.type: Type of the value to filter by. Valid types are 'string' and 'number'.
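
For a numeric comparison, the parameters above can be combined as in this sketch (the field name and value are illustrative):

```sql
CREATE TRANSFORMATION `<name>`
TYPE = 'ROW_FILTERING'
WITH 'row.filtering.transformation.field' = 'amount',
'row.filtering.transformation.sql.operator' = 'GT',
'row.filtering.transformation.value' = '100',
'row.filtering.transformation.value.type' = 'number'
```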

Derived Date Field

  • derived.date.transformation.source.field: Source field to derive the date from.
  • derived.date.transformation.source.input.format: Source format of the date. Valid values are:
    "yyyy-MM-dd",
    "yyyy-MM-dd hh:mm:ss",
    "yyyy-MM-dd HH:mm:ss",
    "yyyy-MM-dd hh:mm:ss.SSSSSS",
    "yyyy-MM-dd HH:mm:ss.SSSSSS",
    "yyyy-MM-dd hh:mm:ss.SSS",
    "yyyy-MM-dd HH:mm:ss.SSS",
    "yyyy-MM-dd'T'hh:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
    "yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'hh:mm:ssZ",
    "yyyy-MM-dd'T'HH:mm:ssZ",
    "EPOCH_SECONDS",
    "epoch_seconds",
    "EPOCH_MILLIS",
    "epoch_millis",
    "EPOCH_MICROS",
    "epoch_micros"
    • Escape a single quote inside a format string by doubling it. An example source input format key-value pair in a SQL statement is 'derived.date.transformation.source.input.format' = 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSX'.
  • derived.date.transformation.new.field: New field name in which the derived date is stored.
  • derived.date.transformation.output.format: Output format for the derived date. Valid values are:
    "yyyy-MM-dd",
    "yyyy-MM-dd hh:mm:ss",
    "yyyy-MM-dd HH:mm:ss",
    "yyyy-MM-dd hh:mm:ss.SSSSSS",
    "yyyy-MM-dd HH:mm:ss.SSSSSS",
    "yyyy-MM-dd hh:mm:ss.SSS",
    "yyyy-MM-dd HH:mm:ss.SSS",
    "yyyy-MM-dd'T'hh:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
    "yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'hh:mm:ssZ",
    "yyyy-MM-dd'T'HH:mm:ssZ",
    "EPOCH_SECONDS",
    "epoch_seconds",
    "EPOCH_MILLIS",
    "epoch_millis",
    "EPOCH_MICROS",
    "epoch_micros"
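
Putting the parameters above together, a Derived Date transformation might look like the following sketch (the field names and formats are illustrative):

```sql
CREATE TRANSFORMATION `<name>`
TYPE = 'DERIVED_DATE'
WITH 'derived.date.transformation.source.field' = 'created_at',
'derived.date.transformation.source.input.format' = 'epoch_millis',
'derived.date.transformation.new.field' = 'created_date',
'derived.date.transformation.output.format' = 'yyyy-MM-dd'
```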

Add Current Timestamp

  • current.timestamp.transformation.new.field: New field name in which the current timestamp is stored.
  • current.timestamp.transformation.output.format: Output format for the timestamp. Valid values are:
    "yyyy-MM-dd",
    "yyyy-MM-dd hh:mm:ss",
    "yyyy-MM-dd HH:mm:ss",
    "yyyy-MM-dd hh:mm:ss.SSSSSS",
    "yyyy-MM-dd HH:mm:ss.SSSSSS",
    "yyyy-MM-dd hh:mm:ss.SSS",
    "yyyy-MM-dd HH:mm:ss.SSS",
    "yyyy-MM-dd'T'hh:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
    "yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'hh:mm:ssZ",
    "yyyy-MM-dd'T'HH:mm:ssZ",
    "EPOCH_SECONDS",
    "epoch_seconds",
    "EPOCH_MILLIS",
    "epoch_millis",
    "EPOCH_MICROS",
    "epoch_micros"
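
Combining the parameters above, an Add Current Timestamp transformation might look like this sketch (the field name and format are illustrative):

```sql
CREATE TRANSFORMATION `<name>`
TYPE = 'ADD_CURRENT_TIMESTAMP'
WITH 'current.timestamp.transformation.new.field' = 'ingested_at',
'current.timestamp.transformation.output.format' = 'yyyy-MM-dd HH:mm:ss'
```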

Custom Transformation

Important: To create Flows using custom transformations via the API, you must first add the custom transformation JAR in the Onehouse console (instructions), then create the transformation via the API.

  • user.provided.transformation.class.name: Class name to use in the Transformation.
  • user.provided.transformation.properties: Properties to pass to the custom transformation. Format is 'key1=value1, key2=value2, ...'.
  • Usage Notes
    • This class must come from the JAR file uploaded via the Onehouse console.
    • Reach out to the Onehouse team for support if you need to pass custom properties.
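
Following the syntax above, a custom transformation might look like this sketch (the class name and properties are hypothetical placeholders for your own uploaded JAR's class):

```sql
CREATE TRANSFORMATION `<name>`
TYPE = 'USER_PROVIDED'
WITH 'user.provided.transformation.class.name' = 'com.example.MyTransformation',
'user.provided.transformation.properties' = 'key1=value1, key2=value2'
```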

Convert Data from CDC Format

  • convert_cdc.cdc_format: Format of the CDC data. Specify 'postgresql', 'mongodb', or 'mysql'.
  • convert_cdc.target_schema: Target schema name. Required only for the mongodb format.

Data Masking

  • data_masking.type: Type of data masking to apply. Valid values are 'HASHING' and 'TRUNCATION'.
  • data_masking.fields: Comma-separated list of fields to mask.

Parse JSON

  • l_parse_json.fields: Comma-separated list of fields to parse.
  • l_parse_json.field_schema: Schema name to use for the parsed fields.
  • l_parse_json.nested_fields: Comma-separated list of nested fields to parse.

Flatten Struct

  • flatten_struct.operation_mode: Mode of the transformation. Valid values are 'recursive' (default) and 'selective'.
    • recursive: Flatten all nested Structs. If flatten_struct.selected_column is not set, the entire schema is flattened from root. If a path is set, all nested Structs under that path are flattened recursively.
    • selective: Flatten only the direct children of the specified Struct one level up into its parent scope. flatten_struct.selected_column is required in this mode.
  • flatten_struct.selected_column: Dot-separated path to the target Struct (e.g. 'home' for a top-level Struct or 'home.address' for a Struct nested inside another Struct). Intermediate segments must be Struct types; navigating through Arrays is not supported. Required for selective mode, optional for recursive mode.

Explode Array

  • explode_array.operation_mode: Mode of the transformation. Valid values are 'recursive' (default) and 'selective'.
    • recursive: Explode the specified Array. When the Array's elements are Structs, Arrays inside those Structs are also exploded recursively. If explode_array.selected_column is not set, every top-level field is processed and all Arrays reachable from those fields are exploded (when multiple independent Arrays exist, the result is the Cartesian product of their elements).
    • selective: Explode only the specified Array, one level. explode_array.selected_column is required in this mode.
  • explode_array.selected_column: Dot-separated path to the target Array through Struct fields (e.g. 'scores' for a top-level Array, or 'user.scores' for an Array inside a Struct). Intermediate segments must be Struct types; navigating through Arrays is not supported. Required for selective mode, optional for recursive mode.

Generate Vector Embeddings

  • vector.embedding.transformation.model.name: Model name to use. Supported models can be found here.
  • vector.embedding.transformation.selected.column: The column in the incoming dataset to generate embeddings on.
  • vector.embedding.transformation.new.embedding.column: The name of the new embedding column that will be created.
  • vector.embedding.transformation.api.key: The API key registered with the model provider. Onehouse will use this key to access the models and generate the embeddings.