
CREATE TRANSFORMATION

Description

Creates a new Transformation to be used in Stream Captures.

Note that the SQL statement does not end with a semicolon (;).

Syntax

CREATE TRANSFORMATION <name>
TYPE = { 'CONVERT_CDC' | 'ROW_FILTERING' | 'DERIVED_DATE'
| 'ADD_CURRENT_TIMESTAMP' | 'USER_PROVIDED' | 'DATA_MASKING' | 'L_PARSE_JSON'
| 'FLATTEN_STRUCT' | 'EXPLODE_ARRAY' | 'VECTOR_EMBEDDING' }
WITH 'key1' = 'value1', 'key2' = 'value2', ...
Note about USER_PROVIDED transformation type

The USER_PROVIDED transformation type is used to specify a custom transformation class.

The transformation name is case-insensitive.

Example

ROW FILTERING TRANSFORMATION

CREATE TRANSFORMATION <name>
TYPE = 'ROW_FILTERING'
WITH 'row.filtering.transformation.field' = 'event',
'row.filtering.transformation.sql.operator' = 'EQ',
'row.filtering.transformation.value' = 'click',
'row.filtering.transformation.value.type' = 'string'

CONVERT CDC TRANSFORMATION (MONGODB)

CREATE TRANSFORMATION <name>
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'mongodb', 'convert_cdc.target_schema' = '<schema_name>'

CONVERT CDC TRANSFORMATION (POSTGRESQL)

CREATE TRANSFORMATION <name>
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'postgresql'

DATA MASKING TRANSFORMATION

CREATE TRANSFORMATION <name>
TYPE = 'DATA_MASKING'
WITH 'data_masking.type' = {'HASHING'|'TRUNCATION'}, 'data_masking.fields' = 'key1, key2, key3'

PARSE JSON TRANSFORMATION

CREATE TRANSFORMATION <name>
TYPE = 'L_PARSE_JSON'
WITH 'l_parse_json.fields' = 'parent_key1, parent_key2, parent_key3',
'l_parse_json.field_schema' = '<schema_name>',
'l_parse_json.nested_fields' = 'parent_key1.nested_key1, parent_key2.nested_key2, parent_key2.nested_key3'


GENERATE VECTOR EMBEDDINGS

CREATE TRANSFORMATION <name>
TYPE = 'VECTOR_EMBEDDING'
WITH 'vector.embedding.transformation.model.name' = 'test-model',
'vector.embedding.transformation.selected.column' = 'column1',
'vector.embedding.transformation.new.embedding.column' = 'column2',
'vector.embedding.transformation.api.key' = 'api-key'

Required parameters

  • name: Identifier for the Transformation
    • Must be unique for your account
    • Must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "Flattening Transformation")
    • Case-sensitive
  • TYPE: Specifies the type of Transformation to create

Special parameters

Specify special parameters after WITH; all keys and values are strings.

Row Filtering

  • row.filtering.transformation.field: Field name to filter on.
  • row.filtering.transformation.sql.operator: Filter function. Valid values are 'EQ', 'NEQ', 'GT', 'GEQ', 'LT', 'LEQ'.
    • EQ corresponds to the Spark SQL operator =
    • NEQ corresponds to the Spark SQL operator !=
    • GT corresponds to the Spark SQL operator >
    • GEQ corresponds to the Spark SQL operator >=
    • LT corresponds to the Spark SQL operator <
    • LEQ corresponds to the Spark SQL operator <=
  • row.filtering.transformation.value: Value to filter by.
  • row.filtering.transformation.value.type: Type of the value to filter by. Valid types are 'string', 'number'.
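
For instance, a numeric filter that keeps rows where a field exceeds a threshold might look like the following (the field name and value are illustrative):

```sql
CREATE TRANSFORMATION <name>
TYPE = 'ROW_FILTERING'
WITH 'row.filtering.transformation.field' = 'price',
'row.filtering.transformation.sql.operator' = 'GT',
'row.filtering.transformation.value' = '100',
'row.filtering.transformation.value.type' = 'number'
```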

Derived Date Field

  • derived.date.transformation.source.field: Source field to derive the date from.
  • derived.date.transformation.source.input.format: Source format of the date. Valid values are below:
    "yyyy-MM-dd",
    "yyyy-MM-dd hh:mm:ss",
    "yyyy-MM-dd HH:mm:ss",
    "yyyy-MM-dd hh:mm:ss.SSSSSS",
    "yyyy-MM-dd HH:mm:ss.SSSSSS",
    "yyyy-MM-dd hh:mm:ss.SSS",
    "yyyy-MM-dd HH:mm:ss.SSS",
    "yyyy-MM-dd'T'hh:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
    "yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'hh:mm:ssZ",
    "yyyy-MM-dd'T'HH:mm:ssZ",
    "EPOCH_SECONDS",
    "epoch_seconds",
    "EPOCH_MILLIS",
    "epoch_millis",
    "EPOCH_MICROS",
    "epoch_micros"
    • Escape a literal single quote by doubling it. For example: 'derived.date.transformation.source.input.format' = 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSX'.
  • derived.date.transformation.new.field: Name of the new field that stores the derived date.
  • derived.date.transformation.output.format: Output format for the derived date. Valid values are below:
    "yyyy-MM-dd",
    "yyyy-MM-dd hh:mm:ss",
    "yyyy-MM-dd HH:mm:ss",
    "yyyy-MM-dd hh:mm:ss.SSSSSS",
    "yyyy-MM-dd HH:mm:ss.SSSSSS",
    "yyyy-MM-dd hh:mm:ss.SSS",
    "yyyy-MM-dd HH:mm:ss.SSS",
    "yyyy-MM-dd'T'hh:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
    "yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'hh:mm:ssZ",
    "yyyy-MM-dd'T'HH:mm:ssZ",
    "EPOCH_SECONDS",
    "epoch_seconds",
    "EPOCH_MILLIS",
    "epoch_millis",
    "EPOCH_MICROS",
    "epoch_micros"
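
Putting the parameters above together, a derived date transformation might look like the following (the field names are illustrative):

```sql
CREATE TRANSFORMATION <name>
TYPE = 'DERIVED_DATE'
WITH 'derived.date.transformation.source.field' = 'created_at',
'derived.date.transformation.source.input.format' = 'epoch_millis',
'derived.date.transformation.new.field' = 'created_date',
'derived.date.transformation.output.format' = 'yyyy-MM-dd'
```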

Add Current Timestamp

  • current.timestamp.transformation.new.field: Name of the new field that stores the current timestamp.
  • current.timestamp.transformation.output.format: Output format for the timestamp. Valid values are below:
    "yyyy-MM-dd",
    "yyyy-MM-dd hh:mm:ss",
    "yyyy-MM-dd HH:mm:ss",
    "yyyy-MM-dd hh:mm:ss.SSSSSS",
    "yyyy-MM-dd HH:mm:ss.SSSSSS",
    "yyyy-MM-dd hh:mm:ss.SSS",
    "yyyy-MM-dd HH:mm:ss.SSS",
    "yyyy-MM-dd'T'hh:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
    "yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
    "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
    "yyyy-MM-dd'T'hh:mm:ssZ",
    "yyyy-MM-dd'T'HH:mm:ssZ",
    "EPOCH_SECONDS",
    "epoch_seconds",
    "EPOCH_MILLIS",
    "epoch_millis",
    "EPOCH_MICROS",
    "epoch_micros"
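
Using these parameters, an add-current-timestamp transformation might look like the following (the field name is illustrative):

```sql
CREATE TRANSFORMATION <name>
TYPE = 'ADD_CURRENT_TIMESTAMP'
WITH 'current.timestamp.transformation.new.field' = 'ingested_at',
'current.timestamp.transformation.output.format' = 'yyyy-MM-dd HH:mm:ss'
```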

Custom Transformation

Important: To create Stream Captures using custom transformations via the API, you must first add the custom transformation JAR in the Onehouse console (instructions), then create the transformation via the API.

  • user.provided.transformation.class.name: Class name to use in the Transformation.
  • user.provided.transformation.properties: Properties to pass to the custom transformation, in the format 'key1=value1, key2=value2, ...'.
  • Usage notes
    • This class must come from the JAR file uploaded via the Onehouse console.
    • Reach out to the Onehouse team for support if you need to pass custom properties.
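
Based on the parameters above, a custom transformation might be created as follows (the class name is a hypothetical example; yours must match a class in your uploaded JAR):

```sql
CREATE TRANSFORMATION <name>
TYPE = 'USER_PROVIDED'
WITH 'user.provided.transformation.class.name' = 'com.example.MyTransformation',
'user.provided.transformation.properties' = 'key1=value1, key2=value2'
```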

Convert Data from CDC Format

  • convert_cdc.cdc_format: Format of the CDC data. Specify 'postgresql', 'mongodb', or 'mysql'.
  • convert_cdc.target_schema: Target schema for the converted data. Required only for the mongodb format.
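
Since target_schema is required only for mongodb, a MySQL variant would presumably mirror the PostgreSQL example above:

```sql
CREATE TRANSFORMATION <name>
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'mysql'
```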

Data Masking

  • data_masking.type: Type of data masking to apply. Valid values are 'HASHING' and 'TRUNCATION'.
  • data_masking.fields: Comma-separated list of fields to mask.
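
A concrete instance of the syntax above, hashing two sensitive fields (the field names are illustrative):

```sql
CREATE TRANSFORMATION <name>
TYPE = 'DATA_MASKING'
WITH 'data_masking.type' = 'HASHING',
'data_masking.fields' = 'email, ssn'
```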

Parse JSON

  • l_parse_json.fields: Comma-separated list of fields to parse.
  • l_parse_json.field_schema: Schema name to use for the parsed fields.
  • l_parse_json.nested_fields: Comma-separated list of nested fields to parse.

Flatten Struct

  • No special parameters are required. Operates in recursive mode by default.
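
Since no parameters are needed, the statement presumably reduces to just the name and type:

```sql
CREATE TRANSFORMATION <name>
TYPE = 'FLATTEN_STRUCT'
```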

Explode Array

  • No special parameters are required. Operates in recursive mode by default.
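
As with Flatten Struct, the statement presumably reduces to just the name and type:

```sql
CREATE TRANSFORMATION <name>
TYPE = 'EXPLODE_ARRAY'
```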

Generate Vector Embeddings

  • vector.embedding.transformation.model.name: Model name to use. Supported models can be found here.
  • vector.embedding.transformation.selected.column: The column in the incoming dataset to generate embeddings on.
  • vector.embedding.transformation.new.embedding.column: The name of the new embedding column that will be created.
  • vector.embedding.transformation.api.key: The API key registered with the model provider. Onehouse will use this key to access the models and generate the embeddings.