CREATE TRANSFORMATION
Description
Creates a new Transformation to be used in Stream Captures.
Note that the SQL statement does not end with a semicolon (;).
Syntax
CREATE TRANSFORMATION <name>
TYPE = { 'CONVERT_CDC' | 'ROW_FILTERING' | 'DERIVED_DATE'
| 'ADD_CURRENT_TIMESTAMP' | 'USER_PROVIDED' | 'DATA_MASKING' | 'L_PARSE_JSON'
| 'FLATTEN_STRUCT' | 'EXPLODE_ARRAY' | 'VECTOR_EMBEDDING' }
WITH 'key1' = 'value1', 'key2' = 'value2', ...
Note about USER_PROVIDED transformation type
The USER_PROVIDED transformation type is used to specify a custom transformation class. The transformation name is case-insensitive.
Examples
ROW FILTERING TRANSFORMATION
CREATE TRANSFORMATION <name>
TYPE = 'ROW_FILTERING'
WITH 'row.filtering.transformation.field' = 'event',
'row.filtering.transformation.sql.operator' = 'EQ',
'row.filtering.transformation.value' = 'click',
'row.filtering.transformation.value.type' = 'string'
CONVERT CDC TRANSFORMATION (MONGODB)
CREATE TRANSFORMATION <name>
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'mongodb', 'convert_cdc.target_schema' = '<schema_name>'
CONVERT CDC TRANSFORMATION (POSTGRESQL)
CREATE TRANSFORMATION <name>
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'postgresql'
DATA MASKING TRANSFORMATION
CREATE TRANSFORMATION <name>
TYPE = 'DATA_MASKING'
WITH 'data_masking.type' = {'HASHING'|'TRUNCATION'}, 'data_masking.fields' = 'key1, key2, key3'
PARSE JSON TRANSFORMATION
CREATE TRANSFORMATION <name>
TYPE = 'L_PARSE_JSON'
WITH 'l_parse_json.fields' = 'parent_key1, parent_key2, parent_key3',
'l_parse_json.field_schema' = '<schema_name>',
'l_parse_json.nested_fields' = 'parent_key1.nested_key1, parent_key2.nested_key2, parent_key2.nested_key3'
GENERATE VECTOR EMBEDDINGS
CREATE TRANSFORMATION <name>
TYPE = 'VECTOR_EMBEDDING'
WITH 'vector.embedding.transformation.model.name' = 'test-model',
'vector.embedding.transformation.selected.column' = 'column1',
'vector.embedding.transformation.new.embedding.column' = 'column2',
'vector.embedding.transformation.api.key' = 'api-key'
Required parameters
name
: Identifier for the Transformation
- Must be unique for your account
- Must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "Flattening Transformation")
- Case-sensitive
TYPE
: Specifies the type of Transformation to create
Special parameters
Include special parameters after WITH as type String.
Row Filtering
row.filtering.transformation.field
: Field name to filter on.

row.filtering.transformation.sql.operator
: Filter function. Valid values are 'EQ', 'NEQ', 'GT', 'GEQ', 'LT', 'LEQ'.
- EQ corresponds to the Spark SQL operator =
- NEQ corresponds to the Spark SQL operator !=
- GT corresponds to the Spark SQL operator >
- GEQ corresponds to the Spark SQL operator >=
- LT corresponds to the Spark SQL operator <
- LEQ corresponds to the Spark SQL operator <=

row.filtering.transformation.value
: Value to filter by.

row.filtering.transformation.value.type
: Type of the value to filter by. Valid types are 'string', 'number'.
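For illustration, a numeric filter that keeps only rows whose amount field is greater than 100; the field name and values here are placeholders:

CREATE TRANSFORMATION <name>
TYPE = 'ROW_FILTERING'
WITH 'row.filtering.transformation.field' = 'amount',
'row.filtering.transformation.sql.operator' = 'GT',
'row.filtering.transformation.value' = '100',
'row.filtering.transformation.value.type' = 'number'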
Derived Date Field
derived.date.transformation.source.field
: Source field to derive the date from.

derived.date.transformation.source.input.format
: Source format of the date. Valid values are below:
"yyyy-MM-dd",
"yyyy-MM-dd hh:mm:ss",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd hh:mm:ss.SSSSSS",
"yyyy-MM-dd HH:mm:ss.SSSSSS",
"yyyy-MM-dd hh:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd'T'hh:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'hh:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ssZ",
"EPOCH_SECONDS",
"epoch_seconds",
"EPOCH_MILLIS",
"epoch_millis",
"EPOCH_MICROS",
"epoch_micros"
- Use a single-quote to escape single-quotes. An example source input format key-value in a SQL statement is 'derived.date.transformation.source.input.format' = 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSX'.
derived.date.transformation.new.field
: New field name in which the derived date is stored.

derived.date.transformation.output.format
: Output format for the derived date. Valid values are below:
"yyyy-MM-dd",
"yyyy-MM-dd hh:mm:ss",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd hh:mm:ss.SSSSSS",
"yyyy-MM-dd HH:mm:ss.SSSSSS",
"yyyy-MM-dd hh:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd'T'hh:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'hh:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ssZ",
"EPOCH_SECONDS",
"epoch_seconds",
"EPOCH_MILLIS",
"epoch_millis",
"EPOCH_MICROS",
"epoch_micros"
Add Current Timestamp
current.timestamp.transformation.new.field
: New field name in which the current timestamp is stored.

current.timestamp.transformation.output.format
: Output format in which to store the date. Valid values are below:
"yyyy-MM-dd",
"yyyy-MM-dd hh:mm:ss",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd hh:mm:ss.SSSSSS",
"yyyy-MM-dd HH:mm:ss.SSSSSS",
"yyyy-MM-dd hh:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd'T'hh:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'hh:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ssZ",
"EPOCH_SECONDS",
"epoch_seconds",
"EPOCH_MILLIS",
"epoch_millis",
"EPOCH_MICROS",
"epoch_micros"
Custom Transformation
Important: To create Stream Captures that use custom transformations via the API, you must first add the custom transformation JAR in the Onehouse console (instructions), then create the transformation via the API.

user.provided.transformation.class.name
: Class name to use in the Transformation.

user.provided.transformation.properties
: Properties to pass to the custom transformation. The format is 'key1=value1, key2=value2, ...'.

Usage notes
- This class must come from the JAR file uploaded via the Onehouse console.
- Reach out to the Onehouse team for support if you need to pass custom properties.
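For illustration, a USER_PROVIDED statement; the class name com.example.MyTransformation is a placeholder for a class in the JAR you uploaded via the console:

CREATE TRANSFORMATION <name>
TYPE = 'USER_PROVIDED'
WITH 'user.provided.transformation.class.name' = 'com.example.MyTransformation',
'user.provided.transformation.properties' = 'key1=value1, key2=value2'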
Convert Data from CDC Format
convert_cdc.cdc_format
: Format of the CDC data. Specify 'postgresql', 'mongodb', or 'mysql'.

convert_cdc.target_schema
: Required only for mongodb.
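The Examples section above shows mongodb and postgresql; following the same pattern, a mysql statement would look like:

CREATE TRANSFORMATION <name>
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'mysql'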
Data Masking
data_masking.type
: Type of data masking to apply. Valid values are 'HASHING' and 'TRUNCATION'.

data_masking.fields
: Comma-separated list of fields to mask.
Parse JSON
l_parse_json.fields
: Comma-separated list of fields to parse.

l_parse_json.field_schema
: Schema name to use for the parsed fields.

l_parse_json.nested_fields
: Comma-separated list of nested fields to parse.
Flatten Struct
- No special parameters required; operates in recursive mode by default.
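For illustration (this sketch assumes the WITH clause may be omitted when a type takes no special parameters):

CREATE TRANSFORMATION <name>
TYPE = 'FLATTEN_STRUCT'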
Explode Array
- No special parameters required; operates in recursive mode by default.
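Likewise for illustration (again assuming the WITH clause may be omitted when there are no special parameters):

CREATE TRANSFORMATION <name>
TYPE = 'EXPLODE_ARRAY'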
Generate Vector Embeddings
vector.embedding.transformation.model.name
: Model name to use. Supported models can be found here.

vector.embedding.transformation.selected.column
: The column in the incoming dataset to generate embeddings on.

vector.embedding.transformation.new.embedding.column
: The name of the new embedding column that will be created.

vector.embedding.transformation.api.key
: The API key registered with the model provider. Onehouse will use this key to access the models and generate the embeddings.