CREATE TRANSFORMATION
Description
Creates a new Transformation to be used in Flows.
Note that the SQL statement does not end with a semicolon (;).
Syntax
CREATE TRANSFORMATION `<name>`
TYPE = {'CONVERT_CDC' | 'ROW_FILTERING' | 'DERIVED_DATE'
| 'ADD_CURRENT_TIMESTAMP' | 'USER_PROVIDED' | 'DATA_MASKING' | 'L_PARSE_JSON'
| 'FLATTEN_STRUCT' | 'EXPLODE_ARRAY' | 'VECTOR_EMBEDDING' }
WITH 'key1' = 'value1', 'key2' = 'value2', ...
Note about USER_PROVIDED transformation type
The USER_PROVIDED transformation type is used to specify a custom transformation class.
The transformation name is case-insensitive.
Examples
ROW FILTERING TRANSFORMATION
CREATE TRANSFORMATION `<name>`
TYPE = 'ROW_FILTERING'
WITH 'row.filtering.transformation.field' = 'event',
'row.filtering.transformation.sql.operator' = 'EQ',
'row.filtering.transformation.value' = 'click',
'row.filtering.transformation.value.type' = 'string'
CONVERT CDC TRANSFORMATION (MONGODB)
CREATE TRANSFORMATION `<name>`
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'mongodb', 'convert_cdc.target_schema' = '<schema_name>'
CONVERT CDC TRANSFORMATION (POSTGRESQL)
CREATE TRANSFORMATION `<name>`
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'postgresql'
DATA MASKING TRANSFORMATION
CREATE TRANSFORMATION `<name>`
TYPE = 'DATA_MASKING'
WITH 'data_masking.type' = {'HASHING'|'TRUNCATION'}, 'data_masking.fields' = 'key1, key2, key3'
PARSE JSON TRANSFORMATION
CREATE TRANSFORMATION `<name>`
TYPE = 'L_PARSE_JSON'
WITH 'l_parse_json.fields' = 'parent_key1, parent_key2, parent_key3',
'l_parse_json.field_schema' = '<schema_name>',
'l_parse_json.nested_fields' = 'parent_key1.nested_key1, parent_key2.nested_key2, parent_key2.nested_key3'
GENERATE VECTOR EMBEDDINGS
CREATE TRANSFORMATION `<name>`
TYPE = 'VECTOR_EMBEDDING'
WITH 'vector.embedding.transformation.model.name' = 'test-model',
'vector.embedding.transformation.selected.column' = 'column1',
'vector.embedding.transformation.new.embedding.column' = 'column2',
'vector.embedding.transformation.api.key' = 'api-key'
Required parameters
name: Identifier for the Transformation
- Must be unique for your project
- Must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "Flattening Transformation")
- Case-sensitive
TYPE: Specifies the type of Transformation to create
Special parameters
Include special parameters after the WITH keyword; all keys and values are strings.
Row Filtering
row.filtering.transformation.field: Field name to filter on.
row.filtering.transformation.sql.operator: Filter operator. Valid values are 'EQ', 'NEQ', 'GT', 'GEQ', 'LT', 'LEQ'.
- 'EQ' corresponds to the Spark SQL operator =
- 'NEQ' corresponds to !=
- 'GT' corresponds to >
- 'GEQ' corresponds to >=
- 'LT' corresponds to <
- 'LEQ' corresponds to <=
row.filtering.transformation.value: Value to filter by.
row.filtering.transformation.value.type: Type of the value to filter by. Valid types are 'string' and 'number'.
Derived Date Field
derived.date.transformation.source.field: Source field to derive the date from.
derived.date.transformation.source.input.format: Source format of the date. Valid values are below:
"yyyy-MM-dd",
"yyyy-MM-dd hh:mm:ss",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd hh:mm:ss.SSSSSS",
"yyyy-MM-dd HH:mm:ss.SSSSSS",
"yyyy-MM-dd hh:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd'T'hh:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'hh:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ssZ",
"EPOCH_SECONDS",
"epoch_seconds",
"EPOCH_MILLIS",
"epoch_millis",
"EPOCH_MICROS",
"epoch_micros"- Use single-quote to escape single-quotes. An example source input format key value in sql statement is
'derived.date.transformation.source.input.format' = 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSX'.
- Use single-quote to escape single-quotes. An example source input format key value in sql statement is
derived.date.transformation.new.field: New field name in which we store the derived date.
derived.date.transformation.output.format: Output format for the derived date. Valid values are below:
"yyyy-MM-dd",
"yyyy-MM-dd hh:mm:ss",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd hh:mm:ss.SSSSSS",
"yyyy-MM-dd HH:mm:ss.SSSSSS",
"yyyy-MM-dd hh:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd'T'hh:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'hh:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ssZ",
"EPOCH_SECONDS",
"epoch_seconds",
"EPOCH_MILLIS",
"epoch_millis",
"EPOCH_MICROS",
"epoch_micros"
Add Current Timestamp
current.timestamp.transformation.new.field: New field name in which we store the current timestamp.
current.timestamp.transformation.output.format: Output format in which to store the timestamp. Valid values are below:
"yyyy-MM-dd",
"yyyy-MM-dd hh:mm:ss",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd hh:mm:ss.SSSSSS",
"yyyy-MM-dd HH:mm:ss.SSSSSS",
"yyyy-MM-dd hh:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd'T'hh:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'hh:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ssZ",
"EPOCH_SECONDS",
"epoch_seconds",
"EPOCH_MILLIS",
"epoch_millis",
"EPOCH_MICROS",
"epoch_micros"
Custom Transformation
Important: In order to create Flows using custom transformations via API, you must first add the custom transformation JAR in the Onehouse console (instructions), then create the transformation via the API.
user.provided.transformation.class.name: Class name to use in the Transformation.
user.provided.transformation.properties: Properties to pass to the custom transformation. Format is 'key1=value1, key2=value2, ...'
Usage notes:
- This class must come from the JAR file uploaded via the Onehouse console.
- Reach out to the Onehouse team for support if you need to pass custom properties.
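A minimal sketch of a USER_PROVIDED statement, assuming a class com.example.MyTransformation (a hypothetical name standing in for a class from your uploaded JAR):

CREATE TRANSFORMATION `<name>`
TYPE = 'USER_PROVIDED'
-- com.example.MyTransformation is a hypothetical class name
WITH 'user.provided.transformation.class.name' = 'com.example.MyTransformation',
'user.provided.transformation.properties' = 'key1=value1, key2=value2'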
Convert Data from CDC Format
convert_cdc.cdc_format: Format of the CDC data. Specify 'postgresql', 'mongodb', or 'mysql'.
convert_cdc.target_schema: Required only for 'mongodb'.
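By analogy with the PostgreSQL example above, a MySQL statement would presumably omit the target schema as well:

CREATE TRANSFORMATION `<name>`
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'mysql'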
Data Masking
data_masking.type: Type of data masking to apply. Valid values are 'HASHING' and 'TRUNCATION'.
data_masking.fields: Comma-separated list of fields to mask.
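For a concrete sketch, masking two hypothetical fields email and ssn with hashing:

CREATE TRANSFORMATION `<name>`
TYPE = 'DATA_MASKING'
-- email and ssn are hypothetical field names
WITH 'data_masking.type' = 'HASHING',
'data_masking.fields' = 'email, ssn'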
Parse JSON
l_parse_json.fields: Comma-separated list of fields to parse.
l_parse_json.field_schema: Schema name to use for the parsed fields.
l_parse_json.nested_fields: Comma-separated list of nested fields to parse.
Flatten Struct
- No special parameters required. Operates in recursive mode by default.
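A minimal sketch, assuming the WITH clause can be omitted when there are no special parameters:

CREATE TRANSFORMATION `<name>`
TYPE = 'FLATTEN_STRUCT'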
Explode Array
- No special parameters required. Operates in recursive mode by default.
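Similarly, a minimal sketch for EXPLODE_ARRAY, under the same assumption that WITH can be omitted:

CREATE TRANSFORMATION `<name>`
TYPE = 'EXPLODE_ARRAY'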
Generate Vector Embeddings
vector.embedding.transformation.model.name: Model name to use. Supported models can be found here.
vector.embedding.transformation.selected.column: The column in the incoming dataset to generate embeddings on.
vector.embedding.transformation.new.embedding.column: The name of the new embedding column that will be created.
vector.embedding.transformation.api.key: The API key registered with the model provider. Onehouse will use this key to access the models and generate the embeddings.