CREATE TRANSFORMATION
Description
Creates a new Transformation to be used in Stream Captures.
Note that the SQL statement does not end with a semicolon (;).
Syntax
CREATE TRANSFORMATION <name>
TYPE = { 'CONVERT_CDC' | 'ROW_FILTERING' | 'DERIVED_DATE'
| 'ADD_CURRENT_TIMESTAMP' | 'USER_PROVIDED' | 'DATA_MASKING' | 'L_PARSE_JSON'
| 'FLATTEN_STRUCT' | 'EXPLODE_ARRAY' | 'VECTOR_EMBEDDING' }
WITH 'key1' = 'value1', 'key2' = 'value2', ...
Note about USER_PROVIDED transformation type
The USER_PROVIDED transformation type is used to specify a custom transformation class. The transformation name is case-insensitive.
Examples
ROW FILTERING TRANSFORMATION
CREATE TRANSFORMATION <name>
TYPE = 'ROW_FILTERING'
WITH 'row.filtering.transformation.field' = 'event',
'row.filtering.transformation.sql.operator' = 'EQ',
'row.filtering.transformation.value' = 'click',
'row.filtering.transformation.value.type' = 'string'
CONVERT CDC TRANSFORMATION (MONGODB)
CREATE TRANSFORMATION <name>
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'mongodb', 'convert_cdc.target_schema' = '<schema_name>'
CONVERT CDC TRANSFORMATION (POSTGRESQL)
CREATE TRANSFORMATION <name>
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'postgresql'
DATA MASKING TRANSFORMATION
CREATE TRANSFORMATION <name>
TYPE = 'DATA_MASKING'
WITH 'data_masking.type' = {'HASHING'|'TRUNCATION'}, 'data_masking.fields' = 'key1, key2, key3'
PARSE JSON TRANSFORMATION
CREATE TRANSFORMATION <name>
TYPE = 'L_PARSE_JSON'
WITH 'l_parse_json.fields' = 'parent_key1, parent_key2, parent_key3',
'l_parse_json.field_schema' = '<schema_name>',
'l_parse_json.nested_fields' = 'parent_key1.nested_key1, parent_key2.nested_key2, parent_key2.nested_key3'
GENERATE VECTOR EMBEDDINGS
CREATE TRANSFORMATION <name>
TYPE = 'VECTOR_EMBEDDING'
WITH 'vector.embedding.transformation.model.name' = 'test-model',
'vector.embedding.transformation.selected.column' = 'column1',
'vector.embedding.transformation.new.embedding.column' = 'column2',
'vector.embedding.transformation.api.key' = 'api-key'
Required parameters
name
: Identifier for the Transformation
- Must be unique for your account
- Must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "Flattening Transformation")
- Case-sensitive
TYPE
: Specifies the type of Transformation to create
Special parameters
Include special parameters after WITH as type String.
Row Filtering
row.filtering.transformation.field
: Field name to filter on.

row.filtering.transformation.sql.operator
: Filter function. Valid values are 'EQ', 'NEQ', 'GT', 'GEQ', 'LT', 'LEQ'.
- EQ corresponds to the Spark SQL operator =
- NEQ corresponds to the Spark SQL operator !=
- GT corresponds to the Spark SQL operator >
- GEQ corresponds to the Spark SQL operator >=
- LT corresponds to the Spark SQL operator <
- LEQ corresponds to the Spark SQL operator <=

row.filtering.transformation.value
: Value to filter by.

row.filtering.transformation.value.type
: Type of the value to filter by. Valid types are 'string', 'number'.
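For illustration, a numeric filter that keeps only rows whose amount field is greater than 100; the field name and values here are placeholders:

CREATE TRANSFORMATION <name>
TYPE = 'ROW_FILTERING'
WITH 'row.filtering.transformation.field' = 'amount',
'row.filtering.transformation.sql.operator' = 'GT',
'row.filtering.transformation.value' = '100',
'row.filtering.transformation.value.type' = 'number'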
Derived Date Field
derived.date.transformation.source.field
: Source field to derive the date from.

derived.date.transformation.source.input.format
: Source format of the date. Valid values are below:
"yyyy-MM-dd",
"yyyy-MM-dd hh:mm:ss",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd hh:mm:ss.SSSSSS",
"yyyy-MM-dd HH:mm:ss.SSSSSS",
"yyyy-MM-dd hh:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd'T'hh:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'hh:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ssZ",
"EPOCH_SECONDS",
"epoch_seconds",
"EPOCH_MILLIS",
"epoch_millis",
"EPOCH_MICROS",
"epoch_micros"
- Use a single-quote to escape single-quotes. An example source input format key-value in a SQL statement is 'derived.date.transformation.source.input.format' = 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSX'.
derived.date.transformation.new.field
: New field name in which the derived date is stored.

derived.date.transformation.output.format
: Output format for the derived date. Valid values are below:
"yyyy-MM-dd",
"yyyy-MM-dd hh:mm:ss",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd hh:mm:ss.SSSSSS",
"yyyy-MM-dd HH:mm:ss.SSSSSS",
"yyyy-MM-dd hh:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd'T'hh:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'hh:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ssZ",
"EPOCH_SECONDS",
"epoch_seconds",
"EPOCH_MILLIS",
"epoch_millis",
"EPOCH_MICROS",
"epoch_micros"
Add Current Timestamp
current.timestamp.transformation.new.field
: New field name in which the current timestamp is stored.

current.timestamp.transformation.output.format
: Output format in which to store the date. Valid values are below:
"yyyy-MM-dd",
"yyyy-MM-dd hh:mm:ss",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd hh:mm:ss.SSSSSS",
"yyyy-MM-dd HH:mm:ss.SSSSSS",
"yyyy-MM-dd hh:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd'T'hh:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ",
"yyyy-MM-dd'T'hh:mm:ssZ",
"yyyy-MM-dd'T'HH:mm:ssZ",
"EPOCH_SECONDS",
"epoch_seconds",
"EPOCH_MILLIS",
"epoch_millis",
"EPOCH_MICROS",
"epoch_micros"
Custom Transformation
Important: To create Stream Captures that use custom transformations via the API, you must first add the custom transformation JAR in the Onehouse console (instructions), then create the transformation via the API.

user.provided.transformation.class.name
: Class name to use in the Transformation.

user.provided.transformation.properties
: Properties to pass to the custom transformation. The format is 'key1=value1, key2=value2, ...'.

Usage notes
- This class must come from the JAR file uploaded via the Onehouse console.
- Reach out to the Onehouse team for support if you need to pass custom properties.
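For illustration, a USER_PROVIDED statement; the class name com.example.MyTransformation is a placeholder for a class in the JAR you uploaded via the console:

CREATE TRANSFORMATION <name>
TYPE = 'USER_PROVIDED'
WITH 'user.provided.transformation.class.name' = 'com.example.MyTransformation',
'user.provided.transformation.properties' = 'key1=value1, key2=value2'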
Convert Data from CDC Format
convert_cdc.cdc_format
: Format of the CDC data. Specify 'postgresql', 'mongodb', or 'mysql'.

convert_cdc.target_schema
: Required only for mongodb.
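The Examples section above shows mongodb and postgresql; following the same pattern, a mysql statement would look like:

CREATE TRANSFORMATION <name>
TYPE = 'CONVERT_CDC'
WITH 'convert_cdc.cdc_format' = 'mysql'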
Data Masking
data_masking.type
: Type of data masking to apply. Valid values are 'HASHING' and 'TRUNCATION'.

data_masking.fields
: Comma-separated list of fields to mask.
Parse JSON
l_parse_json.fields
: Comma-separated list of fields to parse.

l_parse_json.field_schema
: Schema name to use for the parsed fields.

l_parse_json.nested_fields
: Comma-separated list of nested fields to parse.
Flatten Struct
- No special parameters required; operates in recursive mode by default.
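For illustration (this sketch assumes the WITH clause may be omitted when a type takes no special parameters):

CREATE TRANSFORMATION <name>
TYPE = 'FLATTEN_STRUCT'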
Explode Array
- No special parameters required; operates in recursive mode by default.
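Likewise for illustration (again assuming the WITH clause may be omitted when there are no special parameters):

CREATE TRANSFORMATION <name>
TYPE = 'EXPLODE_ARRAY'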
Generate Vector Embeddings
vector.embedding.transformation.model.name
: Model name to use. Supported models can be found here.

vector.embedding.transformation.selected.column
: The column in the incoming dataset to generate embeddings on.

vector.embedding.transformation.new.embedding.column
: The name of the new embedding column that will be created.

vector.embedding.transformation.api.key
: The API key registered with the model provider. Onehouse will use this key to access the models and generate the embeddings.