Mask Data

Description

Masks one or more fields for privacy by hashing or truncating their values. The following masking types are supported:

  • Hash: Hashes all values using xxHash64.
  • Truncate: Sets all values to NULL.

Parameters

  • Fields to Mask: Specify a comma-separated list of fields to mask.
  • Masking Type: Specify the type of masking to apply.
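The two parameters above can be sketched in Python roughly as follows. This is a minimal illustration, not the transform's actual implementation: the helper names (`parse_fields`, `hash64_hex`, `mask_record`) are hypothetical, and the hash is a stand-in (BLAKE2b with an 8-byte digest from the standard library) rather than xxHash64. Skipping missing fields, leaving null values unhashed, and rendering the hash as a hex string are also assumptions.

```python
import hashlib


def parse_fields(fields_to_mask):
    """Split a comma-separated 'Fields to Mask' value into clean field names."""
    return [name.strip() for name in fields_to_mask.split(",") if name.strip()]


def hash64_hex(value):
    """Return a 64-bit hash of the value as 16 hex digits.

    Stand-in only: BLAKE2b with an 8-byte digest, NOT xxHash64.
    """
    data = str(value).encode("utf-8")
    return hashlib.blake2b(data, digest_size=8).hexdigest()


def mask_record(record, fields_to_mask, masking_type):
    """Return a copy of the record with the listed fields masked."""
    masked = dict(record)  # leave the caller's record untouched
    for name in parse_fields(fields_to_mask):
        if name not in masked:
            continue  # assumption: fields missing from the record are skipped
        if masking_type == "Truncate":
            masked[name] = None
        elif masking_type == "Hash":
            if masked[name] is not None:  # assumption: nulls stay null
                masked[name] = hash64_hex(masked[name])
        else:
            raise ValueError("unknown masking type: " + repr(masking_type))
    return masked


record = {"name": "Bill", "ssn": "123-45-6789"}
truncated = mask_record(record, "ssn", "Truncate")
hashed = mask_record(record, " ssn , name", "Hash")
```

Here `truncated` keeps `name` and sets `ssn` to `None`, while `hashed` replaces both listed fields with 16-digit hex strings. Whitespace around the commas is ignored in this sketch; whether the real parameter tolerates it is not stated in the documentation.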

Input Requirements

Input schema may be any format.

Expected Output

Masked fields keep the same types as in the input and become nullable.

Usage Notes

  • Hashing produces a 64-bit hash value. If the field's type cannot hold that value, the pipeline throws an error. Only hash fields whose types can hold a 64-bit value, such as String, Long, or Double.
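To see why field width matters, the following Python sketch checks whether a 64-bit value fits a narrower type. It assumes Long means a signed 64-bit integer and uses a hypothetical signed 32-bit integer type as the failing case. The helper names and the example value are illustrative only, and the documentation does not say how the pipeline itself converts the hash.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_signed64(unsigned_hash):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit (Long-style) value."""
    if not 0 <= unsigned_hash < 2**64:
        raise ValueError("expected an unsigned 64-bit value")
    # Values with the top bit set wrap around to negative numbers.
    return unsigned_hash - 2**64 if unsigned_hash >= 2**63 else unsigned_hash


def fits_int32(value):
    """Return True if the value fits a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


# Arbitrary 64-bit value for illustration, not a real xxHash64 output.
example_hash = 0xF52B16BF00000001
as_long = to_signed64(example_hash)
print(fits_int32(as_long))  # False: a 32-bit field cannot hold this value
```

Every unsigned 64-bit value maps to some signed 64-bit value, so a Long-style field can always hold it, whereas most 64-bit values fall outside the 32-bit range.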

Examples

Truncate

// Input
// Fields to Mask = "ssn"
// Masking Type = Truncate
{
  "name": "Bill",
  "ssn": "123-45-6789"
}

// Output
{
  "name": "Bill",
  "ssn": null
}

Hash

// Input
// Fields to Mask = "ssn"
// Masking Type = Hash
{
  "name": "Bill",
  "ssn": "123-45-6789"
}

// Output (hash value shown is illustrative)
{
  "name": "Bill",
  "ssn": "f52b16bf"
}