Skip to main content

Add Derived Date

Description

Adds a new field that converts an existing Date field to a String with a custom format.

Parameters

  • New Field Name: Specify the name of the new field to generate. This field name must not already exist in the schema.
  • New Field Format: Specify the date format to generate.
  • Source Field Name: Specify the name of an existing field from which to derive a new date. This field name must exist in the schema.
  • Source Field Format: Specify the date format of the source field. The expected data type for your source field will depend on the format you select here.

Input Requirements

Source field must be a certain type, depending on the Source Field Format specified:

  • Long type for epoch formats
  • String type for other formats (eg. "yyyy-MM-dd")

Here's a detailed explanation of the supported formats:

Source Input FormatDescriptionExample
Epoch SecondsThis is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds (also known as Unix time or POSIX time).1704408001 which represents 2024-01-04 22:40:01 UTC.
Epoch MillisecondsSimilar to epoch seconds, but in milliseconds.1704408001000 which represents 2024-01-04 22:40:01 UTC.
Epoch MicrosecondsThis is the Unix time in microseconds.1704408001000000 which represents 2024-01-04 22:40:01 UTC.
yyyy-MM-ddThis represents year, month, and day.2024-01-04
yyyy-MM-dd HH:mm:ssThis represents a date and time, using a 24-hour clock.2024-01-04 14:40:01
yyyy-MM-dd hh:mm:ssThis represents a date and time, using a 12-hour clock.2024-01-04 02:40:01
yyyy-MM-dd hh:mm:ss.SSSSSThis represents date and time with fractional seconds (up to five decimal places).2024-01-04 02:40:01.00000
yyyy-MM-dd HH:mm:ss.SSSSSThis represents date and time with fractional seconds (up to five decimal places), using a 24-hour clock.2024-01-04 14:40:01.00000
yyyy-MM-dd hh:mm:ss.SSSThis represents date and time with milliseconds, using a 12-hour clock.2024-01-04 02:40:01.000
yyyy-MM-dd HH:mm:ss.SSSThis represents date and time with milliseconds, using a 24-hour clock.2024-01-04 14:40:01.000
yyyy-MM-dd'T'HH:mm:ss.SSSXXXThis format includes timezone information with a timezone offset, using a 24-hour clock with 3 digit (milli) fraction-of-second precision including the offset hour and minute with a colon.2024-01-04T14:40:01.000-08:00 (Pacific Time)
yyyy-MM-dd'T'hh:mm:ss.SSSZThis format includes timezone information with a timezone offset, using a 12-hour clock with 3 digit (milli) fraction-of-second precision including the offset hour and minute without a colon.2024-01-04T02:40:01.000-0800 (Pacific Time)
yyyy-MM-dd'T'HH:mm:ss.SSSZThis format includes timezone information with a timezone offset, using a 24-hour clock with 3 digit (milli) fraction-of-second precision including the offset hour and minute without a colon.2024-01-04T14:40:01.000-0800 (Pacific Time)
yyyy-MM-dd'T'hh:mm:ss.SSSSSSZThis format includes timezone information with a timezone offset, using a 12-hour clock with 6 digit (micro) fraction-of-second precision including the offset hour and minute without a colon.2024-01-04T02:40:01.000000-0800 (Pacific Time)
yyyy-MM-dd'T'HH:mm:ss.SSSSSSXThis format includes timezone information with a timezone offset, using a 24-hour clock with 6 digit (micro) fraction-of-second precision including the offset hour and minute with a colon.2024-01-04T14:40:01.000000Z (UTC)
yyyy-MM-dd'T'HH:mm:ss.SSSSSSZThis format includes timezone information with a timezone offset, using a 24-hour clock with 6 digit (micro) fraction-of-second precision including the offset hour and minute without a colon.2024-01-04T14:40:01.000000-0800 (Pacific Time)
yyyy-MM-dd'T'hh:mm:ssZThis format includes timezone information with a timezone offset, using a 12-hour clock at a 2 digit (centi) fraction-of-second precision including the offset hour and minute without a colon.2024-01-04T02:40:01.00-0800 (Pacific Time)
yyyy-MM-dd'T'HH:mm:ssZThis format includes timezone information with a timezone offset, using a 24-hour clock at a 2 digit (centi) fraction-of-second precision including the offset hour and minute without a colon.2024-01-04T14:40:01.00-0800 (Pacific Time)

Referenced from Apache Spark SQL Date and Time Functions

Expected Output

  • Adds a field of type String.
  • The Source Field remains in the record.
  • For Epoch formats, the output will be in the UTC timezone.

Example

// Input
// Source Field = "purchase_date"
// Source Format = "yyyy-MM-dd"
// New Field Name = "purchase_month"
// Output Format = "yyyy-MM"
{
"id": 1,
"event_name": "checkout",
"purchase_date": 2023-04-21
}

// Output
{
"id": 1,
"event_name": "checkout",
"purchase_date": 2023-04-21,
"purchase_month": "2023-04"
}