Add Derived Date
Description
Adds a new field that converts an existing Date field to a String with a custom format.
Parameters
- New Field Name: Specify the name of the new field to generate. This field name must not already exist in the schema.
- New Field Format: Specify the date format to generate.
- Source Field Name: Specify the name of an existing field from which to derive a new date. This field name must exist in the schema.
- Source Field Format: Specify the date format of the source field. The expected data type for your source field will depend on the format you select here.
Input Requirements
Source field must be a certain type, depending on the Source Field Format specified:
- Long type for epoch formats
- String type for other formats (eg. "yyyy-MM-dd")
Here's a detailed explanation of the supported formats:
Source Input Format | Description | Example |
---|---|---|
Epoch Seconds | This is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds (also known as Unix time or POSIX time). | 1704408001 which represents 2024-01-04 22:40:01 UTC. |
Epoch Milliseconds | Similar to epoch seconds, but in milliseconds. | 1704408001000 which represents 2024-01-04 22:40:01 UTC. |
Epoch Microseconds | This is the Unix time in microseconds. | 1704408001000000 which represents 2024-01-04 22:40:01 UTC. |
yyyy-MM-dd | This represents year, month, and day. | 2024-01-04 |
yyyy-MM-dd HH:mm:ss | This represents a date and time, using a 24-hour clock. | 2024-01-04 14:40:01 |
yyyy-MM-dd hh:mm:ss | This represents a date and time, using a 12-hour clock. | 2024-01-04 02:40:01 |
yyyy-MM-dd hh:mm:ss.SSSSS | This represents date and time with fractional seconds (up to five decimal places). | 2024-01-04 02:40:01.00000 |
yyyy-MM-dd HH:mm:ss.SSSSS | This represents date and time with fractional seconds (up to five decimal places), using a 24-hour clock. | 2024-01-04 14:40:01.00000 |
yyyy-MM-dd hh:mm:ss.SSS | This represents date and time with milliseconds, using a 12-hour clock. | 2024-01-04 02:40:01.000 |
yyyy-MM-dd HH:mm:ss.SSS | This represents date and time with milliseconds, using a 24-hour clock. | 2024-01-04 14:40:01.000 |
yyyy-MM-dd'T'HH:mm:ss.SSSXXX | This format includes timezone information with a timezone offset, using a 24-hour clock with 3 digit (milli) fraction-of-second precision including the offset hour and minute with a colon. | 2024-01-04T14:40:01.000-08:00 (Pacific Time) |
yyyy-MM-dd'T'hh:mm:ss.SSSZ | This format includes timezone information with a timezone offset, using a 12-hour clock with 3 digit (milli) fraction-of-second precision including the offset hour and minute without a colon. | 2024-01-04T02:40:01.000-0800 (Pacific Time) |
yyyy-MM-dd'T'HH:mm:ss.SSSZ | This format includes timezone information with a timezone offset, using a 24-hour clock with 3 digit (milli) fraction-of-second precision including the offset hour and minute without a colon. | 2024-01-04T14:40:01.000-0800 (Pacific Time) |
yyyy-MM-dd'T'hh:mm:ss.SSSSSSZ | This format includes timezone information with a timezone offset, using a 12-hour clock with 6 digit (micro) fraction-of-second precision including the offset hour and minute without a colon. | 2024-01-04T02:40:01.000000-0800 (Pacific Time) |
yyyy-MM-dd'T'HH:mm:ss.SSSSSSX | This format includes timezone information with a timezone offset, using a 24-hour clock with 6 digit (micro) fraction-of-second precision including the offset hour and minute with a colon. | 2024-01-04T14:40:01.000000Z (UTC) |
yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ | This format includes timezone information with a timezone offset, using a 24-hour clock with 6 digit (micro) fraction-of-second precision including the offset hour and minute without a colon. | 2024-01-04T14:40:01.000000-0800 (Pacific Time) |
yyyy-MM-dd'T'hh:mm:ssZ | This format includes timezone information with a timezone offset, using a 12-hour clock at a 2 digit (centi) fraction-of-second precision including the offset hour and minute without a colon. | 2024-01-04T02:40:01.00-0800 (Pacific Time) |
yyyy-MM-dd'T'HH:mm:ssZ | This format includes timezone information with a timezone offset, using a 24-hour clock at a 2 digit (centi) fraction-of-second precision including the offset hour and minute without a colon. | 2024-01-04T14:40:01.00-0800 (Pacific Time) |
Referenced from Apache Spark SQL Date and Time Functions
Expected Output
- Adds a field of type String.
- The Source Field remains in the record.
- For Epoch formats, the output will be in the UTC timezone.
Example
// Input
// Source Field = "purchase_date"
// Source Format = "yyyy-MM-dd"
// New Field Name = "purchase_month"
// Output Format = "yyyy-MM"
{
"id": 1,
"event_name": "checkout",
"purchase_date": 2023-04-21
}
// Output
{
"id": 1,
"event_name": "checkout",
"purchase_date": 2023-04-21,
"purchase_month": "2023-04"
}