Functions (Data Flow) Reference

The following functions are used with operators in a data flow, to enable you to build expressions.

Aggregate Functions
FunctionDescriptionExample
COUNT(value[, value]*)Returns the number of rows for which one or more supplied expressions are all non-null.COUNT(expr1)
COUNT(*)Returns the total number of retrieved rows, including rows containing null.COUNT(*)
MAX(value)Returns the maximum value of the argument.MAX(expr)
MIN(value)Returns the minimum value of the argument.MIN(expr)
SUM(numeric)Returns the sum calculated from values of a group.SUM(expr1)
AVG(numeric)Returns the average of numeric values in an expression.AVG(AGGREGATE_1.src1.attribute1)
LISTAGG(column[, delimiter]) WITHIN GROUP (order_by_clause)

Concatenates values of the input column with the specified delimiter, for each group based on the order clause.

column contains the values you want to concatenate together in the result.

The delimiter separates the column values in the result. If a delimiter is not provided, then an empty character is used.

order_by_clause determines the order that the concatenated values are returned.

This function can only be used as an aggregator, and can be used with grouping or without grouping. If you use without grouping, the result is a single row. If you use with a grouping, the function returns a row for each group.

Consider a table with two columns, id, name. The table has three rows. The id column values are 101, 102, 102. The name column values are A, B, C.

+-----+--------+
| id  | name  |
+-----+--------+
| 101 | A     |
+-----+--------+
| 102 | B     |
+-----+--------+
| 102 | C     |
+-----+--------+

Example 1: Without grouping

LISTAGG(id, '-') WITHIN GROUP (ORDER BY id) returns the name column with the value A-B-C

+--------+
| name   |
+--------+
| A-B-C  |
+--------+

Example 2: Group by the id

LISTAGG(id, '-') WITHIN GROUP (ORDER BY id) returns the name column with the values A and B-C in two groups.

+--------+
| name   |
+--------+
| A      |
+--------+
| B-C    |
+--------+
Analytic Functions
Function Description Example
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) Returns the value evaluated at the row that's the first row of the window frame. FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) Returns the value evaluated at the row that is the last row of the window frame. LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause) Returns the rank of the current row with gaps, counting from 1. RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) Returns the unique number of the current row within its partition, counting from 1. ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
Arithmetic Functions
FunctionDescriptionExample
ABS(numeric)Returns the absolute power of the numeric value.ABS(-1)
CEIL(numeric)Returns the smallest integer not greater than the numeric valueCEIL(-1,2)
FLOOR(numeric)Returns the largest integer not greater than the numeric value.FLOOR(-1,2)
MOD(numeric1, numeric2)Returns the remainder after numeric1 is divided by numeric2.MOD(8,2)
POWER(numeric1, numeric2)Raises numeric1 to the power of numeric2.POWER(2,3)
ROUND(numeric1, numeric2)Returns numeric1 rounded to numeric2 decimal places.ROUND(2.5,0)
TRUNC(numeric1, numeric2)Returns numeric1 truncated to numeric2 decimal places.TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7

Array Functions

Only the Expression operator supports array functions.

FunctionDescriptionExample
ARRAY_POSITION(array(...), element)Returns the position of the first occurrence of the given element in the given array.

The position is not zero based, instead it starts with 1.

ARRAY_POSITION(array(3, 2, 1, 4, 1), 1) returns 3
REVERSE(array(...)) Returns the given array of elements in a reverse order. REVERSE(array(2, 1, 4, 3)) returns [3,4,1,2]
ELEMENT_AT(array(...), index) Returns the element of the given array at the given index position.

The index is not zero based, instead it starts with 1.

If index = -1, then it returns the last element.

ELEMENT_AT(array(1, 2, 3), 2) returns 2
Conditional Functions
FunctionDescriptionExample
COALESCE(value, value [, value]*)Returns the first non-null argument, if it exists, otherwise returns null.COALESCE(NULL, 1, NULL) returns 1
NULLIF(value, value)Returns null if the two values equal each other, otherwise returns the first value.NULLIF('ABC','XYZ') returns ABC
Date and Time Functions
Function Description Example
CURRENT_DATE Returns the current date. CURRENT_DATEreturns today's date such as 2023-05-26
CURRENT_TIMESTAMP Returns the current date and time for the session time zone. CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
DATE_ADD(date, number_of_days) Returns the date that's the specified number of days after the specified date. DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale])

Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

DATE_FORMAT(Date '2020-10-11', 'yyyy-MM-dd') returns '2020-10-11'. The first argument is a Date object representing October 11th, 2020.

DATE_FORMAT(Date '2018-junio-17', 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

DAYOFMONTH(date) Returns the date's day in the month. DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date) Returns the date's day in the week. DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date) Returns the date's day in the year. DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) Returns the date's week in the year.

WEEKOFYEAR('2022-07-28') returns 30

WEEKOFYEAR('2022-07-28 13:24:30') returns 30

HOUR(datetime) Returns the datetime's hour value. HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date) Returns the date's last day of the month. LAST_DAY('2020-12-25') returns 31
MINUTE(datetime) Returns the datetime's minute value. HOUR('2020-12-25 15:10:30') returns 10
MONTH(date) Returns the date's month value. MONTH('2020-06-25') returns 6
QUARTER(date) Returns the quarter of year the date is in. QUARTER('2020-12-25') returns 4
SECOND(datetime) Returns the datetime's second value. SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr]) Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17

TO_TIMESTAMP(expr, format_string[, localeStr]) Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11am 10:10 Oct 11th, 2020
WEEK(date)

Returns the date's week value.

WEEK('2020-06-25') returns 4
YEAR(date) Returns the date's year value. YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul', or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. Must must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
Hash Functions
FunctionDescriptionExample
MD5(all data types)Calculates an MD5 checksum of the data type, and returns a string value. MD5(column_name)
SHA1(all data types)Calculates a SHA-1 hash value of the data type, and returns a string value. SHA1(column_name)
SHA2(all data types, bitLength)Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer. SHA2 (column_name, bitLength can be set to 0 (equivalent to 256), 256, 384, or 512).
ORA_HASH(expr, [max_bucket], [seed_value])

Computes a hash value for expr, and returns a NUMBER value.

expr can be an expression, a column, a literal.

max_bucket is the maximum bucket value returned, between 0 and 4294967295 (default).

seed_value is a value between 0 (default) and 4294967295.

Oracle applies the hash function to the combination of expr and seed_value to produce many different results for the same set of data.

ORA_HASH('1')

ORA_HASH('b', 2)

ORA_HASH(100, 10, 10)

ORA_HASH(EXPRESSION_1.CUSTOMERS.SSN, 2)

Hierarchical Functions
FunctionDescriptionExample
SCHEMA_OF_JSON(string)Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column)Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*)Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*)Creates a new column as Array type. The input columns must all have the same data type.

TO_Array(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]

Higher-Order Functions

Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.

The supported operators are:

  • Aggregate

  • Expression

  • Filter

  • Join

  • Lookup

  • Split

  • Pivot

FunctionDescriptionExample
TRANSFORM(column, lambda_function)Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys. For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || __ '' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)

Only the Expression operator supports ARRAY_SORT.

Takes an array and sorts according to the given function that takes 2 arguments.

The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.

If the function is omitted, the array is sorted in ascending order.

array_sort(to_array(5, 6, 1),
                   (left, right) -> CASE WHEN left < right THEN -1
                                         WHEN left > right THEN 1 ELSE 0 END)

The returned array is:

[1,5,6]
Operator (Comparison) Functions
FunctionDescriptionExample
CASE WHEN condition1 THEN result1 ELSE result2 ENDReturns the value for which a condition is met.CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1> 0, otherwise returns XYZ
ANDThe logical AND operator. Returns true if both operands are true, otherwise returns false.(x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"
ORThe logical OR operator. Returns true if either operand is true or both are true, otherwise returns false.(x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true"
NOTThe logical NOT operator.
LIKEPerforms string pattern matching, whether string1 matches the pattern in string2.
=Tests for equality. Returns true if expr1 equals expr2, otherwise returns false.x = 10 returns "true" when value of x is 10, else it returns "false"
!=Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false.x != 10 returns "false" if value of x is 10, else it returns "true"
>Tests for an expression greater than. Returns true if expr1 is greater than expr2.x > 10 returns "true" if value of x is greater than 10, else it returns "false"
>=Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2.x > =10 returns "true" if value of x is greater than or equal to 10, else it returns "false"
<Tests for an expression less than. Returns true if expr1 is less than expr2.x < 10 returns "true" if value of x is less than 10, else it returns "false"
<=Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2.x <= 10 returns "true" if value of x is less than 10, else it returns "false"
||Concatenates two strings.'XYZ' || 'hello' returns 'XYZhello'
BETWEENEvaluates a range.FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007
INTests whether an expression matches a list of values.FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
String Functions
FunctionsDescriptionExample
CAST(value AS type)Returns the specified value in the specified type.CAST("10" AS INT) returns 10
CONCAT(string, string)Returns the combined values of strings or columns.CONCAT('Oracle','SQL') returns OracleSQL
CONCAT_WS(separator, expression1, expression2, expression3,...) Returns the combined values of strings or columns using the specified separator in between the strings or columns.

A separator is required and it must be a string.

At least one expression must be provided after the separator. For example: CONCAT_WS(',' col1)

CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle

CONCAT_WS(' ', address, city, postal_code) returns 123 MyCity 987654

If a child of the function is an array, then the array is flattened:

CONCAT_WS(',', 1,2,3, to_array(4,5,6), to_array(7,8), 9) returns 1,2,3,4,5,6,7,8,9

INITCAP(string)Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[start_position])Returns the (1-based) index of the first occurrence of substring in string.INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)Returns the string with all letters changed to lowercase.LOWER('ORACLE') returns oracle
LENGTH(string)Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.LENGTH('Oracle') returns 6
LTRIM(string)Returns the string with leading spaces removed from the left.LTRIM(' Oracle')
NVL(expr1, epxr2)Returns the argument that is not null.NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx])Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement)Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)Returns the string with leading spaces removed from the right.RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length])Returns the substring starting at position. SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale])Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

Number example: TO_CHAR(123) returns 123

Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string)Returns a string with all letters changed to uppercase.UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.RPAD('XYZ', 6, '+' ) returns 'XYZ+++'
Unique ID Functions
FunctionDescriptionExample
NUMERIC_ID()Generates a universally unique identifier that is a 64-bit number for each row.NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761
ROWID()Generates monotonically increasing 64-bit numbers.ROWID() returns for example, 0, 1, 2, and so on
UUID()Generates a universally unique identifier that is a 128-bit String for each row.UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d
MONOTONICALLY_INCREASING_ID() Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. MONOTONICALLY_INCREASING_ID() returns for example, 8589934592 and 25769803776