Using Data Flow Operators
In Data Integration, data flow operators represent input sources, output targets, and transformations that can be used in a data flow.
From the Operators panel, drag operators onto the canvas to design a data flow. Then use the Details tab on the Properties panel to configure basic and required properties for each operator. Where applicable, use the Advanced options tab to specify other properties.
In general, a data flow operator can have one or more inbound ports, and one or more outbound ports for data to flow through. For example, you can connect the same source outbound port to inbound ports on a filter, a join, and a target operator. You can also connect another source operator to the same join inbound port.
A data flow must include at least one source operator and one target operator to be valid. While Data Integration supports multiple target operators in a data flow, a target operator can have only one inbound port.
To connect operators, hover over an operator until you see the connector (small circle) on the right side of the operator. Then drag the connector to the next operator you want to connect to. A connection is valid when a line connects the operators after you drop the connector.
A connection line symbolizes how data flows from one node to the other. You can drag a visible connector from one object to another, but a filter, expression, aggregate, distinct, sort, or target operator can't have more than one inbound connection line.
For complex data types, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab and Data tab in the Properties panel.
Data Operators
Data Integration provides the source operator and target operator for adding input and output data entities to serve as the input to data flows, and the output for transformed data.
To configure the input or output data entity for a data operator in a data flow, you begin by selecting a data asset, connection, and schema (or bucket).
You make the selections in the order displayed on the Details tab of the Properties panel, by clicking Select when the option next to a resource type becomes enabled. For example, when you first add a source operator, only Data asset is enabled for selection. The Select option for the subsequent resource type, Connection, is enabled only after you make a selection for the preceding object.
When you click Select next to a resource, a panel appears for you to select the object you want. You can use the resource menu to select the object, or click View all to use another panel to browse or search, and then select the object.
Upon selecting a resource object, each subsequent selection is based on the parent-child relationship inherited from the preceding selection. A breadcrumb at the top of the selection panel shows the selection hierarchy. For example, assume that you have selected the data asset "Oracle Database data asset 1" and the connection "Default connection." When you select the schema, the breadcrumb displays as "From Oracle Database data asset 1 using connection Default connection."
After selecting a data asset, connection, and schema (or bucket), you select a data entity from the list of available data entities.
In general, you can do the following in the Browse data entities panel:

- Browse the available data entities and select an entity by its name.
- Filter the available list to search and then select. In the search field, enter a partial or full entity name and press Enter to start the search. Search is case-sensitive. For example, if the available data entities include BANK_US and BANK_EU, enter BANK and then select from the filtered list.
- Use one or more parameters in the search string. For example: CUSTOMERS_${COUNTRY}
To use a parameterized data entity name to select the input or output resource, see Using Parameters in Data Entity Names.
Depending on the data asset type of a resource, after selecting a data entity, more configuration might be necessary on the Details tab or Advanced options tab in the Properties panel.
To keep a resource object such as a data asset or connection in a data flow from being permanently bound to a specific resource, you assign a parameter to that object.
You can assign the parameter before or after you select the resource object.
- In the data flow, add the source or target operator. After making a selection for a resource type, click Assign parameter next to the resource name to use another panel to select and assign a parameter for the selected object. If a suitable parameter type is not available, you can add a parameter and then assign it.

  Note: Assign parameter is not available for a data entity whose name includes a parameter (such as BANK_${REGION}). You cannot bind a parameterized entity resource to another parameter.
- Alternatively, add the source or target operator, then click Assign parameter to use one panel to assign a parameter and select the resource for the object at the same time. In the Add parameter panel, you select a resource based on the parent-child relationship inherited from the preceding selection. The breadcrumb at the top of the panel shows the selection hierarchy.
See also Using Data Flow Parameters.
You can include one or more data flow parameters in the name of the data entity resource you specify for a data operator.
The syntax to use for a data flow parameter in a data entity name is ${PARAMETER_NAME}. For example: CUSTOMERS_${COUNTRY}

A parameter name is case-sensitive, and each parameter must have a default value. For example, CUSTOMERS_${COUNTRY} might return the database table CUSTOMERS_USA, and BANK_${COUNTRY}/* might return the Object Storage files in BANK_EU.
To use parameters in data entity names when you configure a data operator, you can:
- Add the parameter to the data flow before you add the data operator
- Add the parameter at the time you're configuring the operator's data entity
How to add a parameter
1. In a data flow, select Parameters from the View menu on the canvas toolbar to open the Parameters panel.
2. In the Parameters panel, click Config and then click Add.
3. In the Add parameter panel, configure a parameter of the appropriate data type, for example VARCHAR or NUMERIC, and add a default value.

During data entity configuration in a data flow, you can search available data entities by entering the parameter name in the Browse data entities panel. In the search field, start typing ${ followed by any character. If the front part of the parameter name matches existing parameters in the data flow, a list of suggested names displays. To select a parameter from the list, click the parameter name, complete the syntax by adding }, then press Enter.
How to add a parameter at the time of data entity configuration
In the Browse data entities panel, you can do the following:

- From the More actions menu, select Add data flow parameter to use the Add data flow parameter panel. Specify the data type, default value, and other properties for the parameter you want to add and use.
- In the search field, start typing ${ followed by any character. If the front part of the parameter name matches existing parameters in the data flow, a list of suggested names displays. To select a parameter from the list, click the parameter name, complete the syntax by adding }, then press Enter.
- In the search field, enter the parameter name, for example ${PARAMETER_NAME}. If the parameter name does not yet exist in the data flow and you press Enter, Data Integration displays the Add data flow parameter panel. Alternatively, after entering the parameter name, select Add data flow parameter from the More actions menu. In the Add data flow parameter panel, specify the data type, default value, and other properties for the parameter you want to add and use.
Source Operator
Use the source operator to specify the data entities that serve as the input to the data flow.
You can add multiple source operators to a data flow.
If you're using a hierarchical data entity for a source operator, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab or Data tab in the Properties panel.
When configuring OCI Object Storage as a data source, you can use a regular expression to specify a file pattern for selecting one or more data entities.
A file pattern is a rule for finding files that match a directory and file name, and for determining how the matched files are handled when found.
Syntax to use
Data Integration supports the glob pattern syntax for specifying a file pattern.
- An asterisk (*) matches any number of characters (including none).
- Two asterisks (**) work the same as * but cross directory boundaries to match complete paths.
- A question mark (?) matches exactly one character.
- Braces specify a collection of subpatterns. For example, {sun,moon,stars} matches "sun", "moon", or "stars", and {temp*,tmp*} matches all strings beginning with "temp" or "tmp".
- Square brackets convey a set of single characters or, when the hyphen character (-) is used, a range of characters. For example, [aeiou] matches any lowercase vowel, [0-9] matches any digit, [A-Z] matches any uppercase letter, and [a-z,A-Z] matches any uppercase or lowercase letter. Within the square brackets, *, ?, and \ match themselves.
- All other characters match themselves.
- To match *, ?, or the other special characters, escape them with the backslash character (\). For example, \\ matches a single backslash, and \? matches a question mark.
Examples
Pattern | Description
---|---
*.html | Matches all strings that end in .html
??? | Matches all strings with exactly three letters or digits
*[0-9]* | Matches all strings containing a numeric value
*.{htm,html,pdf} | Matches any string ending with .htm, .html, or .pdf
a?*.java | Matches any string beginning with a, followed by at least one letter or digit, and ending with .java
{foo*,*[0-9]*} | Matches any string beginning with foo or any string containing a numeric value
directory1/20200209/part-*[0-9]*json | Matches all files in the folder where the file name starts with part-, contains any number of digits, and ends with json
directory3/**.csv | Matches all files with an extension of csv that are in the folder directory3 and its subfolders
directory3/*.csv | Matches all files with an extension of csv that are only in the main folder directory3. Files in subfolders are not included.
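You can also combine data flow parameters with glob wildcards in a pattern. For example (an illustrative pattern, assuming a REGION parameter with the default value EU), BANK_${REGION}/*.csv matches the .csv files in BANK_EU. Parameter syntax ${} is supported in the Search pattern field, as described in the testing steps that follow.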
You can test the expression to ensure that the pattern you want to use retrieves the Object Storage files for one or more data entities.
1. In the Select data entity panel, click Browse by pattern.
2. In the Browse data entities by pattern panel, select Test pattern from the More actions menu.
3. In the Test pattern panel, in the Search pattern field, enter a pattern expression you want to test before using. For example, enter department1/2020/*.json to find all files with the extension .json that are in the directory department1/2020. You can use parameter syntax ${} in the Search pattern field.
4. To test the search pattern, in the Test filenames block, provide one or more file names delimited by a new line. For example, for the pattern BANK_C*/*, the file names might be:

   BANK_CITY/part-00002-0aaf87d57fbc-c000.csv
   BANK_COUNTRY/part-00000-a66df3ab02fd-c000.csv

5. Click Test pattern, then verify that the test file names are returned in the Resulting file name block.
6. Click Use pattern to add the pattern expression to the Browse data entities by pattern panel. You're returned to the Browse data entities by pattern panel, and the files that match the pattern expression are displayed in the table.
7. Click Select pattern. You're returned to the Select data entity panel, and the pattern expression is displayed next to Data entity.
When you use a pattern expression, all existing files that match the pattern are assumed to have the same structure. The matched files are treated as a single entity in the data flow. Any future new files that match the pattern are also processed.
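For example, if the pattern directory1/20200209/part-*[0-9]*json matches three files when the data flow runs today, those files are read as one data entity; if a fourth matching file lands in the folder later, subsequent runs process it as well, without any change to the data flow.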
Incremental load is the process of loading only new or updated data from a source into a target. In Data Integration, when you configure BICC Oracle Fusion Applications as a data source, you can use the Managed incremental extract strategy to perform incremental loads.
When you choose to use the incremental extract strategy, only new or updated records from the source are extracted based on a last extract date. Data Integration provides two last extract date options:
- Custom: You provide a last extract date for every task run.
- Managed: Data Integration manages the date for you by tracking the task run timestamp and storing the last successful load date as a watermark across consecutive runs.
With the Data Integration managed last extract date option, you don't need to explicitly specify a date for a task run. You can, however, override the date at runtime.
Task runs initiated by a task schedule
Data Integration tracks task runs that are initiated by a task schedule independently of other task runs. If you use the Managed last extract date option and also set up a task schedule, Data Integration keeps the last successful run timestamp of task-scheduled runs separate from the last successful run timestamp of runs that are not initiated by a task schedule. This means the last date that's managed within a task schedule and the last date that's managed within the task itself are not modified by each other's run operations.
Incremental load for different data entities in a data flow
Suppose you want to set up incremental load on different data entities in a data flow. You can achieve this by using parameters and creating a task schedule for each data entity (see the example after these steps). The general steps are:
- In the data flow, assign parameters for the source schema (BICC offering) and data entity (BICC VO).
- Create and publish an integration task for the parameterized data flow.
- Create a task schedule for the integration task. On the Configure parameters page, specify the schema and data entity values for the task schedule.
- Create another task schedule for the same integration task. On the Configure parameters page, set the schema and data entity values for this task schedule.
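For example (hypothetical parameter and object names), the first task schedule might set the schema parameter to one BICC offering and the data entity parameter to one BICC VO, while the second task schedule for the same integration task sets the data entity parameter to a different VO. Because last extract dates are tracked per task schedule, each data entity then gets its own managed watermark.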
Incremental load multiple times in one day
To perform an incremental load more than one time in a day, add a filter operator immediately after the BICC source operator in the data flow. Then create a condition expression to filter out data that has already been processed. For example, if the BICC VO's last_update_date column is LastUpdateDateField, the expression might be the following:

FILTER_1.MYSOURCE_1.LastUpdateDateField > ${SYS.LAST_LOAD_DATE}
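In this expression, FILTER_1 and MYSOURCE_1 are the names of the filter and upstream source operators in this example, and ${SYS.LAST_LOAD_DATE} references the system parameter that holds the date of the last successful load.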
Target Operator
Use the target operator to specify the data entities that serve as the output for storing transformed data.
You can add multiple target operators to a data flow. Each target can have only one inbound port.
If you're using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes, Map, and Data tabs in the Properties panel.
The Data tab displays the transformed data based on the operators applied in the data flow.
If you're using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported.
You can filter data in the target entity by a name pattern or data type. To filter data by a name pattern, enter a simple regex pattern or wildcards ? and * in the Filter by Pattern field. To filter data by a type, select the data type from the menu next to the pattern field.
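For example (illustrative values), entering CUST* in the Filter by Pattern field shows only attributes whose names begin with CUST, while selecting a numeric type from the menu shows only numeric attributes.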
Transformations cannot be applied to the target operator as the data is read-only.
The Map tab is only available for a target operator.
If you're using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported.
When you're creating a new target data entity, the Map tab is unavailable. The incoming attributes are used to create the table or file structure with a 1-to-1 mapping.
When using an existing target data entity, map the incoming attributes to the target data entity's attributes. The actions you can perform are:

**Auto-map by position**: Maps incoming attributes to target entity attributes according to their position in the lists. From the Actions menu, select Auto-map by position. The Auto-map by position rule is added.

**Auto-map by name**: Maps incoming attributes to target attributes with the same name. From the Actions menu, select Auto-map by name. The Auto-map by name rule is added.

**Map by pattern**: Maps incoming attributes to target attributes based on simple, user-defined regex rules. From the Actions menu, select Map by pattern. Enter a source pattern and a target pattern, then click Preview mapping to test the patterns.
To define a pattern, you can use asterisk (*) and question mark (?) symbols. Use an asterisk to indicate a wildcard of any number of characters in a string pattern. Use a question mark to indicate a single-character wildcard. For example, *INPUT? maps any matching attribute that starts with any number of characters, contains the string INPUT, and is followed by a single character, such as NEWINPUTS.
By default, pattern matching is case-insensitive. For example, the source pattern *Name matches the target names CUSTOMER_NAME and Customer_Name.
To indicate different capture groups, use $n. For example, suppose you want to map LAST_NAME, FIRST_NAME, and USERNAME from a source or upstream operator to TGT_LAST_NAME, TGT_FIRST_NAME, and TGT_USERNAME in the target data entity. You would enter *NAME in the Source pattern field and TGT_$1 in the Target pattern field. The asterisk (*) in *NAME is the first capture group in the source pattern, and $1 in the target pattern refers to that group: the character string that the asterisk matches before NAME is carried over into the target name.
In cases where you need case-sensitive pattern matching, add the (?c) prefix to the source pattern. For example, suppose you want to map the source attribute CustomerName, which uses camel case lettering in its name, to the target attribute Customer_Name. You would enter (?c)([A-Z][a-z]+)([A-Z][a-z]+) as the source pattern and $1_$2 as the target pattern. When the (?c) prefix is added to the beginning of a source pattern, case-sensitive pattern matching is switched on for the mapping. Data Integration detects that the "N" in CustomerName is the start of a new pattern and thus treats ([A-Z][a-z]+)([A-Z][a-z]+) as two different words (capture groups) when matching.
**Manual map**: Drag an incoming attribute from the source list to an attribute in the target list to create a mapping. Alternatively, select Manual map from the Actions menu, then use the Map Attribute dialog to create a mapping by selecting a source attribute and a target attribute.

**Remove mapping**: Removes the selected mapping. Select View rules. In the Rules panel, select one or more rules and click Remove. Alternatively, select Remove from the rule's Actions menu to clear that rule.

**Reset mappings**: Removes all mappings. From the Actions menu, select Reset mappings. All manual and auto-map rules are removed.
Shaping Operators
For complex data types, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab and Data tab in the Properties panel.
Filter Operator
Use the filter operator to select a subset of data from the inbound port to continue downstream to the outbound port based on a filter condition.
Use the Condition Builder to visually select elements to create a filter condition. You can also enter a filter condition manually in the editor.
Creating a filter condition lets you select a subset of data from an upstream operator based on the condition.
The elements that you can use in a filter condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add it to the editor and build a condition. You can validate the condition before creating it.
Incoming displays the attributes from the upstream operator that are coming into this filter operator.
For example, to filter data by a city name, you can create the condition expression as:
FILTER_1.ADDRESSES.CITY='Redwood Shores'
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Suppose you want to use a parameter for the city name in the filter condition. You can create a VARCHAR parameter with the name P_VARCHAR_CITY and set the default value to Redwood Shores. Then you can create the filter expression as:

FILTER_1.ADDRESSES.CITY=$P_VARCHAR_CITY
Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.
For example, to filter data by a city name or by population, you can use the OR function to create the filter condition expression as:

FILTER_1.COUNTRIES.CITY=$P_VARCHAR_CITY OR FILTER_1.COUNTRIES.POPULATION>100000000

Or, with a numeric expression parameter such as P_NUMERIC:

FILTER_1.COUNTRIES.CITY=$P_VARCHAR_CITY OR FILTER_1.COUNTRIES.POPULATION>$P_NUMERIC
Here's a list of functions that are available for you to add when you construct conditions:
Function | Description | Example |
---|---|---|
MD5(all data types) | Calculates an MD5 checksum of the data type, and returns a string value. | MD5(column_name) |
SHA1(all data types) | Calculates a SHA-1 hash value of the data type, and returns a string value. | SHA1(column_name) |
SHA2(all data types, bitLength) | Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. | SHA2(column_name, 256) |
ORA_HASH(expr, [max_bucket], [seed_value]) | Computes a hash value for expr, and returns a NUMBER value. max_bucket is the maximum bucket value returned; it's between 0 and 4294967295 (the default). seed_value is between 0 and 4294967295 (the default is 0). Oracle applies the hash function to the combination of expr and seed_value to produce many different results for the same set of data. | ORA_HASH(column_name) |
Function | Description | Example |
---|---|---|
ABS(numeric) | Returns the absolute value of the numeric value. | ABS(-1) |
CEIL(numeric) | Returns the smallest integer not less than the numeric value. | CEIL(-1.2) |
FLOOR(numeric) | Returns the largest integer not greater than the numeric value. | FLOOR(-1.2) |
MOD(numeric1, numeric2) | Returns the remainder after numeric1 is divided by numeric2 . | MOD(8,2) |
POWER(numeric1, numeric2) | Raises numeric1 to the power of numeric2 . | POWER(2,3) |
ROUND(numeric1, numeric2) | Returns numeric1 rounded to numeric2 decimal places. | ROUND(2.5,0) |
TRUNC(numeric1, numeric2) | Returns numeric1 truncated to numeric2 decimal places. | TRUNC(2.5,0) |
TO_NUMBER(expr[, format, locale]) | Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags. | |
Function | Description | Example
---|---|---
CURRENT_DATE | Returns the current date. | CURRENT_DATE returns today's date such as 2023-05-26
CURRENT_TIMESTAMP | Returns the current date and time for the session time zone. | CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
DATE_ADD(date, number_of_days) | Returns the date that's the specified number of days after the specified date. | DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale]) | Formats an expr of Date as a string, based on the format and optional locale provided. Default locale is en-US. Supported language tags. | |
DAYOFMONTH(date) | Returns the date's day in the month. | DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date) | Returns the date's day in the week. | DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date) | Returns the date's day in the year. | DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) | Returns the date's week in the year. | |
HOUR(datetime) | Returns the datetime's hour value. | HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date) | Returns the date's last day of the month. | LAST_DAY('2020-12-25') returns 2020-12-31
MINUTE(datetime) | Returns the datetime's minute value. | MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date) | Returns the date's month value. | MONTH('2020-06-25') returns 6
QUARTER(date) | Returns the quarter of year the date is in. | QUARTER('2020-12-25') returns 4
SECOND(datetime) | Returns the datetime's second value. | SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr]) | Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags. In pipeline expressions, the format_string must use the strftime format codes. | |
TO_TIMESTAMP(expr, format_string[, localeStr]) | Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided. In pipeline expressions, the format_string must use the strftime format codes. | TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 2020-10-11 11:10:10
WEEK(date) | Returns the date's week value. | WEEK('2020-06-25') returns 4
YEAR(date) | Returns the date's year value. | YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) | Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS. | |
MONTHS_BETWEEN(start_date_expr, end_date_expr) | Returns the number of months between start_date_expr and end_date_expr. A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month. | |
FROM_UTC_TIMESTAMP(time_stamp, time_zone) | Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone. For string, use a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS. Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). | FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone) | Converts a date, timestamp or string in the specified time zone to a UTC timestamp. For string, use a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS. Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). | TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt]) | Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format. If fmt is omitted, a default format is used. Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC. | Default time zone is PST in the examples
UNIX_TIMESTAMP([time_expr[, fmt]]) | Converts the current or specified time to a Unix timestamp in seconds. If time_expr is omitted, the current time is converted. If fmt is omitted, a default format is used. | Default time zone is PST in this example
INTERVAL 'year' YEAR[(year_precision)] | Returns a period of time in years. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years). | |
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH | Returns a period of time in years and months. Use to store a period of time using year and month fields. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years). | INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)] | Returns a period of time in months. month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months). | INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)] | Returns a period of time in terms of days, hours, minutes, and seconds. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. | INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)] | Returns a period of time in terms of days, hours, and minutes. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. | INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)] | Returns a period of time in terms of days and hours. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. | INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days and 10 hours
INTERVAL 'day' DAY[(day_precision)] | Returns a period of time in terms of days. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. | INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)] | Returns a period of time in terms of hours, minutes, and seconds. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. | INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)] | Returns a period of time in terms of hours and minutes. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. | INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)] | Returns a period of time in terms of hours. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. | INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)] | Returns a period of time in terms of minutes. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. | INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)] | Returns a period of time in terms of minutes and seconds. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. | INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)] | Returns a period of time in terms of seconds. fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3. | INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
Function | Description | Example
---|---|---
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) | Returns the value evaluated at the row that's the first row of the window frame. | FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) | Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL. | LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) | Returns the value evaluated at the row that is the last row of the window frame. | LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) | Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL. | LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause) | Returns the rank of the current row with gaps, counting from 1. | RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) | Returns the unique number of the current row within its partition, counting from 1. | ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
Functions | Description | Example |
---|---|---|
CAST(value AS type) | Returns the specified value in the specified type. | CAST("10" AS INT) returns 10 |
CONCAT(string, string) | Returns the combined values of strings or columns. | CONCAT('Oracle','SQL') returns OracleSQL |
CONCAT_WS(separator, expression1, expression2, expression3,...) | Returns the combined values of strings or columns using the specified separator in between the strings or columns. A separator is required and it must be a string. At least one expression must be provided after the separator. If a child of the function is an array, the array is flattened. | CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle |
INITCAP(string) | Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space. | INITCAP('oRACLE sql') returns Oracle Sql |
INSTR(string, substring[, start_position]) | Returns the (1-based) index of the first occurrence of substring in string. | INSTR('OracleSQL', 'SQL') returns 7 |
LOWER(string) | Returns the string with all letters changed to lowercase. | LOWER('ORACLE') returns oracle |
LENGTH(string) | Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces. | LENGTH('Oracle') returns 6 |
LTRIM(string) | Returns the string with leading spaces removed from the left. | LTRIM(' Oracle') returns Oracle |
NVL(expr1, expr2) | Returns the argument that is not null. | NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID()) |
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx]) | Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group. |
REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22 |
REPLACE(string, search, replacement) | Replaces all occurrences of search with replacement. If search is not found in string, then string is returned as is. If replacement is null, then every occurrence of search is removed. | REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF |
RTRIM(string) | Returns the string with trailing spaces removed from the right. | RTRIM('Oracle ') returns Oracle |
SUBSTRING(string, position[, substring_length]) | Returns the substring starting at position. | SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac |
For numbers, TO_CHAR(expr); for dates, TO_CHAR(expr, format[, locale]) | Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. Supported language tags. In pipeline expressions, the format_string must use the strftime format codes. | Number example: TO_CHAR(123) returns '123'. Date example: TO_CHAR(Date '2020-10-30', 'yyyy-MM-dd') returns '2020-10-30' |
UPPER(string) | Returns a string with all letters changed to uppercase. | UPPER('oracle') returns ORACLE |
LPAD(str, len[, pad]) | Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | LPAD('ABC', 5, '*') returns '**ABC' |
RPAD(str, len[, pad]) | Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | RPAD('XYZ', 6, '+' ) returns 'XYZ+++' |
Function | Description | Example |
---|---|---|
CASE WHEN condition1 THEN result1 ELSE result2 END | Returns the value for which a condition is met. | CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1> 0 , otherwise returns XYZ |
AND | The logical AND operator. Returns true if both operands are true, otherwise returns false. | (x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false" |
OR | The logical OR operator. Returns true if either operand is true or both are true, otherwise returns false. | (x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true" |
NOT | The logical NOT operator. | |
LIKE | Performs string pattern matching, whether string1 matches the pattern in string2. | |
= | Tests for equality. Returns true if expr1 equals expr2, otherwise returns false. | x = 10 returns "true" when value of x is 10, else it returns "false" |
!= | Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false. | x != 10 returns "false" if value of x is 10, else it returns "true" |
> | Tests for an expression greater than. Returns true if expr1 is greater than expr2. | x > 10 returns "true" if value of x is greater than 10, else it returns "false" |
>= | Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2. | x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false" |
< | Tests for an expression less than. Returns true if expr1 is less than expr2. | x < 10 returns "true" if value of x is less than 10, else it returns "false" |
<= | Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2. | x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false" |
|| | Concatenates two strings. | 'XYZ' || 'hello' returns 'XYZhello' |
BETWEEN | Evaluates a range. | FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007 |
IN | Tests whether an expression matches a list of values. | FILTER_2.ORDERS.ORDER_ID IN (1003, 1007) |
Function | Description | Example |
---|---|---|
NUMERIC_ID() | Generates a universally unique identifier that is a 64-bit number for each row. | NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761 |
ROWID() | Generates monotonically increasing 64-bit numbers. | ROWID() returns for example, 0 , 1 , 2 , and so on |
UUID() | Generates a universally unique identifier that is a 128-bit String for each row. | UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d |
MONOTONICALLY_INCREASING_ID() | Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. | MONOTONICALLY_INCREASING_ID() returns for example, 8589934592 and 25769803776 |
Function | Description | Example |
---|---|---|
COALESCE(value, value [, value]*) | Returns the first non-null argument, if it exists, otherwise returns null. | COALESCE(NULL, 1, NULL) returns 1 |
NULLIF(value, value) | Returns null if the two values equal each other, otherwise returns the first value. | NULLIF('ABC','XYZ') returns ABC |
Function | Description | Example |
---|---|---|
SCHEMA_OF_JSON(string) | Parses a JSON string and infers the schema in DDL format. | |
FROM_JSON(column, string) | Parses a column containing a JSON string into a Map or Struct type, with the specified schema. | |
TO_JSON(column) | Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string. | TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}} |
TO_MAP(string,column[,string,column]*) | Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type. | |
TO_STRUCT(string,column[,string,column]*) | Creates a new column of Struct type. The input columns must be grouped as key-value pairs. | |
TO_ARRAY(column[,column]*) | Creates a new column as Array type. The input columns must all have the same data type. | |
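As an illustrative sketch (the attribute name and schema string are hypothetical), you might pair these functions: use SCHEMA_OF_JSON on a sample document to infer a DDL schema such as STRUCT<id: INT, name: STRING>, and then parse a JSON string attribute with FROM_JSON(EXPRESSION_1.MYSOURCE_1.PAYLOAD, 'STRUCT<id: INT, name: STRING>') to produce a Struct with fields id and name.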
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:

- Aggregate
- Expression
- Filter
- Join
- Lookup
- Split
- Pivot
Function | Description | Example |
---|---|---|
TRANSFORM(column, lambda_function) | Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array. | For an input array of integers [1, 2, 3] , TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4] . |
TRANSFORM_KEYS(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values. | For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'} , TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'} . |
TRANSFORM_VALUES(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. | For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}. |
ARRAY_SORT(array(...), lambda_function) | Only the Expression operator supports ARRAY_SORT. Takes an array and sorts it according to the given function that takes 2 arguments. The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element. If the function is omitted, the array is sorted in ascending order. | ARRAY_SORT(array(5, 6, 1), (left, right) -> CASE WHEN left < right THEN -1 WHEN left > right THEN 1 ELSE 0 END) returns [1,5,6] |
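As an illustrative sketch (the attribute name is hypothetical), an expression operator could double every element of an array attribute SCORES with TRANSFORM(EXPRESSION_1.MYSOURCE_1.SCORES, x -> x * 2); for an input array of [1, 2, 3], the result is [2, 4, 6].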
Join Operator
Use the join operator to link data from multiple inbound sources.
Use the Condition Builder to visually select elements to create a join condition. You can also enter a join condition manually in the editor.
Creating a join condition lets you select data from two inbound sources based on the condition.
The elements that you can use in a join condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add it to the editor and build a condition. You can validate the condition before creating it.
Incoming displays the attributes from the upstream ports connected to the join operator as two separate JOIN folders. View the attributes from each port by expanding or collapsing the appropriate JOIN folder. For example, JOIN_1_1 and JOIN_1_2.
An example join condition is:

JOIN_1_1.BANK_CUSTOMER.ADDRESS_ID = JOIN_1_2.BANK_ADDRESS.ADDRESS_ID
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Suppose you want to join two sources and retain only the rows where BANK_NAME='ABC Bank'. You can create a VARCHAR parameter with the name P_VARCHAR, and set the default value to ABC BANK. Then you can create the join expression as:

JOIN_1_1.ADDRESSES.BANK_ID = JOIN_1_2.BANK.BANK_ID AND JOIN_1_2.BANK.BANK_NAME = $P_VARCHAR
Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.
Here's a list of functions that are available for you to add when you construct conditions:
Function | Description | Example |
---|---|---|
MD5(all data types) | Calculates an MD5 checksum of the data type, and returns a string value. | MD5(column_name) |
SHA1(all data types) | Calculates a SHA-1 hash value of the data type, and returns a string value. | SHA1(column_name) |
SHA2(all data types, bitLength) | Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. | SHA2(column_name, 256) |
ORA_HASH(expr, [max_bucket], [seed_value]) | Computes a hash value for expr, and returns a NUMBER value. max_bucket is the maximum bucket value returned; it's between 0 and 4294967295 (the default). seed_value is between 0 and 4294967295 (the default is 0). Oracle applies the hash function to the combination of expr and seed_value to produce many different results for the same set of data. | ORA_HASH(column_name) |
Function | Description | Example |
---|---|---|
ABS(numeric) | Returns the absolute value of the numeric value. | ABS(-1) |
CEIL(numeric) | Returns the smallest integer not less than the numeric value. | CEIL(-1.2) |
FLOOR(numeric) | Returns the largest integer not greater than the numeric value. | FLOOR(-1.2) |
MOD(numeric1, numeric2) | Returns the remainder after numeric1 is divided by numeric2 . | MOD(8,2) |
POWER(numeric1, numeric2) | Raises numeric1 to the power of numeric2 . | POWER(2,3) |
ROUND(numeric1, numeric2) | Returns numeric1 rounded to numeric2 decimal places. | ROUND(2.5,0) |
TRUNC(numeric1, numeric2) | Returns numeric1 truncated to numeric2 decimal places. | TRUNC(2.5,0) |
TO_NUMBER(expr[, format, locale]) | Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags. | |
Function | Description | Example
---|---|---
CURRENT_DATE | Returns the current date. | CURRENT_DATE returns today's date such as 2023-05-26
CURRENT_TIMESTAMP | Returns the current date and time for the session time zone. | CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
DATE_ADD(date, number_of_days) | Returns the date that's the specified number of days after the specified date. | DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale]) | Formats an expr of Date as a string, based on the format and optional locale provided. Default locale is en-US. Supported language tags. | |
DAYOFMONTH(date) | Returns the date's day in the month. | DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date) | Returns the date's day in the week. | DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date) | Returns the date's day in the year. | DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) | Returns the date's week in the year. | |
HOUR(datetime) | Returns the datetime's hour value. | HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date) | Returns the date's last day of the month. | LAST_DAY('2020-12-25') returns 2020-12-31
MINUTE(datetime) | Returns the datetime's minute value. | MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date) | Returns the date's month value. | MONTH('2020-06-25') returns 6
QUARTER(date) | Returns the quarter of year the date is in. | QUARTER('2020-12-25') returns 4
SECOND(datetime) | Returns the datetime's second value. | SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr]) | Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags. In pipeline expressions, the format_string must use the strftime format codes. | |
TO_TIMESTAMP(expr, format_string[, localeStr]) | Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided. In pipeline expressions, the format_string must use the strftime format codes. | TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 2020-10-11 11:10:10
WEEK(date) | Returns the date's week value. | WEEK('2020-06-25') returns 4
YEAR(date) | Returns the date's year value. | YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) | Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS. | |
MONTHS_BETWEEN(start_date_expr, end_date_expr) | Returns the number of months between start_date_expr and end_date_expr. A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month. | |
FROM_UTC_TIMESTAMP(time_stamp, time_zone) | Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone. For string, use a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS. Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). | FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone) | Converts a date, timestamp or string in the specified time zone to a UTC timestamp. For string, use a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS. Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). | TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt]) | Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format. If fmt is omitted, a default format is used. Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC. | Default time zone is PST in the examples
UNIX_TIMESTAMP([time_expr[, fmt]]) | Converts the current or specified time to a Unix timestamp in seconds. If time_expr is omitted, the current time is converted. If fmt is omitted, a default format is used. | Default time zone is PST in this example
INTERVAL 'year' YEAR[(year_precision)] | Returns a period of time in years. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years). | |
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH | Returns a period of time in years and months. Use to store a period of time using year and month fields. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years). | INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)] | Returns a period of time in months. month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months). | INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)] | Returns a period of time in terms of days, hours, minutes, and seconds. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. | INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)] | Returns a period of time in terms of days, hours, and minutes. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. | INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)] | Returns a period of time in terms of days and hours. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. | INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days and 10 hours
INTERVAL 'day' DAY[(day_precision)] | Returns a period of time in terms of days. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. | INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)] | Returns a period of time in terms of hours, minutes, and seconds. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. | INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)] | Returns a period of time in terms of hours and minutes. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. | INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)] | Returns a period of time in terms of hours. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. | INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)] | Returns a period of time in terms of minutes. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. | INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)] | Returns a period of time in terms of minutes and seconds. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. | INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)] | Returns a period of time in terms of seconds. fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3. | INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
Function | Description | Example
---|---|---
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) | Returns the value evaluated at the row that's the first row of the window frame. | FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) | Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL. | LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) | Returns the value evaluated at the row that is the last row of the window frame. | LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) | Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL. | LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause) | Returns the rank of the current row with gaps, counting from 1. | RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) | Returns the unique number of the current row within its partition, counting from 1. | ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
Functions | Description | Example |
---|---|---|
CAST(value AS type) | Returns the specified value in the specified type. | CAST("10" AS INT) returns 10 |
CONCAT(string, string) | Returns the combined values of strings or columns. | CONCAT('Oracle','SQL') returns OracleSQL |
CONCAT_WS(separator, expression1, expression2, expression3,...) |
Returns the combined values of strings or columns using the specified separator in between the strings or columns. A separator is required and it must be a string. At least one expression must be provided after the separator. For example: |
CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle
If a child of the function is an array, then the array is flattened:
|
INITCAP(string) | Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space. | INITCAP('oRACLE sql') returns Oracle Sql |
INSTR(string, substring[, start_position]) | Returns the (1-based) index of the first occurrence of substring in string. | INSTR('OracleSQL', 'SQL') returns 7 |
LOWER(string) | Returns the string with all letters changed to lowercase. | LOWER('ORACLE') returns oracle |
LENGTH(string) | Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces. | LENGTH('Oracle') returns 6 |
LTRIM(string) | Returns the string with leading spaces removed from the left. | LTRIM(' Oracle') returns 'Oracle' |
NVL(expr1, expr2) | Returns expr1 if it is not null, otherwise returns expr2. | NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID()) |
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx]) | Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group. |
REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22 |
REPLACE(string, search, replacement) | Replaces all occurrences of search with replacement. If search is not found in string, then string is returned unchanged. If replacement is omitted or is an empty string, the occurrences of search are removed from string. | REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF |
RTRIM(string) | Returns the string with trailing spaces removed from the right. | RTRIM('Oracle ') returns 'Oracle' |
SUBSTRING(string, position[, substring_length]) | Returns the substring starting at position. | SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac |
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale]) | Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags. In pipeline expressions, the format must use the strftime format codes.
| Number example: Date example: |
UPPER(string) | Returns a string with all letters changed to uppercase. | UPPER('oracle') returns ORACLE |
LPAD(str, len[, pad]) | Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | LPAD('ABC', 5, '*') returns '**ABC' |
RPAD(str, len[, pad]) | Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | RPAD('XYZ', 6, '+' ) returns 'XYZ+++' |
Function | Description | Example |
---|---|---|
CASE WHEN condition1 THEN result1 ELSE result2 END | Returns the value for which a condition is met. | CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ |
AND | The logical AND operator. Returns true if both operands are true, otherwise returns false. | (x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false" |
OR | The logical OR operator. Returns true if either operand is true or both are true, otherwise returns false. | (x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true" |
NOT | The logical NOT operator. | |
LIKE | Performs string pattern matching, whether string1 matches the pattern in string2. | |
= | Tests for equality. Returns true if expr1 equals expr2, otherwise returns false. | x = 10 returns "true" when value of x is 10, else it returns "false" |
!= | Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false. | x != 10 returns "false" if value of x is 10, else it returns "true" |
> | Tests for an expression greater than. Returns true if expr1 is greater than expr2. | x > 10 returns "true" if value of x is greater than 10, else it returns "false" |
>= | Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2. | x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false" |
< | Tests for an expression less than. Returns true if expr1 is less than expr2. | x < 10 returns "true" if value of x is less than 10, else it returns "false" |
<= | Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2. | x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false" |
|| | Concatenates two strings. | 'XYZ' || 'hello' returns 'XYZhello' |
BETWEEN | Evaluates a range. | FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007 |
IN | Tests whether an expression matches a list of values. | FILTER_2.ORDERS.ORDER_ID IN (1003, 1007) |
Function | Description | Example |
---|---|---|
NUMERIC_ID() | Generates a universally unique identifier that is a 64-bit number for each row. | NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761 |
ROWID() | Generates monotonically increasing 64-bit numbers. | ROWID() returns for example, 0 , 1 , 2 , and so on |
UUID() | Generates a universally unique identifier that is a 128-bit String for each row. | UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d |
MONOTONICALLY_INCREASING_ID() |
Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. | MONOTONICALLY_INCREASING_ID() returns for example,
8589934592 and 25769803776 |
Function | Description | Example |
---|---|---|
COALESCE(value, value [, value]*) | Returns the first non-null argument, if it exists, otherwise returns null. | COALESCE(NULL, 1, NULL) returns 1 |
NULLIF(value, value) | Returns null if the two values equal each other, otherwise returns the first value. | NULLIF('ABC','XYZ') returns ABC |
Function | Description | Example |
---|---|---|
SCHEMA_OF_JSON(string) | Parses a JSON string and infers the schema in DDL format. |
|
FROM_JSON(column, string) | Parses a column containing a JSON string into one of the following types, with the specified schema.
|
|
TO_JSON(column) | Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string. | TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}} |
TO_MAP(string,column[,string,column]*) | Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type. |
|
TO_STRUCT(string,column[,string,column]*) | Creates a new column of Struct type. The input columns must be grouped as key-value pairs. |
|
TO_ARRAY(column[,column]*) | Creates a new column as Array type. The input columns must all have the same data type. |
|
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
-
Aggregate
-
Expression
-
Filter
-
Join
-
Lookup
-
Split
-
Pivot
Function | Description | Example |
---|---|---|
TRANSFORM(column, lambda_function) | Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array. | For an input array of integers [1, 2, 3] , TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4] . |
TRANSFORM_KEYS(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values. | For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'} , TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'} . |
TRANSFORM_VALUES(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. | For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'} , TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'} . |
ARRAY_SORT(array(...), lambda_function) |
Only the Expression operator supports ARRAY_SORT. Takes an array and sorts it according to the given function, which takes 2 arguments. The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element. If the function is omitted, the array is sorted in ascending order. |
For example, sorting an array such as array(5, 6, 1) in ascending order returns the array: [1,5,6] |
Expression Operator
Use the expression operator to perform one or more transformations on a single row of data to create new, derivative fields.
You can use the Expression operator to change the data type of a group of attributes using a bulk transformation action. See Changing the data type of a group of attributes.
Use the Expression Builder to visually select elements to build an expression in the editor. You can also manually write an expression yourself.
Take note of the following when you create expressions:
Enclose a string literal within single quotation marks. For example:
CONCAT('We ', 'like')
andUPPER('oracle')
.Enclose an attribute name within double quotation marks. For example:
UPPER("Sales")
andCONCAT(CONCAT("EXPRESSION_1.EMP_UTF8_EN_COL_CSV.EN_NAME", ' '), "EXPRESSION_1.EMP_UTF8_EN_COL_CSV.NAME")
.Using double quotation marks around attribute names is mandatory for multibyte characters, and names that have special characters in the fully qualified name.
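For example, an expression that follows both rules, combining a double-quoted attribute with a single-quoted string literal (the attribute name here is only illustrative):
UPPER(CONCAT("EXPRESSION_1.EMP_CSV.EN_NAME", ' verified'))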
The Add Expression panel has two sections: Expression information and Expression builder. The Expression information fields let you specify a name and data type for the expression. You can also create the expression to apply to two or more attributes. When working with complex data types such as Map, Array, and Struct that might have levels of nested types, you can choose to let the builder detect the data type from the expression that you enter. When you let the builder infer the data type, you can preview and refresh the data type, and validate the expression.
The Expression builder section lists the elements for building an expression. The elements that you can use in an expression include incoming attributes, parameters, and functions. Double-click or drag an element from the list to add to the editor to build the expression, or manually write the expression yourself. You can validate the expression before creating it.
Incoming displays the attributes from the upstream operator that are coming into this expression operator. Below the attributes list is a checkbox that lets you apply Exclude rules. To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.
Parameters include user-defined parameters and system-generated parameters.
User-defined parameters are the expression parameters that have been added in the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). See Adding an Expression Parameter. The syntax is $PARAMETER_NAME. For example:
EXPRESSION_1.ADDRESSES.POSTAL_CODE=$P_CODE
Data Integration generates system parameters such as SYS.TASK_START_TIME. The values of system parameters can be used in expressions to log system information. The syntax is ${SYSTEM_PARAMETER}. For example:
${SYS.TASK_RUN_NAME}
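Such a system parameter can be embedded in a larger expression. A sketch, assuming a string-typed output for the expression:
CONCAT('Task run: ', ${SYS.TASK_RUN_NAME})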
Functions are the functions available in Data Integration that you can use in an expression. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments. You can also add user-defined functions that you have created in the workspace. For example: MYLIBRARY.MYFUNCTION
The list of functions available in Data Integration for you to use is as follows:
Function | Description | Example |
---|---|---|
MD5(all data types) | Calculates an MD5 checksum of the data type, and returns a string value. | MD5(column_name) |
SHA1(all data types) | Calculates a SHA-1 hash value of the data type, and returns a string value. | SHA1(column_name) |
SHA2(all data types, bitLength) | Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. | SHA2(column_name, 256) |
ORA_HASH(expr, [max_bucket], [seed_value]) |
Computes a hash value for expr. max_bucket is the maximum bucket value returned by the hash function; it ranges from 0 to 4294967295, which is the default. seed_value ranges from 0 to 4294967295; the default is 0.
Oracle applies the hash function to the combination of expr and seed_value to produce many different results for the same set of data. |
|
Function | Description | Example |
---|---|---|
ABS(numeric) | Returns the absolute value of the numeric value. | ABS(-1) returns 1 |
CEIL(numeric) | Returns the smallest integer not less than the numeric value. | CEIL(-1.2) returns -1 |
FLOOR(numeric) | Returns the largest integer not greater than the numeric value. | FLOOR(-1.2) returns -2 |
MOD(numeric1, numeric2) | Returns the remainder after numeric1 is divided by numeric2 . | MOD(8,2) |
POWER(numeric1, numeric2) | Raises numeric1 to the power of numeric2 . | POWER(2,3) |
ROUND(numeric1, numeric2) | Returns numeric1 rounded to numeric2 decimal places. | ROUND(2.5,0) |
TRUNC(numeric1, numeric2) | Returns numeric1 truncated to numeric2 decimal places. | TRUNC(2.5,0) |
TO_NUMBER(expr[, format, locale]) | Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags. Supported format patterns:
|
|
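A sketch of typical calls, assuming Oracle-style number format patterns in which 9 stands for a digit:
TO_NUMBER('5467.12', '99999.99') returns 5467.12
TO_NUMBER('1234') returns 1234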
Only the Expression operator supports array functions.
Function | Description | Example |
---|---|---|
ARRAY_POSITION(array(...), element) | Returns the position of the first occurrence of the given element in the given array. The position is not zero-based; it starts at 1. | ARRAY_POSITION(array(3, 2, 1, 4, 1), 1) returns 3 |
REVERSE(array(...)) |
Returns the given array of elements in a reverse order. | REVERSE(array(2, 1, 4, 3)) returns [3,4,1,2] |
ELEMENT_AT(array(...), index) |
Returns the element of the given array at the given index position. The index is not zero-based; it starts at 1. If the index is negative, the element is accessed from the last element to the first. |
ELEMENT_AT(array(1, 2, 3), 2) returns 2 |
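These array functions can be nested in a single expression. For example, using only the calls documented above:
ELEMENT_AT(REVERSE(array(2, 1, 4, 3)), 1) returns 3, because the reversed array is [3,4,1,2]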
Function | Description | Example |
---|---|---|
CURRENT_DATE |
Returns the current date. | CURRENT_DATE returns today's date such as 2023-05-26 |
CURRENT_TIMESTAMP |
Returns the current date and time for the session time zone. | CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
|
DATE_ADD(date, number_of_days) |
Returns the date that's the specified number of days after the specified date . |
DATE_ADD('2017-07-30', 1) returns 2017-07-31 |
DATE_FORMAT(expr, format[, locale]) |
Formats an expr of Date as a string, using the specified format and optional locale. Supported date format patterns:
|
|
DAYOFMONTH(date) |
Returns the date's day in the month. | DAYOFMONTH('2020-12-25') returns 25 |
DAYOFWEEK(date) |
Returns the date's day in the week. | DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on. |
DAYOFYEAR(date) |
Returns the date's day in the year. | DAYOFYEAR('2020-12-25') returns 360 |
WEEKOFYEAR(date) |
Returns the date's week in the year. |
|
HOUR(datetime) |
Returns the datetime's hour value. | HOUR('2020-12-25 15:10:30') returns 15 |
LAST_DAY(date) |
Returns the date's last day of the month. | LAST_DAY('2020-12-25') returns 31 |
MINUTE(datetime) |
Returns the datetime's minute value. | MINUTE('2020-12-25 15:10:30') returns 10 |
MONTH(date) |
Returns the date's month value. | MONTH('2020-06-25') returns 6 |
QUARTER(date) |
Returns the quarter of year the date is in. | QUARTER('2020-12-25') returns 4 |
SECOND(datetime) |
Returns the datetime's second value. | SECOND('2020-12-25 15:10:30') returns 30 |
TO_DATE(string, format_string[, localeStr]) |
Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags. In pipeline expressions, the format_string must use the strftime format codes.
|
|
TO_TIMESTAMP(expr, format_string[, localeStr]) |
Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided. In pipeline expressions, the format_string must use the strftime format codes.
|
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11am 10:10 Oct 11th, 2020 |
WEEK(date) |
Returns the date's week value. |
WEEK('2020-06-25') returns 4 |
YEAR(date) |
Returns the date's year value. | YEAR('2020-06-25') returns 2020 |
ADD_MONTHS(date_expr, number_months) |
Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS . |
|
MONTHS_BETWEEN(start_date_expr, end_date_expr) |
Returns the number of months between start_date_expr and end_date_expr. A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month. |
|
FROM_UTC_TIMESTAMP(time_stamp, time_zone) |
Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone. For string, use a format such as yyyy-MM-dd HH:mm:ss.SSS. Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). |
FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0 |
TO_UTC_TIMESTAMP(time_stamp, time_zone) |
Converts a date, timestamp or string in the specified time zone to a UTC timestamp. For string, use a format such as yyyy-MM-dd HH:mm:ss.SSS. Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). |
TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
|
FROM_UNIXTIME(unix_time[, fmt]) |
Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format. Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC. If fmt is omitted, the default format yyyy-MM-dd HH:mm:ss is used. |
Default time zone is PST in the examples |
UNIX_TIMESTAMP([time_expr[, fmt]]) |
Converts the current or specified time to a Unix timestamp in seconds.
If time_expr is not provided, the current time is converted. If fmt is omitted, the default format yyyy-MM-dd HH:mm:ss is used. |
Default time zone is PST in this example |
INTERVAL 'year' YEAR[(year_precision)] |
Returns a period of time in years. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.) |
|
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH |
Returns a period of time in years and months. Use to store a period of time using year and month fields. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.) |
INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3. |
INTERVAL 'month' MONTH[(month_precision)] |
Returns a period of time in months. month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 years.) |
INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3. |
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of days, hours, minutes, and seconds. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second |
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)] |
Returns a period of time in terms of days, hours, and minutes. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes |
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)] |
Returns a period of time in terms of days and hours. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. |
INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours |
INTERVAL 'day' DAY[(day_precision)] |
Returns a period of time in terms of days. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. |
INTERVAL '999' DAY(3) returns an interval of 999 days |
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of hours, minutes, and seconds. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds |
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)] |
Returns a period of time in terms of hours and minutes. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes |
INTERVAL 'hour' HOUR[(hour_precision)] |
Returns a period of time in terms of hours. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. |
INTERVAL '40' HOUR returns an interval of 40 hours |
INTERVAL 'minute' MINUTE[(minute_precision)] |
Returns a period of time in terms of minutes. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '15' MINUTE returns an interval of 15 minutes |
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of minutes and seconds. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds |
INTERVAL 'second' SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of seconds. fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3. |
INTERVAL '15.678' SECOND returns an interval of 15.678 seconds |
Function | Description | Example |
---|---|---|
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) |
Returns the value evaluated at the row that's the first row of the window frame. | FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME . |
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) |
Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. | LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME . If there is no such value, hello is returned. |
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) |
Returns the value evaluated at the row that is the last row of the window frame. | LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME . |
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) |
Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. | LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME . If there is no such value, hello is returned. |
RANK() OVER([ partition_clause ] order_by_clause) |
Returns the rank of the current row with gaps, counting from 1. | RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID , in ascending order of BANK_NAME . |
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) |
Returns the unique number of the current row within its partition, counting from 1. | ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID , in ascending order of BANK_NAME . |
Functions | Description | Example |
---|---|---|
CAST(value AS type) | Returns the specified value in the specified type. | CAST("10" AS INT) returns 10 |
CONCAT(string, string) | Returns the combined values of strings or columns. | CONCAT('Oracle','SQL') returns OracleSQL |
CONCAT_WS(separator, expression1, expression2, expression3,...) |
Returns the combined values of strings or columns using the specified separator in between the strings or columns. A separator is required and it must be a string. At least one expression must be provided after the separator. For example: |
CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle
If a child of the function is an array, then the array is flattened:
|
INITCAP(string) | Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space. | INITCAP('oRACLE sql') returns Oracle Sql |
INSTR(string, substring[, start_position]) | Returns the (1-based) index of the first occurrence of substring in string. | INSTR('OracleSQL', 'SQL') returns 7 |
LOWER(string) | Returns the string with all letters changed to lowercase. | LOWER('ORACLE') returns oracle |
LENGTH(string) | Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces. | LENGTH('Oracle') returns 6 |
LTRIM(string) | Returns the string with leading spaces removed from the left. | LTRIM(' Oracle') returns 'Oracle' |
NVL(expr1, expr2) | Returns expr1 if it is not null, otherwise returns expr2. | NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID()) |
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx]) | Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group. |
REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22 |
REPLACE(string, search, replacement) | Replaces all occurrences of search with replacement. If search is not found in string, then string is returned unchanged. If replacement is omitted or is an empty string, the occurrences of search are removed from string. | REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF |
RTRIM(string) | Returns the string with trailing spaces removed from the right. | RTRIM('Oracle ') returns 'Oracle' |
SUBSTRING(string, position[, substring_length]) | Returns the substring starting at position. | SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac |
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale]) | Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags. In pipeline expressions, the format must use the strftime format codes.
| Number example: Date example: |
UPPER(string) | Returns a string with all letters changed to uppercase. | UPPER('oracle') returns ORACLE |
LPAD(str, len[, pad]) | Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | LPAD('ABC', 5, '*') returns '**ABC' |
RPAD(str, len[, pad]) | Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | RPAD('XYZ', 6, '+' ) returns 'XYZ+++' |
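String functions can be nested to apply several transformations in one expression. For example, combining the calls documented above:
INITCAP(LTRIM(' oracle sql')) returns Oracle Sql
LPAD(RTRIM('Oracle '), 8, '*') returns '**Oracle'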
Function | Description | Example |
---|---|---|
CASE WHEN condition1 THEN result1 ELSE result2 END | Returns the value for which a condition is met. | CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ |
AND | The logical AND operator. Returns true if both operands are true, otherwise returns false. | (x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false" |
OR | The logical OR operator. Returns true if either operand is true or both are true, otherwise returns false. | (x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true" |
NOT | The logical NOT operator. | |
LIKE | Performs string pattern matching, whether string1 matches the pattern in string2. | |
= | Tests for equality. Returns true if expr1 equals expr2, otherwise returns false. | x = 10 returns "true" when value of x is 10, else it returns "false" |
!= | Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false. | x != 10 returns "false" if value of x is 10, else it returns "true" |
> | Tests for an expression greater than. Returns true if expr1 is greater than expr2. | x > 10 returns "true" if value of x is greater than 10, else it returns "false" |
>= | Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2. | x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false" |
< | Tests for an expression less than. Returns true if expr1 is less than expr2. | x < 10 returns "true" if value of x is less than 10, else it returns "false" |
<= | Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2. | x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false" |
|| | Concatenates two strings. | 'XYZ' || 'hello' returns 'XYZhello' |
BETWEEN | Evaluates a range. | FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007 |
IN | Tests whether an expression matches a list of values. | FILTER_2.ORDERS.ORDER_ID IN (1003, 1007) |
Function | Description | Example |
---|---|---|
NUMERIC_ID() | Generates a universally unique identifier that is a 64-bit number for each row. | NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761 |
ROWID() | Generates monotonically increasing 64-bit numbers. | ROWID() returns for example, 0 , 1 , 2 , and so on |
UUID() | Generates a universally unique identifier that is a 128-bit String for each row. | UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d |
MONOTONICALLY_INCREASING_ID() |
Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. | MONOTONICALLY_INCREASING_ID() returns for example,
8589934592 and 25769803776 |
Function | Description | Example |
---|---|---|
COALESCE(value, value [, value]*) | Returns the first non-null argument, if it exists, otherwise returns null. | COALESCE(NULL, 1, NULL) returns 1 |
NULLIF(value, value) | Returns null if the two values equal each other, otherwise returns the first value. | NULLIF('ABC','XYZ') returns ABC |
Function | Description | Example |
---|---|---|
SCHEMA_OF_JSON(string) | Parses a JSON string and infers the schema in DDL format. |
|
FROM_JSON(column, string) | Parses a column containing a JSON string into one of the following types, with the specified schema.
|
|
TO_JSON(column) | Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string. | TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}} |
TO_MAP(string,column[,string,column]*) | Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type. |
|
TO_STRUCT(string,column[,string,column]*) | Creates a new column of Struct type. The input columns must be grouped as key-value pairs. |
|
TO_ARRAY(column[,column]*) | Creates a new column as Array type. The input columns must all have the same data type. |
|
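The JSON functions can be combined for a round trip: parse a JSON string with a schema, then convert it back. The following is a sketch; the literal input and the schema string (in the DDL format that SCHEMA_OF_JSON infers) are illustrative:
TO_JSON(FROM_JSON('{"City":"Austin","Zipcode":704}', 'STRUCT<City: STRING, Zipcode: BIGINT>')) returns the equivalent JSON string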
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
-
Aggregate
-
Expression
-
Filter
-
Join
-
Lookup
-
Split
-
Pivot
Function | Description | Example |
---|---|---|
TRANSFORM(column, lambda_function) | Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array. | For an input array of integers [1, 2, 3] , TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4] . |
TRANSFORM_KEYS(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values. | For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'} , TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'} . |
TRANSFORM_VALUES(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. | For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'} , TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'} . |
ARRAY_SORT(array(...), lambda_function) |
Only the Expression operator supports ARRAY_SORT. Takes an array and sorts it according to the given function, which takes 2 arguments. The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element. If the function is omitted, the array is sorted in ascending order. |
For example, sorting an array such as array(5, 6, 1) in ascending order returns the array: [1,5,6] |
You can duplicate expressions that have been added to an expression operator.
You can change the order of expressions that have been added to an expression operator.
Aggregate Operator
Use the aggregate operator to perform calculations, such as sum or count, on all rows or a group of rows to create new, derivative attributes.
Use the Expression Builder to visually select elements to build an expression in the editor. You can also manually write an expression yourself.
The Add Expression panel has two sections: Expression information and Expression builder. The Expression information fields let you specify a name and data type for the expression. You can also create the expression to apply to two or more attributes. When working with complex data types such as Map, Array, and Struct that might have levels of nested types, you can choose to let the builder detect the data type from the expression that you enter. When you let the builder infer the data type, you can preview and refresh the data type, and validate the expression.
The Expression builder section lists the elements for building an expression. The elements that you can use in an expression include incoming attributes, parameters, and functions. Double-click or drag an element from the list to add to the editor to build the expression, or manually write the expression yourself. You can validate the expression before creating it.
Incoming displays the attributes from the upstream operator that are coming into this aggregate operator. Below the attributes list is a checkbox that lets you apply Exclude rules. To exclude one or more incoming attributes from the output of this operator, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments. Here's a list of functions available to use to build the expressions:
Function | Description | Example |
---|---|---|
COUNT(value[, value]*) | Returns the number of rows for which one or more supplied expressions are all non-null. | COUNT(expr1) |
COUNT(*) | Returns the total number of retrieved rows, including rows containing null. | COUNT(*) |
MAX(value) | Returns the maximum value of the argument. | MAX(expr) |
MIN(value) | Returns the minimum value of the argument. | MIN(expr) |
SUM(numeric) | Returns the sum calculated from values of a group. | SUM(expr1) |
AVG(numeric) | Returns the average of numeric values in an expression. | AVG(AGGREGATE_1.src1.attribute1) |
LISTAGG(column[, delimiter]) WITHIN GROUP (order_by_clause) | Concatenates values of the input column with the specified delimiter, for each group based on the order clause. column contains the values you want to concatenate together in the result. The delimiter separates the column values in the result. If a delimiter is not provided, then an empty string is used. order_by_clause determines the order that the concatenated values are returned. This function can only be used as an aggregator, and can be used with grouping or without grouping. If used without grouping, the result is a single row. If used with grouping, the function returns a row for each group. | Consider a table with two columns,
Example 1: Without grouping
Example 2: Group by the
|
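A sketch of the truncated example above, assuming a table with two columns named DEPARTMENT and EMPLOYEE (illustrative names): without grouping, LISTAGG(EMPLOYEE, ',') WITHIN GROUP (ORDER BY EMPLOYEE) returns a single row that concatenates every employee name, such as Alice,Bob,Carol. With GROUP BY DEPARTMENT, the same expression returns one concatenated list per department.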
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
-
Aggregate
-
Expression
-
Filter
-
Join
-
Lookup
-
Split
-
Pivot
Function | Description | Example |
---|---|---|
TRANSFORM(column, lambda_function) | Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array. | For an input array of integers [1, 2, 3] , TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4] . |
TRANSFORM_KEYS(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values. | For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'} , TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'} . |
TRANSFORM_VALUES(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. | For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'} , TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'} . |
ARRAY_SORT(array(...), lambda_function) |
Only the Expression operator supports ARRAY_SORT. Takes an array and sorts it according to the given function, which takes 2 arguments. The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element. If the function is omitted, the array is sorted in ascending order. |
For example, sorting an array such as array(5, 6, 1) in ascending order returns the array: [1,5,6] |
Distinct Operator
Use the distinct operator to return distinct rows with unique values.
Sort Operator
Use the sort operator to perform sorting of data in ascending or descending order.
When using a sort operator, apply the sort operator after other shaping operators, and before the operator that requires sorted data. For example, apply the sort operator immediately before a target operator so that data is inserted into the target in a specific sort order.
After you add a sort operator on the canvas, and connect it with another operator, you can add a sort condition.
From the Add Sort Condition panel, you can select attributes to sort from the fields listed, or filter attribute names using patterns. For string data types, the sort operation occurs based on the lexicographic order.
To add sort conditions:
- Under the Details tab, go to the Sort
conditions section, and click Add.
The Add Sort Condition panel displays all the attribute fields fetched from the source table.
- In the Add sort condition panel, select the attribute to sort the data by.
- To filter attributes using name patterns, enter a name pattern, for example, *CITY*.
- For Sort order, select Ascending or
Descending, and then click
Add.
Each condition you added is displayed in the sort condition list.
Note
You can add multiple sort conditions one-by-one. Sort operates based on the order of sort conditions in the list. For example, the sorting happens first based on the first condition in the list and then the sorted data is re-sorted based on the second condition and so on.
Move the sort conditions up or down in the list to prioritize the sort.
Moving sort conditions up or down lets you sort first by a high priority sort condition, and then re-sort the sorted data by the next condition in the list. For example, to sort first by address and then by the postal code, move the sort condition with address to the top.
To prioritize sort conditions:
To edit sort conditions:
You can delete sort conditions one by one or you can do a bulk delete.
To delete sort conditions:
- Under the Details tab, go to the Sort conditions section.
- To delete sort conditions one by one, click the Actions menu corresponding to the sort condition that you want to delete.
- To delete multiple sort conditions, select the checkboxes corresponding to each, and then click Delete at the top of the list.
Union Operator
Use the union operator to perform a union operation between two or more datasets.
You can perform a union operation on up to ten source operators. You must configure at least two source inputs. You can choose to perform the union operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.
Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are combined during a union by name operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
Data Entity 1, primary input
Department | LOCATION-ID | Warehouse |
---|---|---|
IT | 1400 | San Francisco |
Shipping | 1500 | Southlake, Texas |
Accounting | 1700 | New Jersey |
Data Entity 2
Warehouse | LOCATION-ID | Department |
---|---|---|
Denver | 1600 | Benefits |
New York | 1400 | Construction |
Resulting Data Entity
Department | LOCATION-ID | Warehouse |
---|---|---|
IT | 1400 | San Francisco |
Shipping | 1500 | Southlake, Texas |
Accounting | 1700 | New Jersey |
Benefits | 1600 | Denver |
Construction | 1400 | New York |
Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are combined during a union by position operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
Data Entity 1
Department | LOCATION-ID | Warehouse |
---|---|---|
IT | 1400 | San Francisco |
Shipping | 1500 | Southlake, Texas |
Accounting | 1700 | New Jersey |
Data Entity 2, primary input
Warehouse | LOCATION-ID | Department |
---|---|---|
Denver | 1600 | Benefits |
New York | 1400 | Construction |
Resulting Data Entity
Warehouse | LOCATION-ID | Department |
---|---|---|
Denver | 1600 | Benefits |
New York | 1400 | Construction |
IT | 1400 | San Francisco |
Shipping | 1500 | Southlake, Texas |
Accounting | 1700 | New Jersey |
Minus Operator
Use the minus operator to compare two data entities and return the rows that are present in one entity but not present in the other entity.
You can choose to keep or eliminate duplicate rows in the resulting data.
You can perform a minus operation on two source operators only. You can choose to perform the minus operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.
Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are subtracted during a minus by name operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
Data Entity 1, primary input
Department | LOCATION-ID | Warehouse |
---|---|---|
IT | 1400 | San Francisco |
Shipping | 1500 | Austin |
Accounting | 1700 | New Jersey |
Data Entity 2
Department | Warehouse | LOCATION-ID |
---|---|---|
Benefits | Denver | 1600 |
IT | San Francisco | 1400 |
Resulting Data Entity
Department | LOCATION-ID | Warehouse |
---|---|---|
Shipping | 1500 | Austin |
Accounting | 1700 | New Jersey |
Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are subtracted during a minus by position operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
Data Entity 1
Department | LOCATION-ID | Warehouse |
---|---|---|
IT | 1400 | San Francisco |
Shipping | 1500 | Austin |
Accounting | 1700 | New Jersey |
Data Entity 2, primary input
Department-Name | Location | Warehouse-City |
---|---|---|
Benefits | 1600 | Denver |
IT | 1400 | San Francisco |
Resulting Data Entity
Department-Name | Location | Warehouse-City |
---|---|---|
Benefits | 1600 | Denver |
Intersect Operator
Use the intersect operator to compare two or more data entities and return the rows that are present in all the connected entities.
You can choose to keep or eliminate duplicate rows in the resulting data.
You can perform an intersect operation on two or more source operators. You can choose to perform the operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.
Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are intersected by attribute name. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
Data Entity 1, primary input
Department | LOCATION-ID | Warehouse |
---|---|---|
IT | 1400 | San Francisco |
Shipping | 1500 | Austin |
Accounting | 1700 | New Jersey |
Data Entity 2
Department | Warehouse | LOCATION-ID |
---|---|---|
Benefits | Denver | 1600 |
IT | San Francisco | 1400 |
Resulting Data Entity
Department | LOCATION-ID | Warehouse |
---|---|---|
IT | 1400 | San Francisco |
Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are intersected by attribute position. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
Data Entity 1
Department | LOCATION-ID | Warehouse |
---|---|---|
IT | 1400 | San Francisco |
Shipping | 1500 | Austin |
Accounting | 1700 | New Jersey |
Data Entity 2, primary input
Department-Name | Location | Warehouse-City |
---|---|---|
Benefits | 1600 | Denver |
IT | 1400 | San Francisco |
Resulting Data Entity
Department-Name | Location | Warehouse-City |
---|---|---|
IT | 1400 | San Francisco |
Split Operator
Use the split operator to divide one source of input data into two or more output ports based on split conditions that are evaluated in a sequence.
Each split condition has an output port. Data that satisfies a condition is directed to the corresponding output port.
By default a split operator is configured with the Unmatched condition, which is always available in the sequence as the last condition. You cannot add your own condition to the Unmatched condition. You also cannot delete the Unmatched condition.
The operator evaluates the conditions one at a time. Where a row is directed depends on the matching strategy: with first matching condition, a row is directed to the output port of the first condition it satisfies; with all matching conditions, a row is directed to the output port of every condition it satisfies, as the examples later in this section show. After all conditions in the sequence have been evaluated, data that does not meet any condition is directed to the Unmatched output port.
Suppose you have the data entity BANK, with the attributes BANK_ID and BANK_NAME.
You set up two split conditions. The full sequence, including the Unmatched condition, is as follows:
Condition Output Port | Condition |
---|---|
CONDITION1 | SPLIT_1.BANK.BANK_ID<102 |
CONDITION2 | SPLIT_1.BANK.BANK_ID<104 |
UNMATCHED | The default UNMATCHED condition directs all data that does not meet the other conditions in the sequence to the UNMATCHED output port |
BANK Data Entity
The data entity has four rows.
BANK_ID | BANK_NAME |
---|---|
101 | A Bank 101 |
102 | B Bank 102 |
103 | C Bank 103 |
104 | D Bank 104 |
Condition1 Output, First Matching Condition
CONDITION1 returns one matching row.
BANK_ID | BANK_NAME |
---|---|
101 | A Bank 101 |
Condition2 Output, First Matching Condition
CONDITION2 returns two matching rows (from the unmatched rows after CONDITION1).
BANK_ID | BANK_NAME |
---|---|
102 | B Bank 102 |
103 | C Bank 103 |
Unmatched Condition Output, First Matching Condition
UNMATCHED condition returns the remaining row.
BANK_ID | BANK_NAME |
---|---|
104 | D Bank 104 |
Condition1 Output, All Matching Conditions
CONDITION1 returns one matching row.
BANK_ID | BANK_NAME |
---|---|
101 | A Bank 101 |
Condition2 Output, All Matching Conditions
All data is evaluated by CONDITION2, returning three matching rows.
BANK_ID | BANK_NAME |
---|---|
101 | A Bank 101 |
102 | B Bank 102 |
103 | C Bank 103 |
Unmatched Condition Output, All Matching Conditions
UNMATCHED condition returns the rows that do not meet CONDITION1 and CONDITION2.
BANK_ID | BANK_NAME |
---|---|
104 | D Bank 104 |
When you add a split operator on the canvas, by default the split operator icon is displayed as expanded, showing the Unmatched split condition. The Unmatched condition directs all data that does not meet the other conditions that you add to the sequence.
Use the Condition Builder to visually select elements to create and add a split condition. You can also enter a split condition manually in the editor.
You can add split conditions to an existing sequence of conditions. Split conditions are added to the end of the sequence, before the Unmatched condition. You cannot add your own condition to the Unmatched condition.
The elements that you can use in a split condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.
Incoming displays the attributes from the upstream port. For example:
SPLIT_1.BANK.BANK_NAME='ABC Bank'
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Suppose you create a VARCHAR parameter with the name P_VARCHAR_NAME, and set the default value to ABC BANK. Then you can use the parameter in a split condition as:
SPLIT_1.BANK.BANK_NAME=$P_VARCHAR_NAME
Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.
Suppose you create a VARCHAR
parameter with the name P_VARCHAR_LIKE
, and set the default value to B%
. Then you can use the parameter in a split condition as:
SPLIT_1.BANK.BANK_NAME LIKE $P_VARCHAR_LIKE
Here's a list of functions that are available for you to add when you construct conditions:
Function | Description | Example |
---|---|---|
MD5(all data types) | Calculates an MD5 checksum of the data type, and returns a string value. | MD5(column_name) |
SHA1(all data types) | Calculates a SHA-1 hash value of the data type, and returns a string value. | SHA1(column_name) |
SHA2(all data types, bitLength) | Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. | SHA2(column_name, 256) |
ORA_HASH(expr, [max_bucket], [seed_value]) | Computes a hash value for expr. The optional max_bucket argument determines the maximum bucket value returned, and the optional seed_value argument enables producing many different results for the same set of data. Oracle applies the hash function to the combination of expr and seed_value. | ORA_HASH(column_name) |
Function | Description | Example |
---|---|---|
ABS(numeric) | Returns the absolute value of the numeric value. | ABS(-1) returns 1 |
CEIL(numeric) | Returns the smallest integer not less than the numeric value. | CEIL(-1.2) returns -1 |
FLOOR(numeric) | Returns the largest integer not greater than the numeric value. | FLOOR(-1.2) returns -2 |
MOD(numeric1, numeric2) | Returns the remainder after numeric1 is divided by numeric2 . | MOD(8,2) |
POWER(numeric1, numeric2) | Raises numeric1 to the power of numeric2 . | POWER(2,3) |
ROUND(numeric1, numeric2) | Returns numeric1 rounded to numeric2 decimal places. | ROUND(2.5,0) |
TRUNC(numeric1, numeric2) | Returns numeric1 truncated to numeric2 decimal places. | TRUNC(2.5,0) |
TO_NUMBER(expr[, format, locale]) | Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US . Supported language tags and format patterns apply. | TO_NUMBER('1234') returns 1234 |
Function | Description | Example |
---|---|---|
CURRENT_DATE | Returns the current date. | CURRENT_DATE returns today's date such as 2023-05-26 |
CURRENT_TIMESTAMP | Returns the current date and time for the session time zone. | CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56 |
DATE_ADD(date, number_of_days) |
Returns the date that's the specified number of days after the specified date . |
DATE_ADD('2017-07-30', 1) returns 2017-07-31 |
DATE_FORMAT(expr, format[, locale]) | Formats an expr of DATE to a string, based on the format and optional locale provided. Default locale is en-US . Supported date format patterns apply. | DATE_FORMAT(DATE '2020-10-11', 'yyyy-MM-dd') returns a string of 2020-10-11 |
DAYOFMONTH(date) |
Returns the date's day in the month. | DAYOFMONTH('2020-12-25') returns 25 |
DAYOFWEEK(date) |
Returns the date's day in the week. | DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on. |
DAYOFYEAR(date) |
Returns the date's day in the year. | DAYOFYEAR('2020-12-25') returns 360 |
WEEKOFYEAR(date) | Returns the date's week in the year. | WEEKOFYEAR('2020-12-25') returns 52 |
HOUR(datetime) |
Returns the datetime's hour value. | HOUR('2020-12-25 15:10:30') returns 15 |
LAST_DAY(date) |
Returns the date's last day of the month. | LAST_DAY('2020-12-25') returns 31 |
MINUTE(datetime) |
Returns the datetime's minute value. | MINUTE('2020-12-25 15:10:30') returns 10 |
MONTH(date) |
Returns the date's month value. | MONTH('2020-06-25') returns 6 |
QUARTER(date) |
Returns the quarter of year the date is in. | QUARTER('2020-12-25') returns 4 |
SECOND(datetime) |
Returns the datetime's second value. | SECOND('2020-12-25 15:10:30') returns 30 |
TO_DATE(string, format_string[, localeStr]) | Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US . Supported language tags. In pipeline expressions, the format_string must use the strftime format codes. | TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31 |
TO_TIMESTAMP(expr, format_string[, localeStr]) | Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided. In pipeline expressions, the format_string must use the strftime format codes. | TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on October 11, 2020 |
WEEK(date) |
Returns the date's week value. |
WEEK('2020-06-25') returns 4 |
YEAR(date) |
Returns the date's year value. | YEAR('2020-06-25') returns 2020 |
ADD_MONTHS(date_expr, number_months) | Returns the date after adding the specified number of months to the specified date, timestamp, or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS . | ADD_MONTHS('2017-08-31', 1) returns 2017-09-30 |
MONTHS_BETWEEN(start_date_expr, end_date_expr) | Returns the number of months between start_date_expr and end_date_expr . A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month. | MONTHS_BETWEEN('2020-03-15', '2020-01-15') returns 2 |
FROM_UTC_TIMESTAMP(time_stamp, time_zone) |
Interprets a date, timestamp, or string as a UTC time and converts that time to a timestamp in the specified time zone. For string, use a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS . Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). |
FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0 |
TO_UTC_TIMESTAMP(time_stamp, time_zone) |
Converts a date, timestamp, or string in the specified time zone to a UTC timestamp. For string, use a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS . Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). |
TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
|
FROM_UNIXTIME(unix_time[, fmt]) | Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format. Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC. If fmt is omitted, the default format yyyy-MM-dd HH:mm:ss is used. | FROM_UNIXTIME(1609459200) returns 2020-12-31 16:00:00 when the default time zone is PST |
UNIX_TIMESTAMP([time_expr[, fmt]]) | Converts the current or specified time to a Unix timestamp in seconds. If time_expr is omitted, the current time is converted. If fmt is omitted, the default format yyyy-MM-dd HH:mm:ss is used. | UNIX_TIMESTAMP('2020-12-31 16:00:00') returns 1609459200 when the default time zone is PST |
INTERVAL 'year' YEAR[(year_precision)] |
Returns a period of time in years. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.) |
INTERVAL '1' YEAR returns an interval of 1 year |
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH |
Returns a period of time in years and months. Use to store a period of time using year and month fields. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.) |
INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3. |
INTERVAL 'month' MONTH[(month_precision)] |
Returns a period of time in months. month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months.) |
INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3. |
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of days, hours, minutes, and seconds. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second |
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)] |
Returns a period of time in terms of days, hours, and minutes. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes |
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)] |
Returns a period of time in terms of days and hours. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. |
INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours |
INTERVAL 'day' DAY[(day_precision)] |
Returns a period of time in terms of days. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. |
INTERVAL '999' DAY(3) returns an interval of 999 days |
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of hours, minutes, and seconds. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds |
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)] |
Returns a period of time in terms of hours and minutes. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes |
INTERVAL 'hour' HOUR[(hour_precision)] |
Returns a period of time in terms of hours. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. |
INTERVAL '40' HOUR returns an interval of 40 hours |
INTERVAL 'minute' MINUTE[(minute_precision)] |
Returns a period of time in terms of minutes. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '15' MINUTE returns an interval of 15 minutes |
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of minutes and seconds. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds |
INTERVAL 'second' SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of seconds. fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3. |
INTERVAL '15.678' SECOND returns an interval of 15.678 seconds |
Function | Description | Example |
---|---|---|
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) |
Returns the value evaluated at the row that's the first row of the window frame. | FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME . |
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) |
Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default defaults to NULL. | LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME . If there is no such value, hello is returned. |
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) |
Returns the value evaluated at the row that is the last row of the window frame. | LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME . |
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) |
Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default defaults to NULL. | LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME . If there is no such value, hello is returned. |
RANK() OVER([ partition_clause ] order_by_clause) |
Returns the rank of the current row with gaps, counting from 1. | RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID , in ascending order of BANK_NAME . |
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) |
Returns the unique number of the current row within its partition, counting from 1. | ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID , in ascending order of BANK_NAME . |
Functions | Description | Example |
---|---|---|
CAST(value AS type) | Returns the specified value in the specified type. | CAST("10" AS INT) returns 10 |
CONCAT(string, string) | Returns the combined values of strings or columns. | CONCAT('Oracle','SQL') returns OracleSQL |
CONCAT_WS(separator, expression1, expression2, expression3,...) |
Returns the combined values of strings or columns using the specified separator in between the strings or columns. A separator is required and it must be a string. At least one expression must be provided after the separator. If a child of the function is an array, then the array is flattened. |
CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle |
INITCAP(string) | Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space. | INITCAP('oRACLE sql') returns Oracle Sql |
INSTR(string, substring[, start_position]) | Returns the (1-based) index of the first occurrence of substring in string . | INSTR('OracleSQL', 'SQL') returns 7 |
LOWER(string) | Returns the string with all letters changed to lowercase. | LOWER('ORACLE') returns oracle |
LENGTH(string) | Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces. | LENGTH('Oracle') returns 6 |
LTRIM(string) | Returns the string with leading spaces removed from the left. | LTRIM(' Oracle') returns 'Oracle' |
NVL(expr1, expr2) | Returns expr1 if it is not null, otherwise returns expr2. | NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID()) |
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx]) | Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group. |
REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22 |
REPLACE(string, search, replacement) | Replaces all occurrences of search with replacement . If search is not found in string , then string is returned unchanged. If replacement is an empty string, then all occurrences of search are removed. | REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF |
RTRIM(string) | Returns the string with trailing spaces removed from the right. | RTRIM('Oracle ') returns 'Oracle' |
SUBSTRING(string, position[, substring_length]) | Returns the substring of string starting at position, optionally limited to substring_length characters. | SUBSTRING('Oracle SQL', 2, 3) returns rac |
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale]) | Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US . See supported language tags. In pipeline expressions, the format must use the strftime format codes. | Number example: TO_CHAR(123) returns a string of 123. Date example: TO_CHAR(DATE '2020-10-30', 'yyyy.MM.dd') returns a string of 2020.10.30 |
UPPER(string) | Returns a string with all letters changed to uppercase. | UPPER('oracle') returns ORACLE |
LPAD(str, len[, pad]) | Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | LPAD('ABC', 5, '*') returns '**ABC' |
RPAD(str, len[, pad]) | Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | RPAD('XYZ', 6, '+' ) returns 'XYZ+++' |
Function | Description | Example |
---|---|---|
CASE WHEN condition1 THEN result1 ELSE result2 END | Returns the value for which a condition is met. | CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0 , otherwise returns XYZ |
AND | The logical AND operator. Returns true if both operands are true, otherwise returns false. | (x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false" |
OR | The logical OR operator. Returns true if either operand is true or both are true, otherwise returns false. | (x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true" |
NOT | The logical NOT operator. | |
LIKE | Performs string pattern matching, whether string1 matches the pattern in string2. | |
= | Tests for equality. Returns true if expr1 equals expr2, otherwise returns false. | x = 10 returns "true" when value of x is 10, else it returns "false" |
!= | Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false. | x != 10 returns "false" if value of x is 10, else it returns "true" |
> | Tests for an expression greater than. Returns true if expr1 is greater than expr2. | x > 10 returns "true" if value of x is greater than 10, else it returns "false" |
>= | Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2. | x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false" |
< | Tests for an expression less than. Returns true if expr1 is less than expr2. | x < 10 returns "true" if value of x is less than 10, else it returns "false" |
<= | Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2. | x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false" |
|| | Concatenates two strings. | 'XYZ' || 'hello' returns 'XYZhello' |
BETWEEN | Evaluates a range. | FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007 |
IN | Tests whether an expression matches a list of values. | FILTER_2.ORDERS.ORDER_ID IN (1003, 1007) |
Function | Description | Example |
---|---|---|
NUMERIC_ID() | Generates a universally unique identifier that is a 64-bit number for each row. | NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761 |
ROWID() | Generates monotonically increasing 64-bit numbers. | ROWID() returns for example, 0 , 1 , 2 , and so on |
UUID() | Generates a universally unique identifier that is a 128-bit String for each row. | UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d |
MONOTONICALLY_INCREASING_ID() |
Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. | MONOTONICALLY_INCREASING_ID() returns for example,
8589934592 and 25769803776 |
Function | Description | Example |
---|---|---|
COALESCE(value, value [, value]*) | Returns the first non-null argument, if it exists, otherwise returns null. | COALESCE(NULL, 1, NULL) returns 1 |
NULLIF(value, value) | Returns null if the two values equal each other, otherwise returns the first value. | NULLIF('ABC','XYZ') returns ABC |
Function | Description | Example |
---|---|---|
SCHEMA_OF_JSON(string) | Parses a JSON string and infers the schema in DDL format. | |
FROM_JSON(column, string) | Parses a column containing a JSON string into a Map, Struct, or Array type, with the specified schema. | |
TO_JSON(column) | Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string. | TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}} |
TO_MAP(string,column[,string,column]*) | Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type. | |
TO_STRUCT(string,column[,string,column]*) | Creates a new column of Struct type. The input columns must be grouped as key-value pairs. | |
TO_ARRAY(column[,column]*) | Creates a new column of Array type. The input columns must all have the same data type. | |
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
- Aggregate
- Expression
- Filter
- Join
- Lookup
- Split
- Pivot
Function | Description | Example |
---|---|---|
TRANSFORM(column, lambda_function) | Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array. | For an input array of integers [1, 2, 3] , TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4] . |
TRANSFORM_KEYS(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values. | For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'} , TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'} . |
TRANSFORM_VALUES(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. | For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'} , TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'} . |
ARRAY_SORT(array(...), lambda_function) |
Only the Expression operator supports ARRAY_SORT. Takes an array and sorts it according to the given function that takes 2 arguments. The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element. If the function is omitted, the array is sorted in ascending order. |
ARRAY_SORT(array(5, 6, 1), (left, right) -> CASE WHEN left < right THEN -1 WHEN left > right THEN 1 ELSE 0 END) returns [1,5,6] |
You can edit any split condition except the Unmatched condition.
You can move a split condition up or down in the sequence. Only the Unmatched condition cannot be moved.
You can delete any split condition except the Unmatched condition.
Pivot Operator
The pivot operator lets you take unique row values from one attribute in an input source and pivot the values into multiple attributes in the output.
Using input from multiple rows, the pivot operation performs a transformation based on aggregate function expressions, and the values of an attribute that you specify as the pivot key. The result of a pivot operation is a pivoted or rearranged output of rows and attributes.
The number of rows in the output is based on the selection of attributes to group by.
- If you specify one or more attributes to group by, the incoming rows that have the same group by attribute value are grouped into one row. For example, if you specify one group by attribute that has four unique values, the incoming data is transformed and grouped into four rows in the output.
- If you don't specify any attributes to group by, all the incoming data is transformed into a single output row.
The number of attributes in the output is:
- Based on the number of attributes you select to group by
- A multiple of the number of values you select in the pivot key
- A result of the number of attributes that the aggregate function expressions transform
For example, if you select one group by attribute and three pivot key values, and add an expression that transforms two attributes, the number of attributes in the output is:
1 + (3 * 2)
The total number of attributes in the resulting pivoted output is calculated as follows:
Number of group by attributes + (Number of pivot key values * Number of attributes that are transformed from expressions)
The names of new attributes in the output are derived from a pattern you add for target attributes when you create the expressions.
The aggregate functions you use with a pivot operator determine the pivoted values in the output. If no data is found, null is inserted where a pivoted value is expected.
Consider the data entity PRODUCT_SALES, which has the attributes STORE, PRODUCT, and SALES. The data entity has five rows. You want to create a pivot on the attribute PRODUCT with an aggregate SUM function on SALES.
You specify to group pivoted rows by the attribute STORE. Each unique STORE value becomes a row in the resulting output. Input rows with the same group by value are grouped into the same row in the pivoted output. If you don't specify a group by attribute, all input rows are transformed into a single row in the resulting output.
You specify PRODUCT as the pivot key, and select all three values to pivot into new attributes in the resulting output.
The aggregate SUM function expression on SALES is:
SUM(PIVOT_1_1.PRODUCT_SALES.SALES)
The pattern for target attributes is:
%PIVOT_KEY_VALUE%
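With this pattern, each of the three PRODUCT values selected in the pivot key (Television, Refrigerator, and Coffee maker) becomes the name of a pivoted attribute in the output, as shown in the tables that follow.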
Data Entity PRODUCT_SALES
STORE | PRODUCT | SALES |
---|---|---|
AB Store | Television | 2 |
AB Store | Television | 4 |
Country-Store | Television | 6 |
Country-Store | Refrigerator | 8 |
E-Store | Coffee maker | 10 |
Pivot Output: Group By STORE, Pivot Key PRODUCT
STORE | TELEVISION | REFRIGERATOR | COFFEE MAKER |
---|---|---|---|
AB Store | 6 | - | - |
Country-Store | 6 | 8 | - |
E-Store | - | - | 10 |
Pivot Output: Without Group By, Pivot Key PRODUCT
TELEVISION | REFRIGERATOR | COFFEE MAKER |
---|---|---|
12 | 8 | 10 |
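As a check against the attribute-count formula above: with the group by attribute STORE, three pivot key values, and one expression that transforms one attribute, the output has 1 + (3 * 1) = 4 attributes (STORE, TELEVISION, REFRIGERATOR, and COFFEE MAKER). Without a group by attribute, the output has 0 + (3 * 1) = 3 attributes.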
The pivot operator performs a transformation using one or more aggregate function expressions on one or more values of an attribute that you specify as the pivot key.
You can choose to transform all the pivoted rows into a single output row, or select group by attributes to create multiple output rows, where rows that share the same group by value are combined into one row.
One or more aggregate function expressions are required for a pivot operator.
With the pivot operator selected on the data flow canvas, under Expressions in the properties panel, click Add expression.
In the Add Expression panel, enter a name for the expression in the Identifier field, or leave the name as-is.
(Optional) To use a pattern to apply an expression to multiple source attributes, select Allow bulk selection.
For example, suppose you have two DISCOUNT_ attributes in the dataset (DISCOUNT_VALUE and DISCOUNT_RATE) and you want to apply the MAX function to both of them. Under Source attributes, select Pattern and click Add pattern.
In the Add Source Pattern panel, add a pattern for selecting a group of source attributes that have names beginning with DISCOUNT_. For example, enter DISCOUNT* and click Add. Then select the Data type.
Under Target attributes, use a pattern for the resulting output attribute names.
By default, the pattern %MACRO_INPUT%_%PIVOT_KEY_VALUE% is already inserted for you. %MACRO_INPUT% corresponds to the names of the source attributes as selected by the pattern you added. %PIVOT_KEY_VALUE% corresponds to the values as selected in the pivot key.
For example, if %PIVOT_KEY_VALUE% indicates TELEVISION, and %MACRO_INPUT% indicates DISCOUNT_VALUE and DISCOUNT_RATE, then in the output, the pivoted attributes are <pivot_name>.<expression_name>.DISCOUNT_VALUE_TELEVISION and <pivot_name>.<expression_name>.DISCOUNT_RATE_TELEVISION.
- You can retain the Use source attribute data types selection. Otherwise, under Expression data type, select the Data type and complete the fields corresponding to the type selected.
If you did not select Allow bulk selection, under Target attributes, use a pattern for the resulting output attribute names.
By default, the pattern %PIVOT_KEY_VALUE% is already inserted for you. %PIVOT_KEY_VALUE% corresponds to the values as selected in the pivot key.
For example, if %PIVOT_KEY_VALUE% indicates TELEVISION and TELEPHONE, then in the output, the pivoted attributes are <pivot_name>.<expression_name>.TELEVISION and <pivot_name>.<expression_name>.TELEPHONE.
Under Expression data type, select the Data type and complete the fields corresponding to the type selected.
In the Expression builder section, double-click or drag incoming attributes, parameters, and aggregate functions to add to the editor to build the expression. You can also manually write the expression yourself, and validate the expression.
The following table shows the list of aggregate functions that are available for constructing pivot expressions.
Function | Description | Example |
---|---|---|
COUNT(value[, value]*) | Returns the number of rows for which one or more supplied expressions are all non-null. | COUNT(expr1) |
COUNT(*) | Returns the total number of retrieved rows, including rows containing null. | COUNT(*) |
MAX(value) | Returns the maximum value of the argument. | MAX(expr) |
MIN(value) | Returns the minimum value of the argument. | MIN(expr) |
SUM(numeric) | Returns the sum calculated from values of a group. | SUM(expr1) |
You can also use higher-order (transform) functions in a pivot expression.
To construct a pivot expression, specify the attribute or attributes and the aggregate function.
If you selected Allow bulk selection, then use %MACRO_INPUT% in the expression to indicate the attributes on which the function is to be applied.
For example, if you used the pattern DISCOUNT* to match source attributes DISCOUNT_RATE and DISCOUNT_VALUE, then you can specify an aggregate function such as SUM(numeric) to apply the function on all the attributes that match the pattern. %MACRO_INPUT% replaces the numeric placeholder in the function:
SUM(%MACRO_INPUT%)
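For instance, with the DISCOUNT* pattern above, the single bulk expression SUM(%MACRO_INPUT%) is conceptually equivalent to writing both of the following, once for each pivot key value:
SUM(DISCOUNT_VALUE)
SUM(DISCOUNT_RATE)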
If you did not select Allow bulk selection, then specify the attribute in the function.
For example, the data entity is PRODUCT_SALES and you want to use an aggregate SUM function on the SALES attribute. You can specify the function such as SUM(numeric), replacing the numeric placeholder in the function with the attribute name:
SUM(PIVOT_1_1.PRODUCT_SALES.SALES)
You can use an expression parameter for the aggregate function name in the pivot expression. An expression parameter has a name, type, and default value.
For example, the expression parameter P_VARCHAR has type VARCHAR and MIN as the default value. You can specify the aggregate function as:
$P_VARCHAR(%MACRO_INPUT%)
$P_VARCHAR(PIVOT_1_1.PRODUCT_SALES.SALES)
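With the default value MIN, the parameterized function evaluates as if you had entered MIN(%MACRO_INPUT%) or MIN(PIVOT_1_1.PRODUCT_SALES.SALES). Changing the parameter value, for example to MAX, swaps the aggregate function without editing the pivot expression.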
In the Add Expression panel, click Add.
Lookup Operator
The lookup operator performs a query and transformation using a lookup condition and input from two sources, a primary input source and a lookup input source.
The query operation uses the condition and a value in the primary input to find rows in the lookup source. The transformation appends attributes from the lookup source to the primary source.
You can specify the actions to take when the lookup query returns multiple rows or no rows. For example, you can specify that non-matching rows are skipped, and that any one matching row is returned when there are multiple matching rows.
The resulting output is a combination of both input sources based on the lookup condition, a value in the primary input, and the preferred actions to take. The primary input determines the order of attributes and rows in the output, with the attributes from the primary input placed before the attributes from the lookup input.
Consider two source data entities in a data flow. Data entity 1 (PAYMENTS) is set as the primary input. Data entity 2 (CUSTOMERS) is set as the lookup input. The lookup condition is set as:
LOOKUP_1_1.PAYMENTS.CUSTOMER_ID = LOOKUP_1_2.CUSTOMERS.CUSTOMER_ID
The resulting lookup output shows how the data from the two inputs are combined and transformed. Attributes from the lookup source are appended to the primary source attributes, with the following behaviors:
- If the operation does not find a matching record in the lookup source for a value in the primary input, the record is returned with null inserted for the lookup attributes. For example, no matching records are found for the CUSTOMER_ID values 103, 104, and 105. Therefore, in the resulting output, null is populated in the appended attributes CUSTOMER_ID and NAME.
- If the operation finds multiple matching records in the lookup source for a value in the primary input, any one matching record is returned.
Data Entity 1, primary input source
PAYMENT_ID | CUSTOMER_ID | AMOUNT |
---|---|---|
1 | 101 | 2500 |
2 | 102 | 1110 |
3 | 103 | 500 |
4 | 104 | 400 |
5 | 105 | 150 |
6 | 102 | 450 |
Data Entity 2, lookup input source
CUSTOMER_ID | NAME |
---|---|
101 | Peter |
102 | Paul |
106 | Mary |
102 | Pauline |
Lookup Output
PAYMENT_ID | CUSTOMER_ID | AMOUNT | CUSTOMER_ID | NAME |
---|---|---|---|---|
1 | 101 | 2500 | 101 | Peter |
2 | 102 | 1110 | 102 | Paul |
3 | 103 | 500 | null | null |
4 | 104 | 400 | null | null |
5 | 105 | 150 | null | null |
6 | 102 | 450 | 102 | Paul |
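Note that the lookup source has two rows for CUSTOMER_ID 102 (Paul and Pauline). Because any one matching record is returned when there are multiple matches, each output row for CUSTOMER_ID 102 shows one of the two names (here, Paul).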
A lookup operator uses two input sources on a data flow.
The following procedure assumes you have added and configured two source operators.
Use the Condition Builder to visually select elements to create a lookup condition. You can also enter a condition manually in the editor.
A lookup condition lets you use a value in a primary input source to look up records in a lookup input source, returning any rows that match. Rows that do not have a match are returned with null values.
The elements that you can use in a lookup condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.
Incoming displays the attributes from the upstream input ports in two separate LOOKUP folders. View the attributes from each port by expanding or collapsing the appropriate LOOKUP folder. For example, if LOOKUP_1_1 is the primary input and LOOKUP_1_2 is the lookup input, lookup conditions based on a value in the primary input attribute ADDRESS_ID could be:
LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = '2001'
LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = LOOKUP_1_2.BANK_CUSTOMER.ADDRESS_ID
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Suppose you want to look up bank customers of a specific bank. You can create a VARCHAR
parameter with the name P_LOOK_UP
, and set the default value to 2001
, which is the specific bank value. Then you can create the lookup condition as:
LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = LOOKUP_1_2.BANK_CUSTOMER.ADDRESS_ID AND LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = $P_LOOK_UP
Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.
Here's a list of functions that are available for you to add when you construct conditions:
Function | Description | Example |
---|---|---|
MD5(all data types) | Calculates an MD5 checksum of the data type, and returns a string value. | MD5(column_name) |
SHA1(all data types) | Calculates a SHA-1 hash value of the data type, and returns a string value. | SHA1(column_name) |
SHA2(all data types, bitLength) | Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. | SHA2(column_name, 256) |
ORA_HASH(expr, [max_bucket], [seed_value]) | Computes a hash value for expr. The optional max_bucket argument determines the maximum bucket value returned, and the optional seed_value argument enables producing many different results for the same set of data. Oracle applies the hash function to the combination of expr and seed_value. | ORA_HASH(column_name) |
Function | Description | Example |
---|---|---|
ABS(numeric) | Returns the absolute value of the numeric value. | ABS(-1) returns 1 |
CEIL(numeric) | Returns the smallest integer not less than the numeric value. | CEIL(-1.2) returns -1 |
FLOOR(numeric) | Returns the largest integer not greater than the numeric value. | FLOOR(-1.2) returns -2 |
MOD(numeric1, numeric2) | Returns the remainder after numeric1 is divided by numeric2 . | MOD(8,2) |
POWER(numeric1, numeric2) | Raises numeric1 to the power of numeric2 . | POWER(2,3) |
ROUND(numeric1, numeric2) | Returns numeric1 rounded to numeric2 decimal places. | ROUND(2.5,0) |
TRUNC(numeric1, numeric2) | Returns numeric1 truncated to numeric2 decimal places. | TRUNC(2.5,0) |
TO_NUMBER(expr[, format, locale]) | Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US . Supported language tags and format patterns apply. | TO_NUMBER('1234') returns 1234 |
Function | Description | Example |
---|---|---|
CURRENT_DATE | Returns the current date. | CURRENT_DATE returns today's date such as 2023-05-26 |
CURRENT_TIMESTAMP | Returns the current date and time for the session time zone. | CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56 |
DATE_ADD(date, number_of_days) |
Returns the date that's the specified number of days after the specified date . |
DATE_ADD('2017-07-30', 1) returns 2017-07-31 |
DATE_FORMAT(expr, format[, locale]) | Formats an expr of DATE to a string, based on the format and optional locale provided. Default locale is en-US . Supported date format patterns apply. | DATE_FORMAT(DATE '2020-10-11', 'yyyy-MM-dd') returns a string of 2020-10-11 |
DAYOFMONTH(date) |
Returns the date's day in the month. | DAYOFMONTH('2020-12-25') returns 25 |
DAYOFWEEK(date) |
Returns the date's day in the week. | DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on. |
DAYOFYEAR(date) |
Returns the date's day in the year. | DAYOFYEAR('2020-12-25') returns 360 |
WEEKOFYEAR(date) | Returns the date's week in the year. | WEEKOFYEAR('2020-12-25') returns 52 |
HOUR(datetime) |
Returns the datetime's hour value. | HOUR('2020-12-25 15:10:30') returns 15 |
LAST_DAY(date) |
Returns the date's last day of the month. | LAST_DAY('2020-12-25') returns 31 |
MINUTE(datetime) |
Returns the datetime's minute value. | MINUTE('2020-12-25 15:10:30') returns 10 |
MONTH(date) |
Returns the date's month value. | MONTH('2020-06-25') returns 6 |
QUARTER(date) |
Returns the quarter of year the date is in. | QUARTER('2020-12-25') returns 4 |
SECOND(datetime) |
Returns the datetime's second value. | SECOND('2020-12-25 15:10:30') returns 30 |
TO_DATE(string, format_string[, localeStr]) | Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US . Supported language tags. In pipeline expressions, the format_string must use the strftime format codes. | TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31 |
TO_TIMESTAMP(expr, format_string[, localeStr]) | Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided. In pipeline expressions, the format_string must use the strftime format codes. | TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on October 11, 2020 |
WEEK(date) |
Returns the date's week value. |
WEEK('2020-06-25') returns 4 |
YEAR(date) |
Returns the date's year value. | YEAR('2020-06-25') returns 2020 |
ADD_MONTHS(date_expr, number_months) | Returns the date after adding the specified number of months to the specified date, timestamp, or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS . | ADD_MONTHS('2017-08-31', 1) returns 2017-09-30 |
MONTHS_BETWEEN(start_date_expr, end_date_expr) | Returns the number of months between start_date_expr and end_date_expr . A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month. | MONTHS_BETWEEN('2020-03-15', '2020-01-15') returns 2 |
FROM_UTC_TIMESTAMP(time_stamp, time_zone) |
Interprets a date, timestamp, or string as a UTC time and converts that time to a timestamp in the specified time zone. For string, use a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS . Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). |
FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0 |
TO_UTC_TIMESTAMP(time_stamp, time_zone) |
Converts a date, timestamp, or string in the specified time zone to a UTC timestamp. For string, use a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS . Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). |
TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
|
FROM_UNIXTIME(unix_time[, fmt]) | Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format. Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC. If fmt is omitted, the default format yyyy-MM-dd HH:mm:ss is used. | FROM_UNIXTIME(1609459200) returns 2020-12-31 16:00:00 when the default time zone is PST |
UNIX_TIMESTAMP([time_expr[, fmt]]) | Converts the current or specified time to a Unix timestamp in seconds. If time_expr is omitted, the current time is converted. If fmt is omitted, the default format yyyy-MM-dd HH:mm:ss is used. | UNIX_TIMESTAMP('2020-12-31 16:00:00') returns 1609459200 when the default time zone is PST |
INTERVAL 'year' YEAR[(year_precision)] |
Returns a period of time in years. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.) |
INTERVAL '1' YEAR returns an interval of 1 year |
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH |
Returns a period of time in years and months. Use to store a period of time using year and month fields. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.) |
INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3. |
INTERVAL 'month' MONTH[(month_precision)] |
Returns a period of time in months. month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months.) |
INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3. |
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of days, hours, minutes, and seconds. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second |
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)] |
Returns a period of time in terms of days, hours, and minutes. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes |
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)] |
Returns a period of time in terms of days and hours. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. |
INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours |
INTERVAL 'day' DAY[(day_precision)] |
Returns a period of time in terms of days. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. |
INTERVAL '999' DAY(3) returns an interval of 999 days |
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of hours, minutes, and seconds. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds |
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)] |
Returns a period of time in terms of hours and minutes. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes |
INTERVAL 'hour' HOUR[(hour_precision)] |
Returns a period of time in terms of hours. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. |
INTERVAL '40' HOUR returns an interval of 40 hours |
INTERVAL 'minute' MINUTE[(minute_precision)] |
Returns a period of time in terms of minutes. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '15' MINUTE returns an interval of 15 minutes |
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of minutes and seconds. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds |
INTERVAL 'second' SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of seconds. fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3. |
INTERVAL '15.678' SECOND returns an interval of 15.678 seconds |
Function | Description | Example |
---|---|---|
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) |
Returns the value evaluated at the row that's the first row of the window frame. | FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME . |
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) |
Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default defaults to NULL. | LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME . If there is no such value, hello is returned. |
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) |
Returns the value evaluated at the row that is the last row of the window frame. | LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME . |
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) |
Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default defaults to NULL. | LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME . If there is no such value, hello is returned. |
RANK() OVER([ partition_clause ] order_by_clause) |
Returns the rank of the current row with gaps, counting from 1. | RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID , in ascending order of BANK_NAME . |
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) |
Returns the unique number of the current row within its partition, counting from 1. | ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID , in ascending order of BANK_NAME . |
Functions | Description | Example |
---|---|---|
CAST(value AS type) | Returns the specified value in the specified type. | CAST("10" AS INT) returns 10 |
CONCAT(string, string) | Returns the combined values of strings or columns. | CONCAT('Oracle','SQL') returns OracleSQL |
CONCAT_WS(separator, expression1, expression2, expression3,...) |
Returns the combined values of strings or columns using the specified separator in between the strings or columns. A separator is required and it must be a string. At least one expression must be provided after the separator. If a child of the function is an array, then the array is flattened. |
CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle |
INITCAP(string) | Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space. | INITCAP('oRACLE sql') returns Oracle Sql |
INSTR(string, substring[, start_position]) | Returns the (1-based) index of the first occurrence of substring in string . | INSTR('OracleSQL', 'SQL') returns 7 |
LOWER(string) | Returns the string with all letters changed to lowercase. | LOWER('ORACLE') returns oracle |
LENGTH(string) | Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces. | LENGTH('Oracle') returns 6 |
LTRIM(string) | Returns the string with leading spaces removed from the left. | LTRIM(' Oracle') returns 'Oracle' |
NVL(expr1, expr2) | Returns expr1 if it is not null, otherwise returns expr2. | NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID()) |
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx]) | Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group. |
REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22 |
REPLACE(string, search, replacement) | Replaces all occurrences of search with replacement . If search is not found in string , then string is returned unchanged. If replacement is an empty string, then all occurrences of search are removed. | REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF |
RTRIM(string) | Returns the string with trailing spaces removed from the right. | RTRIM('Oracle ') returns 'Oracle' |
SUBSTRING(string, position[, substring_length]) | Returns the substring of string starting at position, optionally limited to substring_length characters. | SUBSTRING('Oracle SQL', 2, 3) returns rac |
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale]) | Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US . See supported language tags. In pipeline expressions, the format must use the strftime format codes. | Number example: TO_CHAR(123) returns a string of 123. Date example: TO_CHAR(DATE '2020-10-30', 'yyyy.MM.dd') returns a string of 2020.10.30 |
UPPER(string) | Returns a string with all letters changed to uppercase. | UPPER('oracle') returns ORACLE |
LPAD(str, len[, pad]) | Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | LPAD('ABC', 5, '*') returns '**ABC' |
RPAD(str, len[, pad]) | Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | RPAD('XYZ', 6, '+' ) returns 'XYZ+++' |
Function | Description | Example |
---|---|---|
CASE WHEN condition1 THEN result1 ELSE result2 END | Returns the value for which a condition is met. | CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0 , otherwise returns XYZ |
AND | The logical AND operator. Returns true if both operands are true, otherwise returns false. | (x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false" |
OR | The logical OR operator. Returns true if either operand is true or both are true, otherwise returns false. | (x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true" |
NOT | The logical NOT operator. | |
LIKE | Performs string pattern matching, whether string1 matches the pattern in string2. | |
= | Tests for equality. Returns true if expr1 equals expr2, otherwise returns false. | x = 10 returns "true" when value of x is 10, else it returns "false" |
!= | Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false. | x != 10 returns "false" if value of x is 10, else it returns "true" |
> | Tests for an expression greater than. Returns true if expr1 is greater than expr2. | x > 10 returns "true" if value of x is greater than 10, else it returns "false" |
>= | Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2. | x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false" |
< | Tests for an expression less than. Returns true if expr1 is less than expr2. | x < 10 returns "true" if value of x is less than 10, else it returns "false" |
<= | Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2. | x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false" |
|| | Concatenates two strings. | 'XYZ' || 'hello' returns 'XYZhello' |
BETWEEN | Evaluates a range. | FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007 |
IN | Tests whether an expression matches a list of values. | FILTER_2.ORDERS.ORDER_ID IN (1003, 1007) |
Function | Description | Example |
---|---|---|
NUMERIC_ID() | Generates a universally unique identifier that is a 64-bit number for each row. | NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761 |
ROWID() | Generates monotonically increasing 64-bit numbers. | ROWID() returns for example, 0 , 1 , 2 , and so on |
UUID() | Generates a universally unique identifier that is a 128-bit String for each row. | UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d |
MONOTONICALLY_INCREASING_ID() |
Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. | MONOTONICALLY_INCREASING_ID() returns for example,
8589934592 and 25769803776 |
Function | Description | Example |
---|---|---|
COALESCE(value, value [, value]*) | Returns the first non-null argument, if it exists, otherwise returns null. | COALESCE(NULL, 1, NULL) returns 1 |
NULLIF(value, value) | Returns null if the two values equal each other, otherwise returns the first value. | NULLIF('ABC','XYZ') returns ABC |
Function | Description | Example |
---|---|---|
SCHEMA_OF_JSON(string) | Parses a JSON string and infers the schema in DDL format. | |
FROM_JSON(column, string) | Parses a column containing a JSON string into a Map, Struct, or Array type, with the specified schema. | |
TO_JSON(column) | Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string. | TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}} |
TO_MAP(string,column[,string,column]*) | Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type. | |
TO_STRUCT(string,column[,string,column]*) | Creates a new column of Struct type. The input columns must be grouped as key-value pairs. | |
TO_ARRAY(column[,column]*) | Creates a new column of Array type. The input columns must all have the same data type. | |
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
- Aggregate
- Expression
- Filter
- Join
- Lookup
- Split
- Pivot
Function | Description | Example |
---|---|---|
TRANSFORM(column, lambda_function) | Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array. | For an input array of integers [1, 2, 3] , TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4] . |
TRANSFORM_KEYS(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values. | For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'} , TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'} . |
TRANSFORM_VALUES(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys. | For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'} , TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'} . |
ARRAY_SORT(array(...), lambda_function) | Only the Expression operator supports ARRAY_SORT. Takes an array and sorts it according to the given function, which takes 2 arguments. The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element. If the function is omitted, the array is sorted in ascending order. | For example, sorting an input array containing the elements 5, 6, and 1 in ascending order returns the array [1,5,6] . |
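Spark exposes equivalent higher-order functions, which can be useful for checking expression logic outside the designer. A minimal PySpark sketch (requires Spark 3.1 or later for the Python lambda forms; the data is invented):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import array_sort, concat, lit, transform, transform_values

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([5, 6, 1], {"a": "value1"})], ["arr", "m"])

df.select(
    transform("arr", lambda x: x + 1).alias("plus_one"),                     # [6, 7, 2]
    transform_values("m", lambda k, v: concat(k, lit("_"), v)).alias("tv"),  # {a -> a_value1}
    array_sort("arr").alias("sorted_arr"),                                   # [1, 5, 6]
).show(truncate=False)
```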
Function Operator
Use the function operator to invoke Oracle Cloud Infrastructure Functions from within Data Integration.
For complex data structures such as Map, Array and Composite (Struct), only the JSON serialization format is currently supported.
Before you use OCI Functions from within a data flow in Data Integration, ensure that you understand the dependencies, and have completed the prerequisite tasks.
The function you want to invoke must be deployed in OCI Functions. The function can be written in any language.
Use the following task checklist to ensure that you have the required setup and information for using the function operator with OCI Functions.
Task | Requirement |
---|---|
Set up to use and access OCI Functions | This topic assumes that the tenancy and development environment are already set up for deploying functions to OCI Functions. Only functions that are deployed to OCI Functions can be used with the function operator in Data Integration. |
Create policies to control the invocation and management of functions. | This topic assumes you or the tenancy administrator have already created the necessary Oracle Cloud Infrastructure policies. See Creating Policies to Control Access to Network and Function-Related Resources. In a production environment, you might want to restrict users to invoking functions in a specific application, or to invoking a specific function only (a rough sketch of such policy statements follows this table). |
Create an application in OCI Functions. | An application is a logical grouping of functions. In an application, you specify one to three subnets in which to run functions. Functions running in one application are isolated from functions running in another application. |
Deploy a function in OCI Functions. | For the function in OCI Functions to work with the Data Integration function operator in a data flow, the function must not read or write any identifier that contains a space character. When you deploy a function to OCI Functions using the Fn Project CLI, the function is built as a Docker image and pushed to a specified Docker registry. |
Collect the information needed to use the deployed function with the function operator. | When you add and configure a function operator in a Data Integration data flow, you need to know the compartment, the OCI Functions application, and the deployed function to select. |
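As a rough sketch of the general shape of such OCI IAM policy statements (the group name, compartment name, and function OCID are placeholders; verify the exact syntax against the OCI Functions policy reference before use):

```
allow group <group-name> to use fn-invocation in compartment <compartment-name>

allow group <group-name> to use fn-invocation in compartment <compartment-name>
    where target.function.id = '<function-OCID>'
```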
A function operator lets you use a function deployed in OCI Functions to process data from an input source. Primitive and complex data types are supported.
Specify the input shape of the function, and the input and output attributes for the function operator to read and write. Then manually map the source attributes to the input attributes.
Currently, only the JSON serialization format is supported for complex data structures such as Map, Array, and Composite (Struct).
The following procedure assumes you have added and configured a source operator with the data entity you want to use with the function.
The function you want to use with a function operator must be deployed in an application in Oracle Cloud Infrastructure Functions.
- On the canvas, select the function operator.
- Under the Details tab of the Properties panel, for OCI Function, click Select.
- In the panel, select the Compartment that has the OCI Functions application in which the function you want to use has been deployed.
- Select the Application in OCI Functions that has the deployed function.
- In the OCI Function section, select the function.
- Click OK.
After selecting a deployed function, specify the properties that define the input data for the function to process, the fields in the function configuration, and the output data that the function returns.
Do not use a space character in the identifier name for an input attribute, an output attribute, or a function field. Also, names such as "Element", "Key", and "Value" are not permitted.
- Under Function properties, click Add property.
- In the panel, select the Type of property you want to specify.
- Input attributes: Input shape for the operator. Specify one or more attributes to represent the incoming attributes for the function to process. Input attributes are mapped to the incoming attributes from the data entity source.
- Function configuration: Specify one or more function fields that define the input shape of the function. A field has a name and a value. The value of a field specifies an input attribute.
- Output attributes: Output shape for the operator. Specify zero or more attributes to represent the output from the function after the data is processed. Add an output attribute for each function field that you want in the output. Subsequent downstream operators can use the output attributes.
- Enter a name for the property in the Identifier field. Do not use a space character in the name for an input attribute, an output attribute, or a function field.
- Select the Data type of the property.
For a primitive data type, depending on the Type and Data type of the property you're specifying, specify the following fields, and other applicable fields.
- Length: For an input or output attribute, enter a length. For example, you specify a length for a numeric or varchar data type.
- Value: For a function field, enter a value that matches an input attribute.
For a Map complex data type, specify the key-value pair for a map element. Click Add data type to select the key data type and value data type.
- Key: Select from primitive data types only.
- Value: Select from primitive and complex data types.
For an Array complex data type, click Add data type to specify the Element data type. You can select from primitive and complex data types.
For a Composite (Struct) data type, specify the schema child or schema children. To add a schema child, click the + symbol next to the parent schema name.
In the Add Field panel for a schema child, enter the Identifier, and select a Data type. You can select from primitive and complex data types. Click Add to add the schema child.
Click the + symbol again if you want to add more than one schema child to the parent schema.
To edit or delete a schema child, use the Actions menu at the end of the row.
You cannot delete the parent schema element.
- In the Add Property panel, click Add.
- Repeat the steps to add the properties you need to define the function input, function fields, and function output.
Flatten Operator
Use the flatten operator to transform hierarchical data into a simplified format for use with other relational data. The flattening process is also known as denormalization or unnesting.
The hierarchical file formats that you can denormalize are:
- JSON and multi-line JSON
- Avro
- Parquet
Currently, the supported hierarchical data type that you can unnest is Array. You can select only one Array node in a hierarchical dataset to flatten. The data structure is flattened from the root to the selected Array node, and presented in a simplified format. If the dataset has other Array nodes, you have the option to convert those into strings.
You can configure the attributes or fields to include in the flattened output by setting Projection preferences.
Projection preferences are settings for the format, number, and type of projected attributes to include in the output after flattening the incoming data.
Data Integration provides four projection preference settings that are selected by default, resulting in a projected output that has the following:
- Index arrays
- All attributes up to the attribute that is selected for flattening
- Null values for the attributes that are missing from the subsequent arrays
- Parent name lineage in the attribute names
To understand the projection preferences that you can configure, use the following topics with the JSON data structure example.
JSON data structure example
id
vehicle[]
    make
    model
    insurance{}
        company
        policy_number
    maintenance[]
        date
        description[]
        dealerdetails[]
            servicestation1[]
                dealeraddress[]
                    city
                    state
                    contact
                    dealername
            servicestation2[]
                dealeraddress[]
                    city
                    state
                    contact
                    dealername
The projection preference setting Create and project array indexes lets you control whether to include index attributes for arrays in the flattened output.
The index attribute represents the index of an array. Suppose that the array is ["a","b","c"], then the index for "a" is 0, the index for "b" is 1, and the index for "c" is 2.
By default, Data Integration creates an attribute with the suffix _INDEX for the array node that you select to flatten data to. The data type of an array index attribute is Integer.
Index attributes are also created for the parent array nodes of the selected array node. The flattening operation impacts all parent array nodes in the hierarchical structure from the root to the selected node. If an impacted parent array node has sibling arrays, index attributes are not created for those sibling array nodes.
In the JSON data structure example, if you select the array dealeraddress that's under servicestation2 for flattening, Data Integration creates five array index attributes: one for the selected array node, and four for the parent nodes impacted by the flattening operation.
After flattening, the attributes in the simplified structure are as follows:
id
vehicle_INDEX
vehicle_make
vehicle_model
vehicle_insurance
    company
    policy_number
vehicle_maintenance_INDEX
vehicle_maintenance_date
vehicle_maintenance_description
vehicle_maintenance_dealerdetails_INDEX
vehicle_maintenance_servicestation1
    dealeraddress
        city
        state
        contact
        dealername
vehicle_maintenance_servicestation2_INDEX
vehicle_maintenance_servicestation2_dealeraddress_INDEX
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
vehicle_maintenance_servicestation2_dealeraddress_dealername
An array index attribute is not created for servicestation1 because the array node is a sibling of the impacted parent node servicestation2. Similarly, an index attribute is not created for description because the array node is a sibling of the impacted parent node dealerdetails.
If you do not select the projection preference Create and project array indexes, the attributes in the simplified structure are as follows:
id
vehicle_make
vehicle_model
vehicle_insurance
    company
    policy_number
vehicle_maintenance_date
vehicle_maintenance_description
vehicle_maintenance_servicestation1
    dealeraddress
        city
        state
        contact
        dealername
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
vehicle_maintenance_servicestation2_dealeraddress_dealername
The projection preference setting Retain all attributes up to the flattened array lets you control whether to include in the flattened output those attributes that are not part of the flattened arrays.
By default, Data Integration shows all attributes from the root to the selected array, including sibling arrays not impacted by flattening.
In the JSON data structure example, if you do not select the projection preference Retain all attributes up to the flattened array, the attributes in the simplified structure are as follows:
id
vehicle_INDEX
vehicle_maintenance_INDEX
vehicle_maintenance_dealerdetails_INDEX
vehicle_maintenance_servicestation2_INDEX
vehicle_maintenance_servicestation2_dealeraddress_INDEX
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
The projection preference setting Produce a single row with null values for missing parent array lets you control whether to skip rows that do not have the attributes impacted by flattening.
The effect of the Produce a single row with null values for missing parent array setting is seen only in the Data tab. By default, Data Integration shows null values for the attributes that are missing from the subsequent arrays.
For example, the following is a projected output with null values:
|Company1|Redmond |[...]|true |[...]|0 |1 |123.34 |0 |Laptop |20 |
|Company1|Redmond |[...]|true |[...]|0 |1 |123.34 |1 |Charger |2 |
|Company1|Redmond |[...]|true |[...]|1 |2 |323.34 |0 |Mouse |2 |
|Company1|Redmond |[...]|true |[...]|1 |2 |323.34 |1 |Keyboard |1 |
|Company2|Seattle |[...]|false |[...]|0 |4 |123.34 |0 |Laptop |20 |
|Company2|Seattle |[...]|false |[...]|0 |4 |123.34 |1 |Charger |3 |
|Company2|Seattle |[...]|false |[...]|1 |5 |343.24 |0 |Chair |4 |
|Company2|Seattle |[...]|false |[...]|1 |5 |343.24 |1 |Lamp |2 |
|Company3|Bellevue|null |true |[...]|0 |4 |123.34 |null |null |null |
|Company4|Kirkland|null |null |null |null |null |null |null |null |null |
If you do not select the option, then the rows are skipped and the projected output is:
|Company1|Redmond |[...]|true |[...]|0 |1 |123.34 |0 |Laptop |20 |
|Company1|Redmond |[...]|true |[...]|0 |1 |123.34 |1 |Charger |2 |
|Company1|Redmond |[...]|true |[...]|1 |2 |323.34 |0 |Mouse |2 |
|Company1|Redmond |[...]|true |[...]|1 |2 |323.34 |1 |Keyboard |1 |
|Company2|Seattle |[...]|false |[...]|0 |4 |123.34 |0 |Laptop |20 |
|Company2|Seattle |[...]|false |[...]|0 |4 |123.34 |1 |Charger |3 |
|Company2|Seattle |[...]|false |[...]|1 |5 |343.24 |0 |Chair |4 |
|Company2|Seattle |[...]|false |[...]|1 |5 |343.24 |1 |Lamp |2 |
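In Spark terms, this option behaves like the difference between an outer and an inner array explode. A hedged PySpark sketch (the sample data is invented) that mirrors both settings:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import posexplode, posexplode_outer

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("Company1", ["Laptop", "Charger"]), ("Company4", None)],
    ["company", "items"],
)

# posexplode_outer keeps rows whose array is missing, emitting null for the
# index and value, like "Produce a single row with null values for missing
# parent array". The position column plays the role of an _INDEX attribute.
df.select("company", posexplode_outer("items")).show()

# posexplode drops those rows, like clearing the option.
df.select("company", posexplode("items")).show()
```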
The projection preference setting Retain parent name lineage in attribute names lets you control whether to include the parent names in the child attribute names.
By default, Data Integration creates child node names with their parent node names.
In the JSON data structure example, the attributes with parent names in the simplified structure are as follows (assuming sibling arrays not impacted by flattening are excluded in the output):
id
vehicle_INDEX
vehicle_maintenance_INDEX
vehicle_maintenance_dealerdetails_INDEX
vehicle_maintenance_servicestation2_INDEX
vehicle_maintenance_servicestation2_dealeraddress_INDEX
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
If you do not select the projection preference Retain parent name lineage in attribute names, the attributes are as follows:
id
vehicle_INDEX
maintenance_INDEX
dealerdetails_INDEX
servicestation2_INDEX
dealeraddress_INDEX
city
state
contact
The following procedure assumes you have added a source operator and configured the operator to an Object Storage source with complex data types, such as a JSON file.
After you select one complex attribute for flattening, other Array nodes might remain in the flattened structure. You can convert such an Array of primitive data types or an Array of Structs into a String. Currently, the supported format of a conversion String is JSON.
Depending on the data, the data type of a conversion String is VARCHAR or BLOB. You can change a conversion String of VARCHAR to CLOB, and conversely from CLOB to VARCHAR.
The following procedure assumes you have added a flatten operator and selected a complex attribute to create a flattened data structure.
The following procedure assumes you have selected an Array node to be converted into a String, if applicable.
When configuring a flatten operator on a data flow, you can revert a flattening by clearing the Flatten by path.
Table Function Operator
Use the table function operator to add table functions that return data in the form of tables.
Currently, only built-in table functions can be selected from a list. The table functions that are provided by the table function operator include cube, cross tabulation analysis, frequent items, rollup, and Spark SQL. Each built-in table function has predefined fields that you configure to return the desired collection of data. Complex logic can be consolidated into a single function that returns specific sets of data. Parameterization of the fields allows for flexibility in adjusting the logic.
A table function operator can be connected with any other operator within a data flow. Thus, a table function operator can be used as a data source, a mid-stream operator, or a target.
Here's a list of the built-in table functions that are supported by the table function operator.
Persists and caches a dataset according to the given memory storage level. Returns a new dataset backed by the cached resilient distributed dataset (RDD).
Parameter | Description |
---|---|
level |
The memory storage level to use. |
Computes a pair-wise frequency table or contingency table from distinct values of the given two columns.
The number of distinct values for each column must be less than 1e4.
Returns a dataframe that contains the contingency table. In the contingency table:
- The name of the first column is col1_col2, where col1 is the first column and col2 is the second column before transformation. The first column of each row contains the distinct values of col1.
- The other column names are the distinct values of col2.
- The counts are returned as the long type.
- Pairs without occurrences are given zero as the count.
- The maximum number of non-zero pairs is 1e6.
- Null elements are replaced by null.
Parameter | Description |
---|---|
col1 |
Name of the first column. See the note after this table for limitations. Distinct items from col1 become the first column of each row in the contingency table. |
col2 |
Name of the second column. See the note after this table for limitations. Distinct items from col2 become the names of the other columns in the contingency table. |
Only the following characters are allowed in a column name:
- lowercase and uppercase letters: a-z, A-Z
- numbers: 0-9
- underscore: _
Example
Suppose col1 is age and col2 is salary.
Before transformation:
+---+------+
|age|salary|
+---+------+
|1 |100 |
|1 |200 |
|2 |100 |
|2 |100 |
|2 |300 |
|3 |200 |
|3 |300 |
+---+------+
After transformation:
+----------+---+---+---+
|age_salary|100|200|300|
+----------+---+---+---+
|2 |2 |0 |1 |
|1 |1 |1 |0 |
|3 |0 |1 |1 |
+----------+---+---+---+
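This matches the behavior of Spark's DataFrame crosstab. A minimal PySpark sketch of the same example (the session setup is assumed, not part of the operator):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 100), (1, 200), (2, 100), (2, 100), (2, 300), (3, 200), (3, 300)],
    ["age", "salary"],
)

# Pair-wise frequency (contingency) table of age x salary.
df.stat.crosstab("age", "salary").show()
```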
Generates a multi-dimensional cube of all possible combinations using the given columns list and the calculations in the given aggregate function expressions.
The following aggregate functions are supported in the expressions:
AVG
COUNT
MEAN
MAX
MIN
SUM
Returns a dataframe after the cube transformation.
Parameter | Description |
---|---|
cubeColumns |
The list of columns, separated by commas, by which to generate the multi-dimensional cube. |
aggExpressions |
The aggregate function expressions to perform on the columns. For example: salary -> avg, age -> max |
Example
Before transformation:
+----------+-----+-------+------+---+
|department|group|region |salary|age|
+----------+-----+-------+------+---+
|Eng |A |local |1000 |10 |
|Art |B |foreign|2000 |11 |
|Sport |C |local |1500 |13 |
|Eng |A |foreign|800 |19 |
|Art |A |foreign|3000 |30 |
|Eng |E |foreign|1000 |21 |
|Art |D |local |1000 |32 |
|Sport |C |local |1000 |28 |
+----------+-----+-------+------+---+
If cubeColumns is department, region and aggExpressions is salary -> avg, age -> max (compute the average salary and maximum age), the cube after transformation is:
+----------+-------+-----------------+--------+
|department|region |avg(salary) |max(age)|
+----------+-------+-----------------+--------+
|Eng |foreign|900.0 |21 |
|Sport |null |1250.0 |28 |
|Art |null |2000.0 |32 |
|Eng |null |933.3333333333334|21 |
|null |local |1125.0 |32 |
|Art |foreign|2500.0 |30 |
|null |foreign|1700.0 |30 |
|Eng |local |1000.0 |10 |
|null |null |1412.5 |32 |
|Sport |local |1250.0 |28 |
|Art |local |1000.0 |32 |
+----------+-------+-----------------+--------+
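The result corresponds to Spark's cube aggregation. An abbreviated PySpark sketch for illustration (only part of the example data is reproduced):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, max as max_

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("Eng", "local", 1000, 10), ("Eng", "foreign", 800, 19),
     ("Art", "foreign", 2000, 11), ("Sport", "local", 1500, 13)],
    ["department", "region", "salary", "age"],
)

# Cube produces all combinations of (department, region), including the
# per-dimension subtotals and the grand total (null, null) row.
df.cube("department", "region").agg(
    avg("salary").alias("avg(salary)"),
    max_("age").alias("max(age)"),
).show()
```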
Generates synthetic data using the given file format and content. The supported file formats are:
- CSV
- JSON
Returns a data entity. The synthetic data source can be used in pipeline unit-tests where real data is not used or moved.
Parameter | Description |
---|---|
format |
The type of file. Supported values are CSV and JSON. |
content |
The file content for the given format. |
Example
A CSV content example:
co1,co2,co3
1,B,3
A,2,C
Data generation result:
+---+---+---+
|co1|co2|co3|
+---+---+---+
|1  |B  |3  |
|A  |2  |C  |
+---+---+---+
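Outside the designer, the closest PySpark analogue for a synthetic source in a pipeline unit test is a small in-memory DataFrame, for example:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inline synthetic rows equivalent to the CSV content above.
df = spark.createDataFrame([("1", "B", "3"), ("A", "2", "C")], ["co1", "co2", "co3"])
df.show()
```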
Finds duplicates in the given column or columns list and returns a new dataset with the duplicate rows removed.
Parameter | Description |
---|---|
columns |
A column name or a comma-separated list of column names. |
Example
Before transformation:
+---+---+
|a |b |
+---+---+
|1 |2 |
|3 |4 |
|1 |3 |
|2 |4 |
+---+---+
If columns is a, the dataset after transformation is:
+---+---+
|a |b |
+---+---+
|1 |2 |
|3 |4 |
|2 |4 |
+---+---+
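This is the semantics of Spark's dropDuplicates. A minimal PySpark sketch of the same example:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2), (3, 4), (1, 3), (2, 4)], ["a", "b"])

# Keep only the first row seen for each distinct value of column "a".
df.dropDuplicates(["a"]).show()
```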
Finds frequent items in the given column or columns list using the given minimum frequency.
The frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou is used to find frequent items. False positives are possible.
Returns a dataframe with an array of frequent items for each column.
Parameter | Description |
---|---|
freqItems |
A column name or a comma-separated list of column names for which you want to find frequent items. |
support |
Minimum frequency for an item to be considered frequent. The value must be greater than 1e-4 (decimal value of 0.0001). For example, a support of 0.6 looks for items that occur in at least 60 percent of the rows. |
Example
Before transformation:
+-------+--------------+
|user_id|favorite_fruit|
+-------+--------------+
| 1| apple|
| 2| banana|
| 3| apple|
| 4| orange|
| 5| banana|
| 6| banana|
| 7| apple|
+-------+--------------+
If freqItems is favorite_fruit and support is 0.6, the dataframe returned after transformation is:
+------------------------+
|favorite_fruit_freqItems|
+------------------------+
| [banana]|
+------------------------+
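This corresponds to Spark's freqItems. A PySpark sketch of the same example (illustrative only):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "apple"), (2, "banana"), (3, "apple"), (4, "orange"),
     (5, "banana"), (6, "banana"), (7, "apple")],
    ["user_id", "favorite_fruit"],
)

# Items considered frequent at the 0.6 support level; false positives possible.
df.stat.freqItems(["favorite_fruit"], support=0.6).show(truncate=False)
```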
Removes rows that contain null or NaN values in the given column or columns list.
Returns a dataframe that excludes the rows.
Parameter | Description |
---|---|
how |
How to determine whether a row is to be dropped. Supported values are any (drop a row if it contains any null or NaN value in the given columns) and all (drop a row only if all its values are null or NaN). |
cols |
A column name or a comma-separated list of column names. |
minNonNulls |
The minimum number of non-null and non-NaN values that a row must contain. Drops the rows that contain less than the specified minimum. If specified, minNonNulls overrides the how parameter. |
Example
Before transformation:
+----+-------+----+
|id |name |age |
+----+-------+----+
|1 |Alice |25 |
|2 |null |28 |
|null|Bob |null|
|4 |Charlie|30 |
+----+-------+----+
If how is any and cols is name, the dataframe returned after transformation is:
+----+-------+----+
|id |name |age |
+----+-------+----+
|1 |Alice |25 |
|null|Bob |null|
|4 |Charlie|30 |
+----+-------+----+
When you use the how parameter with the value all, a row is dropped only if all its values are null. For example, before transformation:
+----+--------+----+
| id| name| age|
+----+--------+----+
| 1|John Doe| 30|
| 2| null|null|
| 3|Jane Doe| 25|
| 4|Jake Doe|null|
|null| null|null|
+----+--------+----+
After transformation:
+---+--------+----+
| id| name| age|
+---+--------+----+
| 1|John Doe| 30|
| 2| null|null|
| 3|Jane Doe| 25|
| 4|Jake Doe|null|
+---+--------+----+
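This matches Spark's dropna. A PySpark sketch of the first example (in PySpark, minNonNulls corresponds to the thresh argument):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "Alice", 25), (2, None, 28), (None, "Bob", None), (4, "Charlie", 30)],
    ["id", "name", "age"],
)

# Drop rows where the "name" column is null.
df.dropna(how="any", subset=["name"]).show()
```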
Replaces null values and returns a dataframe that contains the replacement values.
Parameter | Description |
---|---|
replacement |
The key-value map to use for replacing null values. The key is a column name, and the value is its replacement value; the key points to the replacement value. For example: name -> unknown. A replacement value is cast to the column's data type, and must be a primitive type such as a number, string, or Boolean. |
Example
Before transformation:
+----+-------+----+
|id |name |age |
+----+-------+----+
|1 |Alice |25 |
|2 |null |28 |
|null|Bob |null|
|4 |Charlie|30 |
+----+-------+----+
If the key-value map is as follows:
id -> 3
name -> unknown
age -> 10
After transformation:
+---+-------+---+
|id |name |age|
+---+-------+---+
|1 |Alice |25 |
|2 |unknown|28 |
|3 |Bob |10 |
|4 |Charlie|30 |
+---+-------+---+
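This matches Spark's fillna. A PySpark sketch of the same example:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "Alice", 25), (2, None, 28), (None, "Bob", None), (4, "Charlie", 30)],
    ["id", "name", "age"],
)

# Per-column replacement values for nulls; each value is cast to the column type.
df.fillna({"id": 3, "name": "unknown", "age": 10}).show()
```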
Replaces a value with another value in the given column or columns by matching keys in the given key and replacement value map.
Returns a new dataframe that contains the replaced transformation.
Parameter | Description |
---|---|
cols |
A column name or a comma-separated list of column names on which to apply replacement values. |
replacement |
The replacement key-value map to use. The key is a value to replace, and the value is the replacement value; the key points to the replacement value. For example: Alice -> Tom. The map value can have nulls. A key and its replacement value must have the same type; only primitive types such as numbers, strings, and Booleans are supported. |
Example
Before transformation:
+----+-------+----+
|id |name |age |
+----+-------+----+
|1 |Alice |25 |
|2 |null |28 |
|null|Bob |null|
|4 |Charlie|30 |
+----+-------+----+
If cols is name, and replacement is Alice -> Tom, the dataframe after transformation is:
+----+-------+----+
|id |name |age |
+----+-------+----+
|1 |Tom |25 |
|2 |null |28 |
|null|Bob |null|
|4 |Charlie|30 |
+----+-------+----+
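This matches Spark's replace. A PySpark sketch of the same example:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "Alice", 25), (2, None, 28), (None, "Bob", None), (4, "Charlie", 30)],
    ["id", "name", "age"],
)

# Replace the value "Alice" with "Tom", only in the "name" column.
df.replace("Alice", "Tom", subset=["name"]).show()
```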
Divides a dataset by the given columns into the specified number of partitions.
Returns a new dataset that is hash partitioned into exactly the specified number of partitions.
Parameter | Description |
---|---|
partitionColumns |
A column name or comma-separated list of column names by which the dataset is partitioned. |
numberOfPartitions |
The number of partitions to create. |
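This corresponds to Spark's column-based repartition. A minimal PySpark sketch (the dataset is invented):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100).withColumnRenamed("id", "key")

# Hash-partition the rows by "key" into exactly 4 partitions.
repartitioned = df.repartition(4, "key")
print(repartitioned.rdd.getNumPartitions())  # 4
```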
Generates a multi-dimensional rollup of possible combinations using the given columns list and the calculations in the given aggregate function expressions.
The following aggregate functions are supported in the expressions:
AVG
COUNT
MEAN
MAX
MIN
SUM
Returns a dataframe after the rollup transformation.
Parameter | Description |
---|---|
rollupColumns |
The list of columns, separated by commas, by which to generate the multi-dimensional rollup. |
aggExpressions |
The aggregate function expressions to perform on the columns. For example: salary -> avg, age -> max |
Example
Before transformation:
+----------+-----+-------+------+---+
|department|group|region |salary|age|
+----------+-----+-------+------+---+
|Eng |A |local |1000 |10 |
|Art |B |foreign|2000 |11 |
|Sport |C |local |1500 |13 |
|Eng |A |foreign|800 |19 |
|Art |A |foreign|3000 |30 |
|Eng |E |foreign|1000 |21 |
|Art |D |local |1000 |32 |
|Sport |C |local |1000 |28 |
+----------+-----+-------+------+---+
If rollupColumns is department, region and aggExpressions is salary -> avg, age -> max (compute the average salary and maximum age), the rollup after transformation is:
+----------+-------+-----------------+--------+
|department|region |avg(salary) |max(age)|
+----------+-------+-----------------+--------+
|Eng |foreign|900.0 |21 |
|Sport |null |1250.0 |28 |
|Art |null |2000.0 |32 |
|Eng |null |933.3333333333334|21 |
|Art |foreign|2500.0 |30 |
|Eng |local |1000.0 |10 |
|null |null |1412.5 |32 |
|Sport |local |1250.0 |28 |
|Art |local |1000.0 |32 |
+----------+-------+-----------------+--------+
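The result corresponds to Spark's rollup aggregation. An abbreviated PySpark sketch (only part of the example data is reproduced):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, max as max_

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("Eng", "local", 1000, 10), ("Eng", "foreign", 800, 19),
     ("Art", "foreign", 2000, 11), ("Sport", "local", 1500, 13)],
    ["department", "region", "salary", "age"],
)

# Unlike cube, rollup produces only the hierarchical combinations:
# (department, region), (department), and the grand total.
df.rollup("department", "region").agg(
    avg("salary").alias("avg(salary)"),
    max_("age").alias("max(age)"),
).show()
```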
Runs Spark SQL queries on incoming data, by first creating temporary views using the given table names.
Parameter | Description |
---|---|
SQL |
The SQL statement or script to run against the temporary views. |
tableName |
A table name or a comma-separated list of table names by which Spark creates the temporary views for the incoming data. |
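A minimal PySpark sketch of the same pattern (the view name and query are assumptions):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])

# Register the incoming data as a temporary view, then query it with Spark SQL.
df.createOrReplaceTempView("EMPLOYEES")
spark.sql("SELECT id, upper(name) AS name FROM EMPLOYEES").show()
```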
Generates a stratified sample without replacement based on the given sampling fraction for each stratum.
Returns a new dataframe that represents the stratified sample.
Parameter | Description |
---|---|
strata_col |
The column that defines the strata. |
fractions |
The sampling fraction for each stratum, from 0 to 1. For example: 1 -> 1.0, 3 -> 0.5. If a fraction is not specified for a stratum, zero is assumed. |
sample_size |
|
seed |
The random seed to use for sampling. |
Example
Before transformation:
+---+-----+
|key|value|
+---+-----+
| 1| 1|
| 1| 2|
| 2| 1|
| 2| 1|
| 2| 3|
| 3| 2|
| 3| 3|
+---+-----+
If strata_col is key, and fractions is 1 -> 1.0, 3 -> 0.5:
After transformation:
+---+-----+
|key|value|
+---+-----+
| 1| 1|
| 1| 2|
| 3| 2|
+---+-----+
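This corresponds to Spark's sampleBy. A PySpark sketch of the same example (sampling is probabilistic, so the rows kept can vary with the seed):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3)],
    ["key", "value"],
)

# Keep all rows for stratum 1, about half for stratum 3, none for stratum 2.
df.sampleBy("key", fractions={1: 1.0, 3: 0.5}, seed=0).show()
```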
Computes the given statistics for numeric and string columns.
If no statistics are provided, all the following are computed:
- COUNT
- MEAN
- STDDEV
- MIN
- approximate quartiles (percentiles at 25%, 50%, and 75%)
- MAX
Parameter | Description |
---|---|
statistics |
A comma-separated list of statistics. Supported values are count, mean, stddev, min, max, and approximate percentiles specified as a percentage (such as 27%). Example: count, mean, stddev, min, 27%, 41%, 95%, max |
Example
Before transformation:
+----------+------+-----------------+--------+
|department|gender|avg(salary) |max(age)|
+----------+------+-----------------+--------+
|Eng |female|900.0 |21 |
|Sport |null |1250.0 |28 |
|Art |null |2000.0 |32 |
|Eng |null |933.3333333333334|21 |
|Art |female|2500.0 |30 |
|Eng |male |1000.0 |10 |
|null |null |1412.5 |32 |
|Sport |male |1250.0 |28 |
|Art |male |1000.0 |32 |
+----------+------+-----------------+--------+
Suppose statistics is count, mean, stddev, min, 27%, 41%, 95%, max.
After transformation:
+-------+----------+-----+------+----------------+----------------+
|summary|department|group|gender|salary |age |
+-------+----------+-----+------+----------------+----------------+
|count |8 |8 |8 |8 |8 |
|mean |null |null |null |1412.5 |20.5 |
|stddev |null |null |null |749.166203188585|8.76682056718072|
|min |Art |A |female|800 |10 |
|27% |null |null |null |1000 |13 |
|41% |null |null |null |1000 |19 |
|95% |null |null |null |3000 |32 |
|max |Sport |E |male |3000 |32 |
+-------+----------+-----+------+----------------+----------------+
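This corresponds to Spark's summary. A minimal PySpark sketch (the data is invented; percentile strings such as "27%" request approximate percentiles):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("Eng", 1000, 10), ("Art", 2000, 11), ("Sport", 1500, 13)],
    ["department", "salary", "age"],
)

df.summary("count", "mean", "stddev", "min", "27%", "95%", "max").show()
```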