Using Data Flow Operators

In Data Integration, data flow operators represent input sources, output targets, and transformations that can be used in a data flow.

From the Operators panel, drag operators onto the canvas to design a data flow. Then use the Details tab on the Properties panel to configure basic and required properties for each operator. Where applicable, use the Advanced options tab to specify other properties.

In general, a data flow operator can have one or more inbound ports, and one or more outbound ports for data to flow through. For example, you can connect the same source outbound port to inbound ports on a filter, a join, and a target operator. You can also connect another source operator to the same join inbound port.

A data flow must include at least one source operator and one target operator to be valid. While Data Integration supports multiple target operators in a data flow, a target operator can have only one inbound port.

To connect operators, hover over an operator until you see the connector (small circle) on the right side of the operator. Then drag the connector to the next operator you want to connect to. A connection is valid when a line connects the operators after you drop the connector.

A connection line symbolizes how data flows from one node to the other. Although you can drag a visible connector from one object to another, a filter, expression, aggregate, distinct, sort, or target operator can't have more than one inbound connection line.

Important

For complex data types, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab and Data tab in the Properties panel.

Data Operators

Data Integration provides the source operator and the target operator for adding the input and output data entities that serve as the input to a data flow and the output for the transformed data.

Selecting a data asset, connection, and schema

To configure the input or output data entity for a data operator in a data flow, you begin by selecting a data asset, connection, and schema (or bucket).

You make the selections in the order displayed on the Details tab of the Properties panel, by clicking Select when the option becomes enabled next to the resource type.

For example, when you first add a source operator, only Data asset is enabled for selection. The select option for the subsequent resource type, Connection, is enabled only after you make a selection for the preceding object.

When you click Select next to a resource, a panel appears for you to select the object you want. You can use the resource menu to select the object, or click View all to use another panel to browse or search, and then select the object.

Upon selecting a resource object, each subsequent selection is based on the parent-child relation inherited from the preceding selection. A breadcrumb at the top of the selection panel shows the selection hierarchy. For example, let's assume that you have selected the data asset "Oracle Database data asset 1," and the connection "Default connection." When you select the Schema, the breadcrumb displays as "From Oracle Database data asset 1 using connection Default connection."

Selecting a data entity

After selecting a data asset, connection, and schema (or bucket), you select a data entity from the list of available data entities.

In general, you can do the following in the Browse data entities panel:

  • Browse the available data entities and select an entity by its name.

  • Filter the available list to search and then select. In the search field, enter a partial or full entity name and press Enter to start the search. Search is case-sensitive. For example, if the available data entities include BANK_US and BANK_EU, enter BANK and then select from the filtered list.

  • Use one or more parameters in the search string. For example: CUSTOMERS_${COUNTRY}

    To use a parameterized data entity name to select the input or output resource, see Using Parameters in Data Entity Names.

Depending on the data asset type of a resource, after selecting a data entity, more configuration might be necessary on the Details tab or Advanced options tab in the Properties panel.

Assigning parameters to resource objects

To keep a resource object such as a data asset or connection in a data flow from being permanently bound to a specific resource, you assign a parameter to that object.

You can assign the parameter before or after you select the resource object.

  • In the data flow, add the source or target operator. After making a selection for a resource type, click Assign parameter next to the resource name to use another panel to select and assign a parameter for the selected object. If a suitable parameter type is not available, you can add a parameter and then assign it.

    Note

    Assign parameter is not available for a data entity whose name includes a parameter (such as BANK_${REGION}). You cannot bind a parameterized entity resource to another parameter.
  • Alternatively, add the source or target operator, then click Assign parameter to use one panel to assign a parameter, and select the resource for the object at the same time. In the Add parameter panel, you select a resource based on the parent-child relation inherited from the preceding selection. The breadcrumb at the top of the panel shows the selection hierarchy.

See also Using Data Flow Parameters.

Using parameters in data entity names

You can include one or more data flow parameters in the name of the data entity resource you specify for a data operator.

The syntax to use for a data flow parameter in a data entity name is ${PARAMETER_NAME}. For example: CUSTOMERS_${COUNTRY}

A parameter name is case-sensitive, and each parameter must have a default value.

For example, CUSTOMERS_${COUNTRY} might return the database table CUSTOMERS_USA, and BANK_${COUNTRY}/* might return the Object Storage files in BANK_EU.
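
You can picture the resolution of these placeholders as a string substitution that uses each parameter's default value (or the value supplied at run time). The following Java sketch is only an illustration of that substitution, with assumed parameter names and values; it is not how Data Integration resolves entity names internally.

    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class EntityNameResolver {

        // Matches ${PARAMETER_NAME} placeholders; parameter names are case-sensitive.
        private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

        static String resolve(String entityName, Map<String, String> parameterValues) {
            Matcher matcher = PLACEHOLDER.matcher(entityName);
            StringBuilder resolved = new StringBuilder();
            while (matcher.find()) {
                String value = parameterValues.get(matcher.group(1));
                if (value == null) {
                    throw new IllegalArgumentException("No value or default for parameter " + matcher.group(1));
                }
                matcher.appendReplacement(resolved, Matcher.quoteReplacement(value));
            }
            matcher.appendTail(resolved);
            return resolved.toString();
        }

        public static void main(String[] args) {
            // COUNTRY is assumed to have the default value USA in this data flow.
            System.out.println(resolve("CUSTOMERS_${COUNTRY}", Map.of("COUNTRY", "USA"))); // CUSTOMERS_USA
        }
    }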

To use parameters in data entity names when you configure a data operator, you can:

  • Add the parameter to the data flow before you add the data operator
  • Add the parameter at the time you're configuring the operator's data entity

How to add a parameter

In a data flow, select Parameters from the View menu on the canvas toolbar to open the Parameters panel.

In the Parameters panel, click Config and then click Add.

In the Add parameter panel, configure a parameter of the appropriate data type, for example VARCHAR or NUMERIC, and add a default value.

During data entity configuration in a data flow, you can search available data entities by entering the parameter name in the Browse data entities panel. In the search field, start typing ${ followed by any character. If the front part of the parameter name matches existing parameters in the data flow, a list of suggested names displays. To select a parameter from the list, click the parameter name, complete the syntax by adding }, then press Enter.

How to add a parameter at the time of data entity configuration

In the Browse data entities panel, you can do the following:

  • From the More actions menu, select Add data flow parameter to use the Add data flow parameter panel. Specify the data type, default value, and other properties for the parameter you want to add and use.

  • In the search field, start typing ${ followed by any character. If the front part of the parameter name matches existing parameters in the data flow, a list of suggested names displays. To select a parameter from the list, click the parameter name, complete the syntax by adding }, then press Enter.

  • In the search field, enter the parameter name (for example, ${PARAMETER_NAME}). If the parameter name does not yet exist in the data flow, and you press Enter, Data Integration displays the Add data flow parameter panel. Alternatively, after entering the parameter name, select Add data flow parameter from the More actions menu.

    In the Add data flow parameter panel, specify the data type, default value, and other properties for the parameter you want to add and use.

Source Operator

Use the source operator to specify the data entities that serve as the input to the data flow.

Adding and configuring a source operator

You can add multiple source operators in a data flow.

Important

If you're using a hierarchical data entity for a source operator, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab or Data tab in the Properties panel.
  1. From the Operators panel, drag a Source operator onto the canvas.
  2. With the source operator in focus, on the Details tab of the Properties panel, enter a name for the source operator in the Identifier field, or leave the default name as-is.
  3. To configure the input source for this operator, select the data asset, connection, and schema (or bucket) by clicking Select as it becomes available next to the resource type. See the following descriptions for more information about the data asset type you're selecting for this source operator.
    Data asset type Description

    Object Storage, Amazon S3, or HDFS

    Click Select next to Bucket to select the compartment, and then select the bucket.

    After selecting the data asset, connection, and bucket, proceed to step 5 to select a data entity.

    Generic REST data asset

    After selecting a connection, the base URL that you entered while creating the data asset is displayed. Follow these steps to configure the source:

    • Click Select next to Resource to select an endpoint.

    • Click Select next to Operation to select an operation for the endpoint.

    • For more properties, see step 7.

    Oracle Fusion Applications using Oracle BI Cloud Connector (BICC)

    For Schema, select the BICC offering. The selected BICC offering provides the BICC view objects (VO) for data extraction.

    After selecting a data asset, connection, and schema, proceed to step 4 to select a data entity.

    Oracle Fusion Applications using Oracle BI Publisher (BIP)

    For Data asset:

    To use a BIP data asset in a data flow, you need an Object Storage bucket as the staging location. If you select a data asset that doesn't have a default staging location specified in that data asset, Data Integration displays a warning notification. When you see the notification, choose one of the following options:
    • Click Update data asset to add a default staging location to that data asset.
    • Click Select next to Staging location in the Properties panel to select the Object Storage data asset that has the bucket you want to use for staging.

    For Schema:

    Select the BIP folder that contains the report as the schema.

    See Extract data from BIP in Data Integration for details on:
    • Preparing the BIP report and its SQL query
    • Configuring chunking
    • Using custom BIP parameters

    After selecting a data asset, connection, and schema, proceed to step 4 to select a data entity.

    Other data assets that are not Object Storage, Amazon S3, or HDFS

    After selecting a data asset, connection, and schema, proceed to step 4 to select a data entity.

  4. To select a Data entity that's not from an Object Storage, Amazon S3, or HDFS data asset:
    1. Note the following in the Select data entity panel. Where applicable, you can:
      • Use the Data entity menu to select a data entity by its name.

      • Click Browse data entities to use another panel to browse or search the available list, and then select a data entity.

        See Selecting a Data Entity, if you need help with searching the list by using a full or partial name.

        You can include one or more data flow parameters in the name of a selected data entity, by using the syntax ${PARAMETER_NAME}. For example: BANK_${REGION}

        See Using Parameters in Data Entity Names, if you need help with adding and using parameters when configuring entity names.

    2. Depending on the data asset type for this source operator, select the data entity and complete more configuration, if applicable.
      Data asset type Description

      Oracle Database, Oracle Database on Amazon RDS, Oracle Siebel, Oracle PeopleSoft, Oracle E-Business Suite, Oracle ATP and ADW, MySQL, MySQL HeatWave, MySQL on Amazon RDS, Amazon RDS for SQL Server, Microsoft SQL Server, Microsoft Azure SQL Database, or IBM DB2

      On the Details tab:

      • Select the data entity by name.

      • If applicable, click Enter custom SQL to enter a single SQL statement in the editor.

        By providing a SQL query statement, you can define a database source entity and the shape of the entity at the same time. For example, you can define and filter the data in a SQL statement instead of adding a filter operator after you define the entity to use for the source operator.

        After entering the SQL query, click Validate to check the statement for errors.

      • If available and applicable, follow these steps to load only new or updated records from the source to the target:

        • Select the Incremental load checkbox to identify and load only the rows that have been created or modified since the last time the load process was run.

        • For Watermark column, select the column that's used to mark the rows that have been incrementally loaded. Only DATE, TIMESTAMP, and DATETIME columns can be used as a watermark column.

      Oracle Fusion Applications using Oracle BI Cloud Connector (BICC)

      On the Details tab: Select a BICC view object (VO) as the data entity from which to extract data. Then use the Extract strategy menu to choose how the source data is extracted.

      • Full: Extracts and loads all data from the optional Initial extract date that you specify. If you don't provide a date in Initial extract date, then all data is extracted.
      • Incremental: Extracts and loads only data that's created or modified after a specific date.
        • Select Managed to use the last date that's tracked and managed by Data Integration. You don't explicitly specify a date.
          Important

          The Managed option fetches an extra 24 hours of data into the past.

          Data Integration adds the parameter SYS.LAST_LOAD_DATE to track the date of the last load or task run, and stores the last successful load date as a watermark across consecutive runs. When the task is run, you're allowed to change the value of the last date that's stored in the system.

        • Select Custom to provide the date yourself in the Last extract date field.

      Next, select a different BICC external configuration storage for the extracted data, if you don't want to use the Data asset default. The Data asset default is the BICC external storage that's configured within the BICC Fusion Applications data asset when the asset was created or edited.

      Optionally complete the following on the Advanced options tab:

      Select Enable BI broker if you want the BICC extractor to read input rows directly from the Fusion Applications database instead of through the regular extraction path that uses the BI server.
      Important

      • BI broker mode is a performance optimization option for advanced users.
      • BI broker mode is a BICC feature supported only for some data stores. Extraction fails for data stores that aren't supported.
      • With the BI broker option, extract performance might improve because queries are executed directly against the data stores in the underlying database. The performance improvement depends on the data store, the current workload on the BI server, the current workload on the database, and other factors.
      • In some situations, enabling the BI broker mode changes the source column data types.
      If you don't select Enable BI broker, the data is extracted through the BI server. For more information, see Broker Extract Mode in Configure Offerings to Extract of the Creating a Business Intelligence Cloud Extract guide. The BICC guide is part of the Applications Common book list in the Oracle Fusion Cloud Applications Suite documentation.

      For BICC column properties, select the type of columns from the BICC source to include as attributes in the target.

        • All: All columns of the BICC PVO are included. However, columns with names that start with ExtnAttribute or columns that have the same name and label don't appear.

        • Default only: Only the columns enabled for extract by default in the BICC PVO are included. It displays all columns including the Flex columns.

        • Default and primary keys: Default and primary columns of the BICC PVO are included. It displays all columns including the Flex columns.

        • Primary keys only: Only the primary columns of the BICC PVO are included. It displays all columns including the Flex columns.

      Oracle Fusion Applications using Oracle BI Publisher (BIP)

      On the Details tab:

      • For Data entity, select a report. See the Extract data from Oracle Fusion Applications using BIP in Data Integration blog for details about the report location.
      • Specify the staging location, which is the Object Storage bucket for staging the extracted data. If the selected data asset has a specified default staging location, Data Integration automatically uses the Object Storage data asset and bucket that are specified as the default staging location in that data asset. If the selected data asset doesn't have a specified default staging location, you can select the Object Storage data asset that has the bucket you want to use for staging. Otherwise, you can update the data asset to add a default staging location, or you can select a different data asset.

      Then complete the following on the Advanced options tab:

      • Select Enable row limit to specify the maximum number of rows to retrieve.

      • In the Row limit field, specify the row limit.

      • If you expect a large data volume, specify the size in the Chunk size field under Enable chunking. Chunk size is mandatory if the specified row limit is greater than 100000.

        For help with chunking, see the blog Extract data from Oracle Fusion Applications using BIP in Data Integration.

      • The BIP properties section displays the BIP report parameters from the report that you selected for the data entity. Select a property for each of the Number of rows returned and Starting offset fields. You cannot select the same property for both fields.
      • To load only new or updated records from the source to the target:
        • Select the Incremental load checkbox to identify and load only the data that's created or modified since the last time the load process was run.
        • For Watermark column, select the property that's used to mark data that's been incrementally loaded.
    3. After you select a data entity, the name of the selected data entity is displayed next to Data entity on the Details tab in the Properties panel.
      • If you provided a SQL statement, the label SQL_ENTITY<nnnnnnnnn> is displayed, for example: SQL_ENTITY123456789

      • If you used a parameter in the entity name, the label includes the parameter name, for example: BANK_${REGION}

        Note

        For a data entity that includes a parameter in the name, you cannot bind the entity resource to another parameter. The Assign parameter option is unavailable.

  5. To select a Data entity that's from an Object Storage, Amazon S3, or HDFS data asset:
    1. Use the Select data entity panel to select a single entity by name, or select one or more entities by using a pattern expression.
      • Click Browse by name to use another panel to browse or search the available list, and then select one data entity by name.

        See Selecting a Data Entity, if you need help with searching the list by using a full or partial name.

        You can include one or more data flow parameters in the name of a selected data entity, by using the syntax ${PARAMETER_NAME}. For example: BANK_${REGION}.csv

        See Using Parameters in Data Entity Names, if you need help with adding and using parameters when configuring a data entity by name.

      • Click Browse by pattern to use a regular expression (including wildcards) to select one or more entities that match a specific directory and file pattern. For example, enter CUSTOMERS* in the field and press Enter to filter the list. See Testing and Using a Pattern.

        You can also include parameters in the pattern expression, by using the syntax ${PARAMETER_NAME}. For example: CUSTOMERS_${REGION}/*

        See Using Parameters in Data Entity Names, if you need help with adding and using parameters when configuring data entity names by pattern.

        Note

        When you use a pattern expression, all existing files that match the pattern are assumed to have the same structure. The matched files are treated as a single entity in the data flow. Any future new files that match the pattern are also processed.

        For a data entity that includes a parameter in the name, you cannot bind the entity resource to another parameter. The option to assign a parameter is unavailable.

    2. After making the data entity selection by name or by pattern, complete the entity configuration in the Select data entity panel.

      How you complete the configuration depends on the File type you choose for the selected data entity.

      File type Description

      CSV

      If you used Browse by name to select the data entity, you can click Preview raw data to display the first 10 lines of the file.

      Select the Compression type (compression method) that's used. If you don't know the compression method that's used to compress the file, select Auto (Default).

      The default Encoding is UTF-8, which cannot be changed.

      Other settings you can configure are:

      • If the first row in the file is a header row, select Yes for Has header.
      • If the values in the data rows span multiple lines, select Yes for Multi-line.
      • Specify the Escape character that escapes other characters found in data values. For example: \
      • Select the Delimiter character that separates data fields. For example: COLON (:), COMMA (,), PIPE (|), SEMICOLON (;), or TAB (\t)
      • If a column delimiter is included at the end of a data row in the file, select Yes for Trailing delimiter.
      • Specify the Quote character that treats other characters as literal characters. For example: "

      JSON

      Select the Compression type (compression method) that's used. If you don't know the compression method that's used to compress the file, select Auto (Default).

      The default Encoding is UTF-8, which cannot be changed.

      Select Use custom schema to paste or upload a custom sample schema that's used to infer the entity shape. When this checkbox is selected, schema drift is no longer applicable in the source entity.

      • If you select Upload, drop a custom schema file in the box provided, or click Select a file to select the schema file to upload.

      • If you select Paste in schema, copy the schema text file content and paste it in the box provided.

      PARQUET

      The default Compression type is Auto, which cannot be changed.

      AVRO

      The default Compression type is Auto, which cannot be changed.

      EXCEL

      Only XLSX files are supported.

      The default Compression type is Auto, which cannot be changed.

      By default, Data Integration treats the first row in a file as a header row. If the first row in a file is not a header row, select No for Has header.

      For Select entity by, you can choose the criteria as Sheet name, Sheet index, or Table name. Then enter a Value for the worksheet name, worksheet index, or table name. Sheet index is zero-based.

      For Sheet name or Sheet index, enter the area of the file to use as the Data range for selection. If you don't enter a data range value, the default is the data range A1, which corresponds to the entire sheet. If the file has a header row, enter a value that starts from the header row, for example, A1:K56.

    3. After completing the data entity configuration, click Select.

      You're brought back to the Details tab in the Properties panel. The name of the selected data entity or entity pattern expression is displayed next to Data entity. For example, CUSTOMERS_EU.csv, CUSTOMERS*, or CUSTOMERS_${REGION}/*

      Note

      For a data entity that includes a parameter in the name, you cannot bind the entity resource to another parameter. The Assign parameter option is unavailable.

    4. If available and applicable, select the Incremental load checkbox to identify and load only the data that's created or modified since the last time the load process was run successfully.

      For file data assets such as Object Storage, Data Integration performs incremental extraction based on the last modified date (timestamp) of the objects that are selected in a folder or selected by a file pattern.

      For example, suppose on January 5 there are three files in the Object Storage bucket: EMPLOYEE_1.csv, EMPLOYEE_2.csv, EMPLOYEE_3.csv

      If a load job is successfully run on January 5, all three files are processed.

      • On January 6, a new file EMPLOYEE_4.csv is added to the bucket; this file has a last modified date (timestamp) of January 6. If a load job is run on January 6, only EMPLOYEE_4.csv is picked up and processed for incremental data loading.
      • If, in addition to the new EMPLOYEE_4.csv file, the existing file EMPLOYEE_2.csv is also modified on January 6, then a load job run on January 6 picks up both EMPLOYEE_2.csv and EMPLOYEE_4.csv for processing.

      A sketch after these steps illustrates this timestamp-based selection.
  6. (Optional) On the Details tab, select the refresh icon next to the selected entity name to ensure that you're using the entity's latest schemas during the current working session in the designer. Only the metadata of the entity is fetched, not the actual data changes.
  7. (Optional) On the Advanced options tab, depending on the data asset type for this source operator, you can select or clear applicable options.

    Item Description

    Fetch file metadata as attributes

    Select the checkbox to include file metadata (such as file name and file size) as attributes in the returned data.

    Allow schema drift

    By default, schema drift is enabled so that Data Integration can dynamically handle schema definition changes during data flow design time and runtime. Schema drift refers to changes in the schema definitions of the specified data entity, such as attributes that are added or removed, and attribute data types or entity names that are changed.

    If you clear the Allow schema drift checkbox, you disable schema drift to lock the schema definitions when the data flow is defined. When schema drift is disabled, Data Integration uses a fixed shape of the specified data entity even when the underlying shape has changed.

    If a custom schema is selected, schema drift isn't applicable and this option can't be selected.

    Treat missing file as empty

    Only for an Object Storage data asset.

    This checkbox is available only when Allow schema drift is not selected.

    Allow pushdown

    By default, pushdown is enabled to let Data Integration use the source database system to process transformation operations, if the system supports pushdown. Only Oracle databases, Oracle Autonomous Data Warehouse, Oracle Autonomous Transaction Processing, and MySQL support pushdown. Less data is pulled from the source when Data Integration uses the source database system to process operations.

    By clearing the Allow pushdown checkbox, you disable pushdown. When pushdown is disabled, Data Integration pulls all data directly from the source system and then processes the transformations.

    Parameter and Header settings

    Only for a REST data asset.

    Use the Parameter and Header settings to configure more source properties.

  8. In the Attributes tab, view the attributes from the data entity you selected. For a BICC Fusion Applications data asset, the attributes are the BICC columns from the VO you selected.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  9. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    For a BICC Fusion Applications data asset, only 10 rows of data are shown.

    The Data Profile is not displayed for complex data attributes. To view the hierarchy of a complex data attribute in a simplified structure, click the complex data that's shown, for example [...] or {...}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  10. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
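
The incremental selection described in step 5 compares each matched file's last modified timestamp with the time of the last successful run. The following Java sketch only illustrates that selection logic; the class, file names, and timestamps are assumptions for illustration and are not part of Data Integration.

    import java.time.Instant;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class IncrementalFileSelection {

        // Only files modified after the last successful run are picked up.
        static List<String> filesToLoad(Map<String, Instant> lastModifiedByFile, Instant lastSuccessfulRun) {
            return lastModifiedByFile.entrySet().stream()
                    .filter(entry -> entry.getValue().isAfter(lastSuccessfulRun))
                    .map(Map.Entry::getKey)
                    .sorted()
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            Map<String, Instant> bucket = Map.of(
                    "EMPLOYEE_1.csv", Instant.parse("2024-01-05T08:00:00Z"),
                    "EMPLOYEE_2.csv", Instant.parse("2024-01-06T09:30:00Z"),  // modified on January 6
                    "EMPLOYEE_3.csv", Instant.parse("2024-01-05T08:00:00Z"),
                    "EMPLOYEE_4.csv", Instant.parse("2024-01-06T07:15:00Z")); // added on January 6

            Instant lastSuccessfulRun = Instant.parse("2024-01-05T12:00:00Z");
            System.out.println(filesToLoad(bucket, lastSuccessfulRun)); // [EMPLOYEE_2.csv, EMPLOYEE_4.csv]
        }
    }
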
Using file patterns

When configuring OCI Object Storage as a data source, you can use a regular expression to specify a file pattern for selecting one or more data entities.

A file pattern is a rule that specifies how to find files that match a directory and file name, and how to handle the matched files when they're found.

Syntax to use

Data Integration supports the glob pattern syntax for specifying a file pattern.

  • An asterisk, *, matches any number of characters (including none).
  • Two asterisks, **, work the same as * but cross directory boundaries to match complete paths.
  • A question mark, ?, matches exactly one character.
  • Braces specify a collection of sub patterns. For example:
    • {sun,moon,stars} matches "sun", "moon", or "stars".
    • {temp*,tmp*} matches all strings beginning with "temp" or "tmp".
  • Square brackets convey a set of single characters or, when the hyphen character (-) is used, a range of characters. For example:
    • [aeiou] matches any lowercase vowel.
    • [0-9] matches any digit.
    • [A-Z] matches any uppercase letter.
    • [a-z,A-Z] matches any uppercase or lowercase letter.

    Within the square brackets, *, ?, and \ match themselves.

  • All other characters match themselves.
  • To match *, ?, or the other special characters, you can escape them by using the backslash character, \. For example: \\ matches a single backslash, and \? matches the question mark.

Examples

  • *.html matches all strings that end in .html.
  • ??? matches all strings with exactly three letters or digits.
  • *[0-9]* matches all strings containing a numeric value.
  • *.{htm,html,pdf} matches any string ending with .htm, .html, or .pdf.
  • a?*.java matches any string beginning with a, followed by at least one letter or digit, and ending with .java.
  • {foo*,*[0-9]*} matches any string beginning with foo or any string containing a numeric value.
  • directory1/20200209/part-*[0-9]*json matches all files in that folder whose names start with part-, contain any number of digits, and end with json.
  • directory3/**.csv matches all files with a csv extension that are in the folder directory3 and its subfolders.
  • directory3/*.csv matches all files with a csv extension that are only in the main folder directory3. Files in subfolders are not included.
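
This pattern syntax closely resembles the standard glob syntax supported by java.nio.file, so you can try out a pattern locally before using it in a data flow. The following Java sketch is only a local approximation for experimenting with pattern logic; the file names are assumptions, and Data Integration itself evaluates patterns against the objects in the bucket.

    import java.nio.file.FileSystems;
    import java.nio.file.Path;
    import java.nio.file.PathMatcher;
    import java.util.List;

    public class GlobPatternDemo {
        public static void main(String[] args) {
            // Same pattern as the example above: part- followed by digits, ending with json.
            PathMatcher matcher = FileSystems.getDefault()
                    .getPathMatcher("glob:directory1/20200209/part-*[0-9]*json");

            List<String> candidates = List.of(
                    "directory1/20200209/part-00002-0aaf87d57fbc-c000.json",  // matches
                    "directory1/20200209/part-extra.csv",                     // wrong extension
                    "directory1/other/part-00001.json");                      // wrong folder

            for (String candidate : candidates) {
                System.out.println(candidate + " -> " + matcher.matches(Path.of(candidate)));
            }
        }
    }
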
Testing and using a pattern

You can test the expression to ensure that the pattern you want to use retrieves the Object Storage files for one or more data entities.

  1. In the Select data entity panel, click Browse by pattern.

  2. In the Browse data entities by pattern panel, select Test pattern from the More actions menu.

  3. In the Test pattern panel, in the Search pattern field, enter a pattern expression you want to test before using.

    For example, enter department1/2020/*.json to find all files with the extension .json that are in the directory department1/2020. You can use parameter syntax ${} in the Search pattern field.

  4. To test the search pattern, in the Test filenames block, provide a file name or multiple file names delimited by a new line. For example, for the pattern BANK_C*/*, the file names might be:

    BANK_CITY/part-00002-0aaf87d57fbc-c000.csv	
    BANK_COUNTRY/part-00000-a66df3ab02fd-c000.csv
  5. Click Test pattern.

    Verify that the test file names are returned in the Resulting file name block.

  6. Click Use pattern to add the pattern expression to the Browse data entities by pattern panel.

    You're returned to the Browse data entities by pattern panel. The files that match the pattern expression are displayed in the table.

  7. Click Select pattern.

    You're returned to the Select data entity panel. The pattern expression is displayed next to Data entity.

    When you use a pattern expression, all existing files that match the pattern are assumed to have the same structure. The matched files are treated as a single entity in the data flow. Any future new files that match the pattern are also processed.

Using Data Integration managed incremental load (BICC)

Incremental load is loading only new or updated data from a source into a target. In Data Integration, when you configure BICC Oracle Fusion Applications as a data source, you can use the Managed incremental extract strategy to perform an incremental load.

When you choose to use the incremental extract strategy, only new or updated records from the source are extracted based on a last extract date. Data Integration provides two last extract date options:

  • Custom: You provide a last extract date for every task run.

  • Managed: Data Integration manages the date for you by tracking the task run timestamp and storing the last successful load date as a watermark across consecutive runs.

    With the Data Integration managed last extract date option, you don't need to explicitly specify a date for a task run. You can, however, override the date at runtime.

Task runs initiated by a task schedule

Data Integration tracks a task run independently of a task run that's initiated by a task schedule. So if you use the Managed last extract date option and also set up a task schedule, Data Integration automatically keeps track of the last successful task run timestamp of task-scheduled runs separately from the last successful task run timestamp of task runs that are not initiated by a task schedule. This means the last date that's managed within a task schedule or the last date that's managed within a task is not modified by the other run operation.
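
Conceptually, the Managed option behaves like a stored watermark: each run extracts only the records whose last update date falls after the previous watermark, less the extra 24-hour look-back described earlier, and a successful run advances the watermark to the run timestamp. The following Java sketch illustrates only that bookkeeping, with assumed names; it is not Data Integration's implementation.

    import java.time.Duration;
    import java.time.Instant;

    public class ManagedWatermark {

        // The extra look-back that the Managed option applies (24 hours into the past).
        private static final Duration LOOK_BACK = Duration.ofHours(24);

        private Instant lastSuccessfulLoadDate; // null until the first successful run

        // Lower bound for extraction on this run; null means extract all data (first run).
        Instant extractFrom() {
            return lastSuccessfulLoadDate == null ? null : lastSuccessfulLoadDate.minus(LOOK_BACK);
        }

        // After a successful run, store the run timestamp as the new watermark.
        void recordSuccessfulRun(Instant runTimestamp) {
            lastSuccessfulLoadDate = runTimestamp;
        }
    }

Because a scheduled run and a directly started run each keep their own last successful load date, one never overwrites the watermark that the other relies on.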

Incremental load for different data entities in a data flow

Suppose you want to set up incremental load on different data entities in a data flow. You can achieve this by using parameters and creating a task schedule for each data entity. The general steps are:

  1. In the data flow, assign parameters for the source schema (BICC offering) and data entity (BICC VO).
  2. Create and publish an integration task for the parameterized data flow.
  3. Create a task schedule for the integration task. On the Configure parameters page, specify the schema and data entity values for the task schedule.
  4. Create another task schedule for the same integration task. On the Configure parameters page, specify different schema and data entity values for this task schedule.

Incremental load multiple times in one day

To perform an incremental load more than one time in a day, add a filter operator immediately after the BICC source operator in the data flow. Then create a condition expression to filter out data that has already been processed. For example, if the BICC VO's last_update_date column is LastUpdateDateField, then the expression might be the following:

FILTER_1.MYSOURCE_1.LastUpdateDateField > ${SYS.LAST_LOAD_DATE}

Target Operator

Use the target operator to specify the data entities that serve as the output for storing transformed data.

Adding and configuring a target operator

You can add multiple target operators to a data flow. Each target can have only one inbound port.

Important

If you're using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes, Map, and Data tabs in the Properties panel.
  1. From the Operators panel, drag a Target operator onto the canvas.
  2. With the target operator in focus, do the following on the Details tab of the Properties panel:

    Item Description

    Identifier

    Enter a name for the target operator, or leave the default name as-is.

    Create new data entity

    (Optional) Select the checkbox to create a new entity on the target, instead of updating an existing entity on the target. In a later step, you are prompted to enter a name for the data entity that's to be created on the target.

    Do not select the checkbox if you want to use an existing data entity as the target. You are prompted to select the data entity after you select the target data asset, connection, and schema.

    Data asset

    Connection

    Schema or Bucket

    In the order presented, click Select next to each resource to select the data asset, connection, and schema or bucket.

    Take note of the following:

    • Not all data source types are available as target data assets. For example, PostgreSQL and Hive are not available as targets.

    • For the Merge Integration strategy, you cannot use an Object Storage data asset as the target.

    • If you're using an Object Storage data asset, select the compartment first and then select the bucket.

    • To use Autonomous Data Warehouse or Autonomous Transaction Processing as a target, you need an Object Storage bucket as the staging location. If you select a data asset that doesn't have a default staging location specified in that data asset, Data Integration displays a warning notification. When you see the notification, choose one of the following options:
      • Click Update data asset to add a default staging location to that data asset.
      • Click Select next to Staging location in the Properties panel to select the Object Storage data asset that has the bucket you want to use for staging.
      The PAR_MANAGE request permission is required. For more information, see step 5.

    Data entity

    If you selected the Create new data entity checkbox, enter a name for the entity that's to be created as the target. For more information, see step 3.

    If you did not select the Create new data entity checkbox, select an existing data entity as the target. For more information, see step 4.

    (Only for data assets of type Autonomous Data Warehouse or Autonomous Transaction Processing)

    Staging location

    Data Integration uses the staging location to stage data before loading the data into the target.

    For more information, see step 5.

    Integration strategy

    How data is moved into the target data entity depends on the integration strategy you choose. The options are:

    • Insert: Inserts new records, or appends the records when the data exists on the target.

      If the Create new data entity checkbox is selected, the only strategy option that's available for selection is Insert.

    • Overwrite: Performs a truncate on the target before inserting new records.

      If you select the Overwrite strategy, you cannot create a new data entity for the target entity.

    • Merge: Inserts new records and merges existing records into an existing data entity.

      Take note of the following when using the Merge strategy:

      • The Merge strategy is available for database targets only. You cannot use an Object Storage data asset as the target if you want to use this integration strategy.

      • The Merge key identifies the attribute or attributes that are used to determine whether an existing row is updated or a new row is inserted. You can select a primary/unique key or a non-primary key as the Merge key.

        Click Select or Edit. In the Merge key panel, do one of the following actions:

        • Select Primary/Unique key attributes and select a key.
        • Select Other attributes to use other (non-primary) attributes. Enter a name for the merge key and select the attributes from the list.
      • If a parameterized target data entity is configured to use the Merge strategy, you can change the Merge key selection:
        • In an integration task at design time and runtime
        • In an integration task in a pipeline at design time, and at runtime (pipeline task)
        • In a task schedule (integration task and pipeline task)

    (For Object Storage only)

    Create output as a single file

    This step is optional for Object Storage targets.

    Select the Create output as a single file checkbox to write the output to a single file; otherwise, the operation creates multiple files. The single output file is overwritten every time the task is run.

    Creating a single output file might affect the performance of Data Integration. Do not use the single output file option for large datasets.

    You can choose to write the output to a single file only if one of the following conditions applies:

    • When the Integration strategy is Insert, and the Create new data entity checkbox is selected. Do not add the forward slash (/) at the end of the new entity name. For more information, see step 3.
    • When the Integration strategy is Overwrite, and you use an existing data entity (Create new data entity checkbox is not selected). Select an existing data entity that does not have the forward slash (/) at the end of the entity name. For more information, see step 4.
  3. If you selected the Create new data entity checkbox: Enter a name for the entity that's to be created. A new data entity is created only if the name does not exist.

    See also Parameters in Data Entity Names if you want to use a parameter for part of the new entity name.

    Depending on the data asset type, take note of the following when you enter a name for the data entity:

    Target type Description

    For Oracle Database, Autonomous Data Warehouse, or Autonomous Transaction Processing

    Enter the new data entity name.

    When creating a database data entity as a target output, if the entity name you provide does not exist, a data entity is created and data is inserted into the output. If the entity name you provide exists, the outcome of the operation depends on the shape of the target on the data flow and the shape of the existing entity.

    A shape is based on the number and name of attributes, data types, type characteristics such as length and precision, and whether data is present. For example, data type varchar2(4000) is not the same shape as varchar2(2000) even though the attribute name on the data flow target is the same as the name of the attribute on the existing data entity.

    • If the shape of the target entity on the data flow is the same as the shape of the existing entity, the database operation:
      • Inserts data, if no data is present
      • Updates or inserts data, if data is present
    • If the shape of the target entity on the data flow is not the same as the shape of the existing entity:
      • The operation adds the attribute to the existing entity when the attribute name does not exist.

        For example, the target on the data flow has the attribute EMPLOYEE_ID NUMBER(10), but EMPLOYEE_ID NUMBER is not present in the existing entity.

      • When the attribute name is present in the existing entity, but the data type shape is not the same:
        • When attribute data is present, the database operation fails when the data shape is smaller. For example, when the target on the data flow has EMPLOYEE_COMMENTS VARCHAR2(4000) and the existing entity has EMPLOYEE_COMMENTS VARCHAR2(2000), the operation fails. When the target on the data flow has EMPLOYEE_COMMENTS VARCHAR2(2000) and the existing entity has EMPLOYEE_COMMENTS VARCHAR2(4000), the operation succeeds.

        • When attribute data is not present, the operation modifies the attribute and inserts data.

          If the target is an autonomous database, you might encounter the database error ORA-22859: invalid modification of columns when an attempt is made to modify a column object, REF, VARRAY, nested table, or LOB column type. Create a column of the preferred type and copy the column data to the new type using the appropriate type constructor.

    Note also that conversion between TEXT and CLOB is not supported in Data Integration. With the limit of 4000 bytes for a VARCHAR2 column in an autonomous database, you might have to use an existing target data entity that has a CLOB data type instead of selecting the Create new data entity checkbox in certain use cases.

    For Object Storage

    Enter the new data entity name, followed by a forward slash (/). For example, enter newdirectory/.

    However, if you select the Create output as a single file checkbox because you want to create a single file output, enter the new entity name without the forward slash (/) at the end.

    When writing output to a single file, you can include parameters in the file name using the syntax ${PARAMETER_NAME}. For example, the file name can be written as:

    /bucket1/file_${REGION}.csv
    /bucket1/file_${CURRENT_DATE}.csv

    Also select the File type and Compression type. The compression types that are available for selection depend on the file type you choose. If you don't know the compression method that's used to compress the file, select None (Default).

    For CSV and JSON file types, the default Encoding is UTF-8, which cannot be changed.

    If you choose the file type CSV, other settings you can configure are:

    • If the first row in the file is a header row, select Yes for Has header.
    • If the values in the data rows span multiple lines, select Yes for Multi-line.
    • Specify the Escape character that escapes other characters found in data values. For example: \
    • Select the Delimiter character that separates data fields. For example: COLON (:), COMMA (,), PIPE (|), SEMICOLON (;), or TAB (\t)
    • If a column delimiter is included at the end of a data row in the file, select Yes for Trailing delimiter.
    • Specify the Quote character that treats other characters as literal characters. For example: "
  4. If you did not select the Create new data entity checkbox: Select an existing data entity.

    To ensure that you're using the entity's latest schemas during the current working session in the designer, select the refresh icon next to the entity name. Only the metadata of the entity is fetched, not the actual data changes.

    Depending on the data asset type, take note of the following when configuring to use an existing data entity:

    Target type Description

    For Object Storage

    Select the Create output as a single file checkbox to use a single output file; otherwise, the operation creates multiple files. The single output file is overwritten every time the task is run.

    Creating a single output file might affect the performance of Data Integration. Do not use the single output file option for large datasets.

    If the Create output as a single file checkbox is selected: When selecting the data entity, select an entity name that does not have the forward slash (/) at the end.

    Select the File type. If CSV is the file type, you can click Preview raw data to display the first 10 lines of the file.

    Select the Compression type. The compression types that are available for selection depend on the file type you choose. If you don't know the compression method that's used to compress the file, select None (Default).

    For CSV and JSON file types, the default Encoding is UTF-8, which cannot be changed.

    If the file type is CSV, other settings that you can configure are:

    • If the first row in the file is a header row, select Yes for Has header.
    • If the values in the data rows span multiple lines, select Yes for Multi-line.
    • Specify the Escape character that escapes other characters found in data values. For example: \
    • Select the Delimiter character that separates data fields. For example: COLON (:), COMMA (,), PIPE (|), SEMICOLON (;), or TAB (\t)
    • If a column delimiter is included at the end of a data row in the file, select Yes for Trailing delimiter.
    • Specify the Quote character that treats other characters as literal characters. For example: "

    If the file type is JSON, select Use custom schema to paste or upload a custom sample schema that's used to infer the entity shape.

    • If you select Upload, drop a custom schema file in the box provided, or click Select a file to select the schema file to upload.

    • If you select Paste in schema, copy the schema text file content and paste it in the box provided.

  5. For Autonomous Data Warehouse or Autonomous Transaction Processing, select the staging location if a default location is not defined for the asset, or edit the location to use a different staging location.
    Note

    If the target is Autonomous Data Warehouse or Autonomous Transaction Processing, the staging location option is enabled for you to select an Object Storage bucket. Data Integration uses the staging location to stage data before loading the data into the target.

    If the target data asset is configured with a default staging location, Data Integration automatically uses that bucket for staging unless you explicitly edit the value to specify a different staging location.

    To use Autonomous Data Warehouse or Autonomous Transaction Processing as the target, you must first enable the PAR_MANAGE request permission on the staging bucket, for example:

    allow any-user to manage buckets in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.permission='PAR_MANAGE', request.principal.id='<workspace-ocid>'}

    Learn more about Using Pre-Authenticated Requests.

    1. For Staging location, click Select or Edit.
    2. (For edit) In the Select staging location panel, clear the Use default staging location settings checkbox if you want to select a different staging location.
    3. Click the Data asset menu and select an Object Storage data asset.
    4. Then select the Connection, Compartment, and the Bucket, and click Select.
  6. (Optional) On the Advanced options tab, depending on the data asset type for this target operator, you can enter values or use default values.

    Item Description

    Reject limit

    For Autonomous Data Warehouse or Autonomous Transaction Processing

    Enter a value.

    You can specify the maximum number of erroneous rows that can fail to load into the target before the integration task fails. For example, if the data source has 1,000 rows and you set the reject limit at 200, the task fails immediately after the 200th erroneous row is rejected.

    If you don't specify a value, the default is zero, which means the task fails upon the first erroneous row being rejected.

    If you encounter a task failure, check the logs for the table names where the rows had been rejected. Then query the affected tables and rows in the autonomous database.

    Load order

    Enter a value.

    Only positive integers are allowed. The default is 0. The load order specifies the order of the target nodes that data is loaded into. The target with value 0 is loaded first, then the target with value 1, and so on. If multiple nodes have the same value, the targets are loaded in no particular order.

  7. Under the Attributes tab, you can select individual attributes or use the filter icons to filter the list of attributes by a name pattern or by an attribute type. Then apply bulk exclude rules to the selected or filtered attributes.

    You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  8. In the Data tab, view a sampling of the data, based on any rules you have applied in the Attributes tab. Transformations cannot be applied on a target operator as the data is read-only.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  9. In the Map tab, specify how the incoming attributes map to the target data entity's attributes. Note that mapping attributes is not applicable if you selected the Create new data entity checkbox.

    For target entities with hierarchical data types, only first-level fields can be mapped. See Hierarchical Data Types to understand what is supported.

  10. In the Validation tab, check for warnings or errors that might cause the data flow to fail.

    Data Integration checks and provides warnings in the following situations:

    • Data truncation might occur when data is loaded from the source attributes to the mapped target attributes.
    • The data type of a source attribute does not match the data type of the mapped attribute.

Viewing data

The Data tab displays the transformed data based on the operators applied in the data flow.

Important

If you're using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported.

You can filter data in the target entity by a name pattern or data type. To filter data by a name pattern, enter a simple regex pattern or wildcards ? and * in the Filter by Pattern field. To filter data by a type, select the data type from the menu next to the pattern field.

Transformations cannot be applied to the target operator as the data is read-only.

Mapping attributes

The Map tab is only available for a target operator.

Important

If you're using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported.

When you're creating a new target data entity, the Map tab is unavailable. The incoming attributes are used to create the table or file structure with a 1-to-1 mapping.

When using an existing target data entity, map the incoming attributes to the target data entity's attributes. The actions you can perform are:

Map by Position

Maps incoming attributes to target entity attributes according to their position in the lists.

From the Actions menu, select Auto-map by position. The Auto-map by position rule is added.

Map by Name

Maps incoming attributes to target attributes with the same name.

From the Actions menu, select Auto-map by name. The Auto-map by name rule is added.

Map by Pattern

Maps incoming attributes to target attributes based on simple, user-defined regex rules.

From the Actions menu, select Map by pattern. Enter a source pattern and a target pattern. Then click Preview mapping to test the source and target patterns.

To define a pattern, you can use the asterisk (*) and question mark (?) symbols. Use an asterisk to indicate a wildcard for any number of characters in a string pattern. Use a question mark to indicate a single-character wildcard. For example, *INPUT? matches any attribute name that has any number of characters before the string INPUT, followed by exactly one character, such as NEWINPUTS.

By default, pattern matching is case-insensitive. For example, the source pattern *Name matches the target name CUSTOMER_NAME and Customer_Name.

To refer to capture groups, use $n. For example, suppose you want to map LAST_NAME, FIRST_NAME, and USERNAME from a source or upstream operator to TGT_LAST_NAME, TGT_FIRST_NAME, and TGT_USERNAME in the target data entity. You would enter *NAME in the Source pattern field and TGT_$1 in the Target pattern field. The asterisk (*) in *NAME is the first capture group in the source pattern, and $1 in the target pattern refers to that group, so the character string matched by the asterisk is carried over into the target attribute name.

In cases where you need case-sensitive pattern matching, add the (?c) prefix to the source pattern. For example, let's say you want to map the source attribute CustomerName, which uses camel case lettering in its name, to the target attribute with the name Customer_Name. You would enter (?c)([A-Z][a-z]+)([A-Z][a-z]+) as the source pattern and $1_$2 as the target pattern. When the (?c) prefix is added to the beginning of a source pattern, case-sensitive pattern matching is switched on for the mapping. Data Integration detects that the "N" in CustomerName is the start of a new pattern and thus treats ([A-Z][a-z]+)([A-Z][a-z]+) as two different words (capture groups) when matching.
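
As an illustration only (the attribute names here are hypothetical), suppose the incoming attributes are SRC_ID, SRC_CITY, and SRC_STATE, and the existing target attributes are TGT_ID, TGT_CITY, and TGT_STATE. A single pattern rule can map all three attributes:

Source pattern: SRC_*

Target pattern: TGT_$1

Click Preview mapping to confirm that SRC_ID maps to TGT_ID, SRC_CITY maps to TGT_CITY, and SRC_STATE maps to TGT_STATE before applying the rule.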

Direct Manual Map

Drag an incoming attribute from the source list to an attribute in the target list to create a mapping.

Alternatively, you can select Manual map from the Actions menu. Then use the Map Attribute dialog to create a mapping by selecting a source attribute and a target attribute.

Remove a Mapping

Removes the selected mapping.

Select View rules. In the Rules panel, select one or more rules and click Remove. Alternatively, you can select Remove from the rule's Actions menu to clear that rule.

Clear All Mappings

Removes all mappings.

From the Actions menu, select Reset mappings. All manual and auto-map rules are removed.

Shaping Operators

Important

For complex data types, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab and Data tab in the Properties panel.

Filter Operator

Use the filter operator to select a subset of data from the inbound port to continue downstream to the outbound port based on a filter condition.

Adding and configuring a filter operator
  1. From the Operators panel, drag a Filter operator onto the canvas.
  2. With the filter operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Next to Filter condition, click Create.
  4. In the Create Filter Condition panel, Condition Builder section, you can double-click or drag attributes, parameters, and functions to add to the editor to build a condition. You can also enter a condition expression manually into the editor, and validate the expression.
    Note

    In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
  5. Click Create.
  6. (Optional) Click Assign parameter to use parameters so that the filter condition is not bound to the compiled code when you publish the data flow. See Assigning a Parameter.
  7. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  8. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  9. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Creating a filter condition

Use the Condition Builder to visually select elements to create a filter condition. You can also enter a filter condition manually in the editor.

Creating a filter condition lets you select a subset of data from an upstream operator based on the condition.

The elements that you can use in a filter condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.

Incoming displays the attributes from the upstream operator that are coming into this filter operator.

For example, to filter data by a city name, you can create the condition expression as:

FILTER_1.ADDRESSES.CITY='Redwood Shores'

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Suppose you want to use a parameter for the city name in the filter condition. You can create a VARCHAR parameter with the name P_VARCHAR_CITY, and set the default value to Redwood Shores. Then you can create the filter expression as:

FILTER_1.ADDRESSES.CITY=$P_VARCHAR_CITY

Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.

For example, to filter data by a city name or by population, you can use the OR function to create the filter condition expression as:

FILTER_1.COUNTRIES.CITY=$P_VARCHAR_CITY OR FILTER_1.COUNTRIES.POPULATION>100000000
FILTER_1.COUNTRIES.CITY=$P_VARCHAR_CITY OR FILTER_1.COUNTRIES.POPULATION>$P_NUMERIC
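
You can also combine functions with operators in a single condition. As an illustrative sketch (using the same hypothetical COUNTRIES entity), the following condition matches the city name regardless of capitalization and also requires a minimum population:

UPPER(FILTER_1.COUNTRIES.CITY) = 'REDWOOD SHORES' AND FILTER_1.COUNTRIES.POPULATION > 100000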

Here's a list of functions that are available for you to add when you construct conditions:

Hash Functions
Function Description Example
MD5(all data types) Calculates an MD5 checksum of the data type, and returns a string value. MD5(column_name)
SHA1(all data types) Calculates a SHA-1 hash value of the data type, and returns a string value. SHA1(column_name)
SHA2(all data types, bitLength) Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. SHA2(column_name, 256)
ORA_HASH(expr, [max_bucket], [seed_value])

Computes a hash value for expr, and returns a NUMBER value.

expr can be an expression, a column, a literal.

max_bucket is the maximum bucket value returned, between 0 and 4294967295 (default).

seed_value is a value between 0 (default) and 4294967295.

Oracle applies the hash function to the combination of expr and seed_value to produce many different results for the same set of data.

ORA_HASH('1')

ORA_HASH('b', 2)

ORA_HASH(100, 10, 10)

ORA_HASH(EXPRESSION_1.CUSTOMERS.SSN, 2)

Arithmetic Functions
Function Description Example
ABS(numeric) Returns the absolute value of the numeric value. ABS(-1)
CEIL(numeric) Returns the smallest integer not less than the numeric value. CEIL(-1.2)
FLOOR(numeric) Returns the largest integer not greater than the numeric value. FLOOR(-1.2)
MOD(numeric1, numeric2) Returns the remainder after numeric1 is divided by numeric2. MOD(8,2)
POWER(numeric1, numeric2) Raises numeric1 to the power of numeric2. POWER(2,3)
ROUND(numeric1, numeric2) Returns numeric1 rounded to numeric2 decimal places. ROUND(2.5,0)
TRUNC(numeric1, numeric2) Returns numeric1 truncated to numeric2 decimal places. TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale]) Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7

Date and Time Functions
Function Description Example
CURRENT_DATE Returns the current date. CURRENT_DATE returns today's date such as 2023-05-26
CURRENT_TIMESTAMP Returns the current date and time for the session time zone. CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
DATE_ADD(date, number_of_days) Returns the date that's the specified number of days after the specified date. DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale])

Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

DATE_FORMAT(Date '2020-10-11', 'yyyy-MM-dd') returns '2020-10-11'. The first argument is a Date object representing October 11th, 2020.

DATE_FORMAT(Date '2018-junio-17', 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

DAYOFMONTH(date) Returns the date's day in the month. DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date) Returns the date's day in the week. DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date) Returns the date's day in the year. DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) Returns the date's week in the year.

WEEKOFYEAR('2022-07-28') returns 30

WEEKOFYEAR('2022-07-28 13:24:30') returns 30

HOUR(datetime) Returns the datetime's hour value. HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date) Returns the date's last day of the month. LAST_DAY('2020-12-25') returns 31
MINUTE(datetime) Returns the datetime's minute value. MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date) Returns the date's month value. MONTH('2020-06-25') returns 6
QUARTER(date) Returns the quarter of year the date is in. QUARTER('2020-12-25') returns 4
SECOND(datetime) Returns the datetime's second value. SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr]) Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17

TO_TIMESTAMP(expr, format_string[, localeStr]) Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on October 11th, 2020
WEEK(date)

Returns the date's week value.

WEEK('2020-06-25') returns 4
YEAR(date) Returns the date's year value. YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months).

INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
Analytic Functions
Function Description Example
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) Returns the value evaluated at the row that's the first row of the window frame. FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) Returns the value evaluated at the row that is the last row of the window frame. LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause) Returns the rank of the current row with gaps, counting from 1. RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) Returns the unique number of the current row within its partition, counting from 1. ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
String Functions
Function Description Example
CAST(value AS type) Returns the specified value in the specified type. CAST("10" AS INT) returns 10
CONCAT(string, string) Returns the combined values of strings or columns. CONCAT('Oracle','SQL') returns OracleSQL
CONCAT_WS(separator, expression1, expression2, expression3,...) Returns the combined values of strings or columns using the specified separator in between the strings or columns.

A separator is required and it must be a string.

At least one expression must be provided after the separator. For example: CONCAT_WS(',', col1)

CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle

CONCAT_WS(' ', address, city, postal_code) returns 123 MyCity 987654

If a child of the function is an array, then the array is flattened:

CONCAT_WS(',', 1,2,3, to_array(4,5,6), to_array(7,8), 9) returns 1,2,3,4,5,6,7,8,9

INITCAP(string) Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space. INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position]) Returns the (1-based) index of the first occurrence of substring in string. INSTR('OracleSQL', 'SQL') returns 7
LOWER(string) Returns the string with all letters changed to lowercase. LOWER('ORACLE') returns oracle
LENGTH(string) Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces. LENGTH('Oracle') returns 6
LTRIM(string) Returns the string with leading spaces removed from the left. LTRIM(' Oracle')
NVL(expr1, expr2) Returns the argument that is not null. NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx]) Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement) Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, the matched search string is removed from string.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string) Returns the string with trailing spaces removed from the right. RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length]) Returns the substring starting at position. SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale]) Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

Number example: TO_CHAR(123) returns 123

Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string) Returns a string with all letters changed to uppercase. UPPER('oracle') returns ORACLE
LPAD(str, len[, pad]) Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad]) Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. RPAD('XYZ', 6, '+') returns 'XYZ+++'
Operator (Comparison) Functions
Function Description Example
CASE WHEN condition1 THEN result1 ELSE result2 END Returns the value for which a condition is met. CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ
AND The logical AND operator. Returns true if both operands are true, otherwise returns false. (x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"
OR The logical OR operator. Returns true if either operand is true or both are true, otherwise returns false. (x = 10 OR y = 20) returns "false" if x is not equal to 10 and y is not equal to 20. If either one is true, then it returns "true"
NOT The logical NOT operator.
LIKE Performs string pattern matching, whether string1 matches the pattern in string2.
= Tests for equality. Returns true if expr1 equals expr2, otherwise returns false. x = 10 returns "true" when the value of x is 10, else it returns "false"
!= Tests for inequality. Returns true if expr1 does not equal expr2, otherwise returns false. x != 10 returns "false" if the value of x is 10, else it returns "true"
> Tests for an expression greater than. Returns true if expr1 is greater than expr2. x > 10 returns "true" if the value of x is greater than 10, else it returns "false"
>= Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2. x >= 10 returns "true" if the value of x is greater than or equal to 10, else it returns "false"
< Tests for an expression less than. Returns true if expr1 is less than expr2. x < 10 returns "true" if the value of x is less than 10, else it returns "false"
<= Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2. x <= 10 returns "true" if the value of x is less than or equal to 10, else it returns "false"
|| Concatenates two strings. 'XYZ' || 'hello' returns 'XYZhello'
BETWEEN Evaluates a range. FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007
IN Tests whether an expression matches a list of values. FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
Unique ID Functions
Function Description Example
NUMERIC_ID() Generates a universally unique identifier that is a 64-bit number for each row. NUMERIC_ID() returns, for example, 3458761969522180096 and 3458762008176885761
ROWID() Generates monotonically increasing 64-bit numbers. ROWID() returns, for example, 0, 1, 2, and so on
UUID() Generates a universally unique identifier that is a 128-bit string for each row. UUID() returns, for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d
MONOTONICALLY_INCREASING_ID() Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. MONOTONICALLY_INCREASING_ID() returns, for example, 8589934592 and 25769803776
Conditional Functions
Function Description Example
COALESCE(value, value [, value]*) Returns the first non-null argument, if it exists, otherwise returns null. COALESCE(NULL, 1, NULL) returns 1
NULLIF(value, value) Returns null if the two values equal each other, otherwise returns the first value. NULLIF('ABC','XYZ') returns ABC
Hierarchical Functions
Function Description Example
SCHEMA_OF_JSON(string) Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column) Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Maps, into a JSON string. TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*) Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*) Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*) Creates a new column as Array type. The input columns must all have the same data type.

TO_ARRAY(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]
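
As an illustrative sketch (the JSON_PAYLOAD attribute name is hypothetical), these functions can also take an attribute rather than a literal value. For example, to parse a column that holds a JSON string into a structured column with a known schema:

FROM_JSON(EXPRESSION_1.MYSOURCE.JSON_PAYLOAD, 'STRUCT<Zipcode: BIGINT, City: STRING>')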

Higher-Order Functions

Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.

The supported operators are:

  • Aggregate

  • Expression

  • Filter

  • Join

  • Lookup

  • Split

  • Pivot

Function Description Example
TRANSFORM(column, lambda_function) Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array. For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function) Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values. For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function) Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)

Only the Expression operator supports ARRAY_SORT.

Takes an array and sorts it according to the given function, which takes 2 arguments.

The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.

If the function is omitted, the array is sorted in ascending order.

array_sort(to_array(5, 6, 1),
                   (left, right) -> CASE WHEN left < right THEN -1
                                         WHEN left > right THEN 1 ELSE 0 END)

The returned array is:

[1,5,6]
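
As an illustrative sketch (the PRICES attribute name is hypothetical), a higher-order function is used in an expression like any other function. For example, to increase every element of an ARRAY attribute by 10 percent:

TRANSFORM(EXPRESSION_1.MYSOURCE.PRICES, p -> p * 1.1)

The lambda function is applied to each element, and the result is a new array of the same length.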

Join Operator

Use the join operator to link data from multiple inbound sources.

Adding and configuring a join operator
  1. From the Operators panel, drag a Join operator onto the canvas.
  2. With the join operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Select the Join type. The types of join supported are:
    Join Types
    Join Type Description
    Inner Selects the data from the two inbound sources where the join condition is met.
    Left outer Selects all the data from inbound source 1 and the data from inbound source 2 where the join condition is met.
    Right outer Selects the data from inbound source 1 where the join condition is met and all the data from inbound source 2.
    Full outer Selects all data from the two inbound sources, matching rows where matches can be made and inserting nulls where no matches are found.
  4. Next to Join condition, click Create.
  5. In the Create Join Condition panel, Condition Builder section, you can double-click or drag attributes, parameters, and functions to add to the editor to build a condition. You can also enter a condition expression manually into the editor, and validate the expression.
    Note

    In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
  6. Click Create.
  7. (Optional) Click Assign parameter to use parameters so that the join condition is not bound to the compiled code when you publish the data flow. See Assigning a Parameter.
  8. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  9. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  10. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Creating a join condition

Use the Condition Builder to visually select elements to create a join condition. You can also enter a join condition manually in the editor.

Creating a join condition lets you select data from two inbound sources based on the condition.

The elements that you can use in a join condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.

Incoming displays the attributes from the upstream ports connected to the join operator as two separate JOIN folders. View the attributes from each port by expanding or collapsing the appropriate JOIN folder. For example, JOIN_1_1 and JOIN_1_2.

For example, to join the two sources on a common address identifier, you can create the join condition as:

JOIN_1_1.BANK_CUSTOMER.ADDRESS_ID = JOIN_1_2.BANK_ADDRESS.ADDRESS_ID

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Suppose you want to join two sources and retain only the rows where BANK_NAME='ABC Bank'. You can create a VARCHAR parameter with the name P_VARCHAR, and set the default value to ABC BANK. Then you can create the join expression as:

JOIN_1_1.ADDRESSES.BANK_ID = JOIN_1_2.BANK.BANK_ID AND JOIN_1_2.BANK.BANK_NAME = $P_VARCHAR

Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.
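
You can apply a function to either side of a join condition. As an illustrative sketch (the attribute names are hypothetical), the following condition joins two sources on an email address regardless of letter case:

UPPER(JOIN_1_1.CUSTOMERS.EMAIL) = UPPER(JOIN_1_2.CONTACTS.EMAIL)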

Here's a list of functions that are available for you to add when you construct conditions:

Hash Functions
Function Description Example
MD5(all data types) Calculates an MD5 checksum of the data type, and returns a string value. MD5(column_name)
SHA1(all data types) Calculates a SHA-1 hash value of the data type, and returns a string value. SHA1(column_name)
SHA2(all data types, bitLength) Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. SHA2(column_name, 256)
ORA_HASH(expr, [max_bucket], [seed_value])

Computes a hash value for expr, and returns a NUMBER value.

expr can be an expression, a column, a literal.

max_bucket is the maximum bucket value returned, between 0 and 4294967295 (default).

seed_value is a value between 0 (default) and 4294967295.

Oracle applies the hash function to the combination of expr and seed_value to produce many different results for the same set of data.

ORA_HASH('1')

ORA_HASH('b', 2)

ORA_HASH(100, 10, 10)

ORA_HASH(EXPRESSION_1.CUSTOMERS.SSN, 2)

Arithmetic Functions
Function Description Example
ABS(numeric) Returns the absolute value of the numeric value. ABS(-1)
CEIL(numeric) Returns the smallest integer not less than the numeric value. CEIL(-1.2)
FLOOR(numeric) Returns the largest integer not greater than the numeric value. FLOOR(-1.2)
MOD(numeric1, numeric2) Returns the remainder after numeric1 is divided by numeric2. MOD(8,2)
POWER(numeric1, numeric2) Raises numeric1 to the power of numeric2. POWER(2,3)
ROUND(numeric1, numeric2) Returns numeric1 rounded to numeric2 decimal places. ROUND(2.5,0)
TRUNC(numeric1, numeric2) Returns numeric1 truncated to numeric2 decimal places. TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale]) Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7

Date and Time Functions
Function Description Example
CURRENT_DATE Returns the current date. CURRENT_DATE returns today's date such as 2023-05-26
CURRENT_TIMESTAMP Returns the current date and time for the session time zone. CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
DATE_ADD(date, number_of_days) Returns the date that's the specified number of days after the specified date. DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale])

Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

DATE_FORMAT(Date '2020-10-11', 'yyyy-MM-dd') returns '2020-10-11'. The first argument is a Date object representing October 11th, 2020.

DATE_FORMAT(Date '2018-junio-17', 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

DAYOFMONTH(date) Returns the date's day in the month. DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date) Returns the date's day in the week. DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date) Returns the date's day in the year. DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) Returns the date's week in the year.

WEEKOFYEAR('2022-07-28') returns 30

WEEKOFYEAR('2022-07-28 13:24:30') returns 30

HOUR(datetime) Returns the datetime's hour value. HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date) Returns the date's last day of the month. LAST_DAY('2020-12-25') returns 31
MINUTE(datetime) Returns the datetime's minute value. MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date) Returns the date's month value. MONTH('2020-06-25') returns 6
QUARTER(date) Returns the quarter of year the date is in. QUARTER('2020-12-25') returns 4
SECOND(datetime) Returns the datetime's second value. SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr]) Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17

TO_TIMESTAMP(expr, format_string[, localeStr]) Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on October 11th, 2020
WEEK(date)

Returns the date's week value.

WEEK('2020-06-25') returns 4
YEAR(date) Returns the date's year value. YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months).

INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
Analytic Functions
Function Description Example
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) Returns the value evaluated at the row that's the first row of the window frame. FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) Returns the value evaluated at the row that is the last row of the window frame. LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause) Returns the rank of the current row with gaps, counting from 1. RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) Returns the unique number of the current row within its partition, counting from 1. ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
String Functions
Function Description Example
CAST(value AS type)Returns the specified value in the specified type.CAST("10" AS INT) returns 10
CONCAT(string, string)Returns the combined values of strings or columns.CONCAT('Oracle','SQL') returns OracleSQL
CONCAT_WS(separator, expression1, expression2, expression3,...) Returns the combined values of strings or columns using the specified separator in between the strings or columns.

A separator is required and it must be a string.

At least one expression must be provided after the separator. For example: CONCAT_WS(',', col1)

CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle

CONCAT_WS(' ', address, city, postal_code) returns 123 MyCity 987654

If a child of the function is an array, then the array is flattened:

CONCAT_WS(',', 1,2,3, to_array(4,5,6), to_array(7,8), 9) returns 1,2,3,4,5,6,7,8,9

INITCAP(string)Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])Returns the (1-based) index of the first occurrence of substring in string.INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)Returns the string with all letters changed to lowercase.LOWER('ORACLE') returns oracle
LENGTH(string)Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.LENGTH('Oracle') returns 6
LTRIM(string)Returns the string with leading spaces removed from the left.LTRIM(' Oracle')
NVL(expr1, expr2)Returns the argument that is not null.NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx])Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement)Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, the matched search string is removed from string and nothing is substituted in its place.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)Returns the string with trailing spaces removed from the right.RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length])Returns the substring starting at position. SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale])Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

Number example: TO_CHAR(123) returns 123

Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string)Returns a string with all letters changed to uppercase.UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.RPAD('XYZ', 6, '+' ) returns 'XYZ+++'
Operator (Comparison) Functions
Function Description Example
CASE WHEN condition1 THEN result1 ELSE result2 ENDReturns the value for which a condition is met.CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ
ANDThe logical AND operator. Returns true if both operands are true, otherwise returns false.(x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"
ORThe logical OR operator. Returns true if either operand is true or both are true, otherwise returns false.(x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true"
NOTThe logical NOT operator.
LIKEPerforms string pattern matching, whether string1 matches the pattern in string2.
=Tests for equality. Returns true if expr1 equals expr2, otherwise returns false.x = 10 returns "true" when value of x is 10, else it returns "false"
!=Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false.x != 10 returns "false" if value of x is 10, else it returns "true"
>Tests for an expression greater than. Returns true if expr1 is greater than expr2.x > 10 returns "true" if value of x is greater than 10, else it returns "false"
>=Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2.x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false"
<Tests for an expression less than. Returns true if expr1 is less than expr2.x < 10 returns "true" if value of x is less than 10, else it returns "false"
<=Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2.x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false"
||Concatenates two strings.'XYZ' || 'hello' returns 'XYZhello'
BETWEENEvaluates a range.FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007
INTests whether an expression matches a list of values.FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
Unique ID Functions
Function Description Example
NUMERIC_ID()Generates a universally unique identifier that is a 64-bit number for each row.NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761
ROWID()Generates monotonically increasing 64-bit numbers.ROWID() returns for example, 0, 1, 2, and so on
UUID()Generates a universally unique identifier that is a 128-bit String for each row.UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d
MONOTONICALLY_INCREASING_ID() Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. MONOTONICALLY_INCREASING_ID() returns for example, 8589934592 and 25769803776
Conditional Functions
Function Description Example
COALESCE(value, value [, value]*)Returns the first non-null argument, if it exists, otherwise returns null.COALESCE(NULL, 1, NULL) returns 1
NULLIF(value, value)Returns null if the two values equal each other, otherwise returns the first value.NULLIF('ABC','XYZ') returns ABC
Hierarchical Functions
Function Description Example
SCHEMA_OF_JSON(string)Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column)Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*)Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*)Creates a new column as Array type. The input columns must all have the same data type.

TO_Array(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]

Higher-Order Functions

Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.

The supported operators are:

  • Aggregate

  • Expression

  • Filter

  • Join

  • Lookup

  • Split

  • Pivot

Function Description Example
TRANSFORM(column, lambda_function)Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)

Only the Expression operator supports ARRAY_SORT.

Takes an array and sorts according to the given function that takes 2 arguments.

The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.

If the function is omitted, the array is sorted in ascending order.

array_sort(to_array(5, 6, 1),
                   (left, right) -> CASE WHEN left < right THEN -1
                                         WHEN left > right THEN 1 ELSE 0 END)

The returned array is:

[1,5,6]
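
As noted above, the comparator function can be omitted, in which case the array is sorted in ascending order. A minimal sketch of that form, based on the behavior described above:

array_sort(to_array(5, 6, 1)) returns [1,5,6]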

Expression Operator

Use the expression operator to perform one or more transformations on a single row of data to create new, derivative fields.
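
For example, assuming a hypothetical incoming attribute EXPRESSION_1.EMPLOYEES.FIRST_NAME, an expression such as the following creates a derived field that holds the name in uppercase:

UPPER(EXPRESSION_1.EMPLOYEES.FIRST_NAME)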

To use the Expression operator to change the data type of a group of attributes, see Changing the data type of a group of attributes.

Adding and configuring an expression operator
  1. From the Operators panel, drag an Expression operator onto the canvas.
  2. With the expression operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Under Expressions, click Add expression.
  4. In the Add Expression panel, enter a name for the expression in the Identifier field, or leave the name as-is.
  5. (Optional) To apply the expression to two or more attributes, select Allow bulk selection. Then select the attributes by name or by using a regular expression pattern. For example, suppose you have three name attributes in the source dataset and you want to apply the UPPER function to all three source attributes. You can use a pattern to select this group of source attributes.
    1. For Source attributes, select Pattern and click Edit pattern. Then enter the regular expression that matches a group of source attributes. For example, enter *NAME to match attributes FIRST_NAME, LAST_NAME, and FULL_NAME.
    2. Select a data type from the menu.
    3. For Target attributes, enter the regular expression to define the target attribute names in Pattern. By default, $0 is entered, which refers to the original source attribute names. Update the pattern to name the target attributes according to requirements. For example, enter $0_UPPER to append the text UPPER to the original source attribute names.
    4. By default, Use source attribute data types is selected. You can retain the selection. If you clear the Use source attribute data types checkbox, then select a Data type and complete the fields corresponding to the type selected.
  6. (Optional) If you did not select Allow bulk selection, under Expression data type, you can select Infer data type if you want the expression builder to detect the data type from the expression that you enter. You can click Preview data type and Validate to preview and refresh the data type, and validate the expression.

    Infer data type is useful when you're working with complex data types such as Map, Array, and Struct that might have levels of nested types.

  7. If you did not select Allow bulk selection and Infer data type, select a value from the Data type menu, and then complete the fields corresponding to the type selected.
  8. In the Expression builder section, visually construct an expression by double-clicking or dragging incoming attributes, parameters, or functions to add to the editor to build the expression. Or you can manually write the expression yourself. See Adding an expression. You can also validate the expression.

    You can use parameters in expressions, including user-defined parameters that have been created in the data flow, and system parameters generated by Data Integration at runtime.

    If you selected Allow bulk selection, then use %MACRO_INPUT% in the expression to indicate the attributes on which the function is to be applied. For example, suppose you used the pattern *NAME to match the source attributes FIRST_NAME, LAST_NAME, and FULL_NAME. Now you can specify a function as UPPER(%MACRO_INPUT%) to apply the function on all the attributes that match the pattern.

    To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.

    Note

    To replace a placeholder in a function with another element, highlight the placeholder and then double-click another element from the list to add it to the function.
  9. Click Add.
    You can repeat the steps from step 3 to add more expressions, if needed.
  10. Under the Attributes tab, select to view the incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes, and then apply bulk exclude rules to the selected or the filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards such as ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.

      To change the data type of a group of attributes, select Change data type. In the dialog that appears, select the new Data type for the selected or filtered attributes. Then select any other properties that are applicable to the new data type you have selected. When you click Apply, an expression is added to the Details tab instead of a rule.

    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  11. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

    See also Adding an expression.

  12. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Changing the data type of a group of attributes

You can use the Expression operator to change the data type of attributes using a bulk transformation action.

Instead of using a CAST function to change an attribute data type one attribute at a time, you can add the Expression operator and apply a bulk transformation action on several attributes at the same time.
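
For reference, changing a single attribute's data type with an expression might look like the following sketch, which assumes a hypothetical incoming attribute EXPRESSION_1.MYSOURCE.ADDRESS_KEY:

CAST(EXPRESSION_1.MYSOURCE.ADDRESS_KEY AS VARCHAR)

The bulk transformation in the following steps produces the equivalent result for every selected attribute without writing each CAST expression by hand.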
  1. Connect the expression operator with the appropriate incoming object.
  2. With the expression operator in focus on the canvas, under the Attributes tab of the Properties panel, select to view the incoming attributes (linked to the expression operator on the left side).
  3. If you have a few attributes to transform, use the checkboxes to select the attributes you want to transform.
  4. If you have many attributes to transform, apply a filter on the attributes:
    1. Click the filter icon in the Name column. In the filter field that's displayed, enter a simple regex pattern using wildcards (? and *) to filter the attributes by a name pattern. For example, enter ADD* to match attributes ADDRESS1, ADDRESS2, and ADDRESS_KEY.
    2. Click the filter icon in the Type column. In the menu that's displayed, select the current attribute type of the attributes you want to change. For example, suppose the type of ADDRESS1 and ADDRESS2 is VARCHAR, and the type of ADDRESS_KEY is NUMERIC. You can select VARCHAR to transform ADDRESS1 and ADDRESS2 only.
  5. From the Actions menu, select Change data type.
  6. In the Change data type dialog, select the new Data type for the selected attributes or the filter-applied attributes.

    Then specify any other properties that are applicable to the new data type you have selected.

    To keep the original attributes in the dataset, select Keep source attributes. If the checkbox is not selected, only the new attributes are available in the resulting data.

  7. Click Apply.
  8. Select the Details tab.
    Review the expression that's added to the operator for you.
Adding an expression

Use the Expression Builder to visually select elements to build an expression in the editor. You can also manually write an expression yourself.

Take note of the following when you create expressions:

  • Enclose a string literal within single quotation marks. For example: CONCAT('We ', 'like') and UPPER('oracle').

  • Enclose an attribute name within double quotation marks. For example: UPPER("Sales") and CONCAT(CONCAT("EXPRESSION_1.EMP_UTF8_EN_COL_CSV.EN_NAME", ' '), "EXPRESSION_1.EMP_UTF8_EN_COL_CSV.NAME").

    Using double quotation marks around attribute names is mandatory for multibyte characters, and names that have special characters in the fully qualified name.

The Add Expression panel has two sections: Expression information and Expression builder. The Expression information fields let you specify a name and data type for the expression. You can also create the expression to apply to two or more attributes. When working with complex data types such as Map, Array, and Struct that might have levels of nested types, you can choose to let the builder detect the data type from the expression that you enter. When you let the builder infer the data type, you can preview and refresh the data type, and validate the expression.

The Expression builder section lists the elements for building an expression. The elements that you can use in an expression include incoming attributes, parameters, and functions. Double-click or drag an element from the list to add to the editor to build the expression, or manually write the expression yourself. You can validate the expression before creating it.

Incoming displays the attributes from the upstream operator that are coming into this expression operator. Below the attributes list is a checkbox that lets you apply Exclude rules. To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.

Parameters include user-defined parameters and system-generated parameters.

User-defined parameters are the expression parameters that have been added in the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). See Adding an Expression Parameter. The syntax is $PARAMETER_NAME. For example: EXPRESSION_1.ADDRESSES.POSTAL_CODE=$P_CODE

Data Integration generates system parameters such as SYS.TASK_START_TIME. The values of system parameters can be used in expressions to log system information. The syntax is ${SYSTEM_PARAMETER}. For example: ${SYS.TASK_RUN_NAME}
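
As an illustrative sketch, a system parameter value can be combined with other expression elements. For example: CONCAT('Loaded by ', ${SYS.TASK_RUN_NAME}) produces a string that records which task run populated the row.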

Functions are the functions available in Data Integration that you can use in an expression. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments. You can also add user defined functions that you have created in the workspace. For example: MYLIBRARY.MYFUNCTION

The list of functions available in Data Integration for you to use is as follows:

Hash Functions
Function Description Example
MD5(all data types)Calculates an MD5 checksum of the data type, and returns a string value. MD5(column_name)
SHA1(all data types)Calculates a SHA-1 hash value of the data type, and returns a string value. SHA1(column_name)
SHA2(all data types, bitLength)Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer. SHA2(column_name, bitLength), where bitLength can be set to 0 (equivalent to 256), 256, 384, or 512.
ORA_HASH(expr, [max_bucket], [seed_value])

Computes a hash value for expr, and returns a NUMBER value.

expr can be an expression, a column, or a literal.

max_bucket is the maximum bucket value returned, between 0 and 4294967295 (default).

seed_value is a value between 0 (default) and 4294967295.

Oracle applies the hash function to the combination of expr and seed_value to produce many different results for the same set of data.

ORA_HASH('1')

ORA_HASH('b', 2)

ORA_HASH(100, 10, 10)

ORA_HASH(EXPRESSION_1.CUSTOMERS.SSN, 2)

Arithmetic Functions
Function Description Example
ABS(numeric)Returns the absolute value of the numeric value.ABS(-1)
CEIL(numeric)Returns the smallest integer not less than the numeric value.CEIL(-1.2)
FLOOR(numeric)Returns the largest integer not greater than the numeric value.FLOOR(-1.2)
MOD(numeric1, numeric2)Returns the remainder after numeric1 is divided by numeric2.MOD(8,2)
POWER(numeric1, numeric2)Raises numeric1 to the power of numeric2.POWER(2,3)
ROUND(numeric1, numeric2)Returns numeric1 rounded to numeric2 decimal places.ROUND(2.5,0)
TRUNC(numeric1, numeric2)Returns numeric1 truncated to numeric2 decimal places.TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7

Array Functions

Only the Expression operator supports array functions.

Function Description Example
ARRAY_POSITION(array(...), element)Returns the position of the first occurrence of the given element in the given array.

The position is not zero-based; it starts at 1.

ARRAY_POSITION(array(3, 2, 1, 4, 1), 1) returns 3
REVERSE(array(...)) Returns the given array of elements in a reverse order. REVERSE(array(2, 1, 4, 3)) returns [3,4,1,2]
ELEMENT_AT(array(...), index) Returns the element of the given array at the given index position.

The index is not zero-based; it starts at 1.

If index = -1, then it returns the last element.

ELEMENT_AT(array(1, 2, 3), 2) returns 2
Date and Time Functions
Function Description Example
CURRENT_DATE Returns the current date. CURRENT_DATE returns today's date such as 2023-05-26
CURRENT_TIMESTAMP Returns the current date and time for the session time zone. CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
DATE_ADD(date, number_of_days) Returns the date that's the specified number of days after the specified date. DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale])

Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

DATE_FORMAT(Date '2020-10-11', 'yyyy-MM-dd') returns '2020-10-11'. The first argument is a Date object representing October 11th, 2020.

DATE_FORMAT(Date '2018-junio-17', 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

DAYOFMONTH(date) Returns the date's day in the month. DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date) Returns the date's day in the week. DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date) Returns the date's day in the year. DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) Returns the date's week in the year.

WEEKOFYEAR('2022-07-28') returns 30

WEEKOFYEAR('2022-07-28 13:24:30') returns 30

HOUR(datetime) Returns the datetime's hour value. HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date) Returns the date's last day of the month. LAST_DAY('2020-12-25') returns 31
MINUTE(datetime) Returns the datetime's minute value. MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date) Returns the date's month value. MONTH('2020-06-25') returns 6
QUARTER(date) Returns the quarter of year the date is in. QUARTER('2020-12-25') returns 4
SECOND(datetime) Returns the datetime's second value. SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr]) Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17

TO_TIMESTAMP(expr, format_string[, localeStr]) Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on October 11th, 2020
WEEK(date)

Returns the date's week value.

WEEK('2020-06-25') returns 4
YEAR(date) Returns the date's year value. YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months).

INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
Analytic Functions
Function Description Example
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) Returns the value evaluated at the row that's the first row of the window frame. FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) Returns the value evaluated at the row that is the last row of the window frame. LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause) Returns the rank of the current row with gaps, counting from 1. RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) Returns the unique number of the current row within its partition, counting from 1. ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
String Functions
Function Description Example
CAST(value AS type)Returns the specified value in the specified type.CAST("10" AS INT) returns 10
CONCAT(string, string)Returns the combined values of strings or columns.CONCAT('Oracle','SQL') returns OracleSQL
CONCAT_WS(separator, expression1, expression2, expression3,...) Returns the combined values of strings or columns using the specified separator in between the strings or columns.

A separator is required and it must be a string.

At least one expression must be provided after the separator. For example: CONCAT_WS(',', col1)

CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle

CONCAT_WS(' ', address, city, postal_code) returns 123 MyCity 987654

If a child of the function is an array, then the array is flattened:

CONCAT_WS(',', 1,2,3, to_array(4,5,6), to_array(7,8), 9) returns 1,2,3,4,5,6,7,8,9

INITCAP(string)Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])Returns the (1-based) index of the first occurrence of substring in string.INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)Returns the string with all letters changed to lowercase.LOWER('ORACLE') returns oracle
LENGTH(string)Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.LENGTH('Oracle') returns 6
LTRIM(string)Returns the string with leading spaces removed from the left.LTRIM(' Oracle')
NVL(expr1, expr2)Returns the argument that is not null.NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx])Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement)Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, the matched search string is removed from string and nothing is substituted in its place.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)Returns the string with trailing spaces removed from the right.RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length])Returns the substring starting at position. SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale])Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

Number example: TO_CHAR(123) returns 123

Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string)Returns a string with all letters changed to uppercase.UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.RPAD('XYZ', 6, '+' ) returns 'XYZ+++'
Operator (Comparison) Functions
Function Description Example
CASE WHEN condition1 THEN result1 ELSE result2 ENDReturns the value for which a condition is met.CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ
ANDThe logical AND operator. Returns true if both operands are true, otherwise returns false.(x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"
ORThe logical OR operator. Returns true if either operand is true or both are true, otherwise returns false.(x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true"
NOTThe logical NOT operator.
LIKEPerforms string pattern matching, whether string1 matches the pattern in string2.
=Tests for equality. Returns true if expr1 equals expr2, otherwise returns false.x = 10 returns "true" when value of x is 10, else it returns "false"
!=Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false.x != 10 returns "false" if value of x is 10, else it returns "true"
>Tests for an expression greater than. Returns true if expr1 is greater than expr2.x > 10 returns "true" if value of x is greater than 10, else it returns "false"
>=Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2.x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false"
<Tests for an expression less than. Returns true if expr1 is less than expr2.x < 10 returns "true" if value of x is less than 10, else it returns "false"
<=Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2.x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false"
||Concatenates two strings.'XYZ' || 'hello' returns 'XYZhello'
BETWEENEvaluates a range.FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007
INTests whether an expression matches a list of values.FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
Unique ID Functions
Function Description Example
NUMERIC_ID()Generates a universally unique identifier that is a 64-bit number for each row.NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761
ROWID()Generates monotonically increasing 64-bit numbers.ROWID() returns for example, 0, 1, 2, and so on
UUID()Generates a universally unique identifier that is a 128-bit String for each row.UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d
MONOTONICALLY_INCREASING_ID() Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. MONOTONICALLY_INCREASING_ID() returns for example, 8589934592 and 25769803776
Conditional Functions
FunctionDescriptionExample
COALESCE(value, value [, value]*)Returns the first non-null argument, if it exists, otherwise returns null.COALESCE(NULL, 1, NULL) returns 1
NULLIF(value, value)Returns null if the two values equal each other, otherwise returns the first value.NULLIF('ABC','XYZ') returns ABC
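
The two conditional functions are often nested. As a hedged sketch (the attribute name is hypothetical):

COALESCE(NULLIF(EXPRESSION_1.CUSTOMERS.MIDDLE_NAME, ''), 'N/A') returns 'N/A' when MIDDLE_NAME is null or an empty string, otherwise returns the middle name.
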
Hierarchical Functions
FunctionDescriptionExample
SCHEMA_OF_JSON(string)Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column)Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*)Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*)Creates a new column as Array type. The input columns must all have the same data type.

TO_Array(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]
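
The hierarchical constructors can be nested to build multi-level values. As an illustrative sketch that reuses the attribute names from the examples above:

TO_STRUCT('id', Expression_1.attribute1, 'address', TO_STRUCT('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit)) returns a Struct type column such as {100, {1, 1}}.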

Higher-Order Functions

Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.

The supported operators are:

  • Aggregate

  • Expression

  • Filter

  • Join

  • Lookup

  • Split

  • Pivot

FunctionDescriptionExample
TRANSFORM(column, lambda_function)Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)

Only the Expression operator supports ARRAY_SORT.

Takes an array and sorts according to the given function that takes 2 arguments.

The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.

If the function is omitted, the array is sorted in ascending order.

array_sort(to_array(5, 6, 1),
                   (left, right) -> CASE WHEN left < right THEN -1
                                         WHEN left > right THEN 1 ELSE 0 END)

The returned array is:

[1,5,6]
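
Higher-order functions can also be applied to arrays built with the constructors described earlier. As an illustrative sketch:

TRANSFORM(TO_ARRAY(1, 2, 3), x -> x * 10) returns a new array of [10, 20, 30].
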
Duplicating an expression

You can duplicate expressions that have been added to an expression operator.

  1. On the data flow canvas, select the expression operator.
  2. Under the Details tab of the Properties panel, review the list of expressions.
  3. Select Duplicate from the Actions menu (Actions menu) corresponding to the expression you want to duplicate.
    No confirmation dialog displays; the duplicated expression is added to the list immediately.
Moving an expression

You can change the order of expressions that have been added to an expression operator.

  1. On the data flow canvas, select the expression operator.
  2. Under the Details tab of the Properties panel, review the list of expressions.
  3. For the expression you want to move, select Move up, Move down, Move to top, or Move to bottom from the Actions menu (Actions menu).
    Only relevant move actions are available, depending on the current position of the expression. For example, Move up and Move to top are unavailable for the first expression in the list.

Aggregate Operator

Use the aggregate operator to perform calculations such as sum or count, on all rows or a group of rows to create new, derivative attributes.
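
As a hedged illustration (the entity and attribute names EMP, DEPARTMENT, and SALARY are hypothetical), suppose the incoming data has the attributes DEPARTMENT and SALARY, you group by DEPARTMENT, and you add an expression named SALARY_SUM defined as SUM(AGGREGATE_1.EMP.SALARY):

DEPARTMENT SALARY
IT 4000
IT 6000
Shipping 3000

The attributes that continue downstream are DEPARTMENT and SALARY_SUM:

DEPARTMENT SALARY_SUM
IT 10000
Shipping 3000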

Adding and configuring an aggregate operator
  1. From the Operators panel, drag an Aggregate onto the canvas.
  2. With the aggregate operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Under Group by, select one of the following:
    • Attributes: Use the menu to select one or more attributes to group by.
    • Pattern: Click Add pattern to add a pattern that selects the attributes to group by. You use a regular expression pattern to match two or more attributes.
    Note

    The expressions you add are performed on each selected attribute. The attributes that continue downstream include the attributes you group by plus the attributes derived from the expressions you add.
  4. Under Expressions, click Add expression.
  5. In the Add Expression panel, enter a name for the aggregate expression in the Identifier field, or leave the name as-is.
  6. (Optional) To use regular expression patterns to select the source attributes where you want to apply the expression operation, select Allow bulk selection and then select Pattern. For example, consider that you have three order attributes in the dataset and you want to apply the SUM function to all three of them. You can use a pattern to select this group of source attributes.
    1. For Source attributes, enter the regular expression that matches a group of source attributes in Pattern. For example, enter *ORDER to match attributes FIRST_ORDER, LAST_ORDER, and FINAL_ORDER.
    2. Then for Data type, select a value from the menu and complete the fields corresponding to the type selected.
    3. For Target attributes, enter the regular expression to define the target attribute names in Pattern. By default, $0 is entered to refer to the original source attributes. Update the pattern to name the target attribute according to requirements. For example, enter $0_SUM to append the text _SUM to the original source attribute names.
    4. By default, Use source attribute data types is selected. You can retain the selection. If you clear the Use source attribute data types checkbox, then select a Data type and complete the fields corresponding to the type selected.
  7. (Optional) If you did not select Allow bulk selection, under Expression data type, you can select Infer data type if you want the Expression Builder to detect the data type from the expression that you enter. You can click Preview data type and Validate to preview and refresh the data type, and validate the expression.

    Letting the builder infer the data type is useful when you're working with complex data types such as Map, Array, and Struct that might have levels of nested types.

  8. If you did not select Allow bulk selection or Infer data type, then for Data type, select a value from the menu, and then complete the fields corresponding to the type selected.
  9. In the Expression builder section, visually construct an expression by double-clicking or dragging incoming attributes, parameters, or functions to add to the editor to build the expression. Or you can manually write the expression yourself. You can also validate the expression.

    If you selected Allow bulk selection, then use %MACRO_INPUT% in the expression to indicate the attributes on which the function is to be applied. For example, consider you used the pattern *NAME to match source attributes FIRST_NAME, LAST_NAME, and FULL_NAME. Now you can specify a function as UPPER(%MACRO_INPUT%) to apply the function on all the attributes that match the pattern.

    To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.

    Note

    To replace a placeholder in a function with another element, highlight the placeholder and then double-click another element from the list to add it to the function.
  10. Click Add.
    You can repeat the steps from step 4 to add more expressions, if needed.
  11. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  12. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab. You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
  13. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Adding an aggregate expression

Use the Expression Builder to visually select elements to build an expression in the editor. You can also manually write an expression yourself.

The Add Expression panel has two sections: Expression information and Expression builder. The Expression information fields let you specify a name and data type for the expression. You can also create the expression to apply to two or more attributes. When working with complex data types such as Map, Array, and Struct that might have levels of nested types, you can choose to let the builder detect the data type from the expression that you enter. When you let the builder infer the data type, you can preview and refresh the data type, and validate the expression.

The Expression builder section lists the elements for building an expression. The elements that you can use in an expression include incoming attributes, parameters, and functions. Double-click or drag an element from the list to add to the editor to build the expression, or manually write the expression yourself. You can validate the expression before creating it.

Incoming displays the attributes from the upstream operator that are coming into this expression operator. Below the attributes list is a checkbox that lets you apply Exclude rules. To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments. Here's a list of functions available to use to build the expressions:

Aggregate Functions
FunctionDescriptionExample
COUNT(value[, value]*)Returns the number of rows for which one or more supplied expressions are all non-null.COUNT(expr1)
COUNT(*)Returns the total number of retrieved rows, including rows containing null.COUNT(*)
MAX(value)Returns the maximum value of the argument.MAX(expr)
MIN(value)Returns the minimum value of the argument.MIN(expr)
SUM(numeric)Returns the sum calculated from values of a group.SUM(expr1)
AVG(numeric)Returns the average of numeric values in an expression.AVG(AGGREGATE_1.src1.attribute1)
LISTAGG(column[, delimiter]) WITHIN GROUP (order_by_clause)

Concatenates values of the input column with the specified delimiter, for each group based on the order clause.

column contains the values you want to concatenate together in the result.

The delimiter separates the column values in the result. If a delimiter is not provided, an empty string is used.

order_by_clause determines the order that the concatenated values are returned.

This function can only be used as an aggregator, and can be used with or without grouping. Without grouping, the result is a single row. With grouping, the function returns a row for each group.

Consider a table with two columns, id, name. The table has three rows. The id column values are 101, 102, 102. The name column values are A, B, C.

+-----+--------+
| id  | name  |
+-----+--------+
| 101 | A     |
+-----+--------+
| 102 | B     |
+-----+--------+
| 102 | C     |
+-----+--------+

Example 1: Without grouping

LISTAGG(name, '-') WITHIN GROUP (ORDER BY id) returns the name column with the value A-B-C

+--------+
| name   |
+--------+
| A-B-C  |
+--------+

Example 2: Group by the id

LISTAGG(name, '-') WITHIN GROUP (ORDER BY id) returns the name column with the values A and B-C in two groups.

+--------+
| name   |
+--------+
| A      |
+--------+
| B-C    |
+--------+
Higher-Order Functions

Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.

The supported operators are:

  • Aggregate

  • Expression

  • Filter

  • Join

  • Lookup

  • Split

  • Pivot

FunctionDescriptionExample
TRANSFORM(column, lambda_function)Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)

Only the Expression operator supports ARRAY_SORT.

Takes an array and sorts according to the given function that takes 2 arguments.

The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.

If the function is omitted, the array is sorted in ascending order.

array_sort(to_array(5, 6, 1),
                   (left, right) -> CASE WHEN left < right THEN -1
                                         WHEN left > right THEN 1 ELSE 0 END)

The returned array is:

[1,5,6]

Distinct Operator

Use the distinct operator to return distinct rows with unique values.

Adding and configuring a distinct operator
  1. From the Operators panel, drag a Distinct operator onto the canvas.
  2. With the distinct operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.

    The Details tab configurations are set by default for the distinct operator.

  3. To view the output of the operator, click the Data tab.
    Distinct rows with unique values are displayed. You can also apply transforms to the data, or select an attribute to view a data profile or an attribute profile. Learn more about Data Transformations.
  4. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes. The distinct operator operates only on the filtered attributes, and outputs distinct rows with unique values.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  5. In the Data tab, view a sampling of the data, based on any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  6. In the Validation tab, check for warnings or errors that might cause the data flow to fail.

Sort Operator

Use the sort operator to perform sorting of data in ascending or descending order.

When using a sort operator, we recommend that you apply it after other operators. Keeping the sort operator immediately before the target operator ensures that data is inserted into the target in the specified order.

Adding and configuring a sort operator

When using a sort operator, apply the sort operator after other shaping operators, and before the operator that requires data to be sorted.

For example, apply the sort operator before a target operator to insert data in the target in a specific sort order.

  1. From the Operators panel, drag a Sort operator onto the canvas.
  2. With the sort operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. In the Sort conditions section, click Add.

    The Add Sort Condition panel displays all the attribute fields fetched from the source table.

  4. In the Add sort condition panel, select the attribute to sort the data by.
  5. To filter attributes using name patterns, enter a name pattern, for example, *CITY*.
  6. For Sort order, select Ascending or Descending, and then click Add.

    Each condition you add is displayed in the sort condition list.

    Note

    You can add multiple sort conditions one-by-one. Sort operates based on the order of sort conditions in the list. For example, the sorting begins based on the first condition in the list. Then the sorted data is resorted based on the second condition, and so on.

    For string data types, the sort operation occurs based on the lexicographic order.

  7. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  8. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  9. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Adding sort conditions

After you add a sort operator on the canvas, and connect it with another operator, you can add a sort condition.

From the Add Sort Condition panel, you can select attributes to sort from the fields listed, or filter attribute names using patterns. For string data types, the sort operation occurs based on the lexicographic order.

To add sort conditions:

  1. Under the Details tab, go to the Sort conditions section, and click Add.

    The Add Sort Condition panel displays all the attribute fields fetched from the source table.

  2. In the Add sort condition panel, select the attribute to sort the data by.
  3. To filter attributes using name patterns, enter a name pattern, for example, *CITY*.
  4. For Sort order, select Ascending or Descending, and then click Add.

    Each condition you add is displayed in the sort condition list.

    Note

    You can add multiple sort conditions one-by-one. Sort operates based on the order of sort conditions in the list. For example, the sorting happens first based on the first condition in the list and then the sorted data is re-sorted based on the second condition and so on.
Prioritizing sort conditions

Move the sort conditions up or down in the list to prioritize the sort.

Moving sort conditions up or down lets you sort first by a high priority sort condition, and then re-sort the sorted data by the next condition in the list. For example, to sort first by address and then by the postal code, move the sort condition with address to the top.
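
As a hedged illustration with hypothetical attributes and values, suppose the first sort condition is ADDRESS (ascending) and the second is POSTAL_CODE (ascending):

ADDRESS POSTAL_CODE
Elm Street 20010
Elm Street 10020
Oak Street 30000

The output is ordered by ADDRESS first, and rows with the same ADDRESS are then ordered by POSTAL_CODE:

ADDRESS POSTAL_CODE
Elm Street 10020
Elm Street 20010
Oak Street 30000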

To prioritize sort conditions:

  1. Under the Details tab, go to the Sort conditions section.
  2. Click the Actions menu (Actions menu) corresponding to the sort condition you want to move up or down.
  3. To apply a sort condition earlier, select Move up.
  4. To apply a sort condition later, select Move down.
    Sort is applied to the moved-up condition first, and then to the moved-down condition.
Editing sort conditions

To edit sort conditions:

  1. Under the Details tab, go to the Sort conditions section.
  2. Click the Actions menu (Actions menu) corresponding to the sort condition you want to change, and then select Edit.
  3. In the Edit sort condition panel, select the new attribute, or filter the attributes by entering a name pattern.
    Attributes and patterns (including the attributes associated with a pattern) that are already selected are unavailable for further selection.
Deleting sort conditions

You can delete sort conditions one by one or you can do a bulk delete.

To delete sort conditions:

  1. Under the Details tab, go to the Sort conditions section.
  2. To delete sort conditions one by one, click the Actions menu (Actions menu) corresponding to the sort condition you want to delete, and then select Delete.
  3. To delete multiple sort conditions, select the checkboxes corresponding to each, and then click Delete at the top of the list.

Union Operator

Use the union operator to perform a union operation between two or more datasets.

Understanding union types

You can perform a union operation on up to ten source operators. You must configure at least two source inputs. You can choose to perform the union operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.

Union by attribute name

Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are combined during a union by name operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1, primary input
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Southlake, Texas
Accounting 1700 New Jersey
Data Entity 2
Warehouse LOCATION-ID Department
Denver 1600 Benefits
New York 1400 Construction
Resulting Data Entity
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Southlake, Texas
Accounting 1700 New Jersey
Benefits 1600 Denver
Construction 1400 New York
Union by attribute position

Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are combined during a union by position operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Southlake, Texas
Accounting 1700 New Jersey
Data Entity 2, primary input
Warehouse LOCATION-ID Department
Denver 1600 Benefits
New York 1400 Construction
Resulting Data Entity
Warehouse LOCATION-ID Department
Denver 1600 Benefits
New York 1400 Construction
IT 1400 San Francisco
Shipping 1500 Southlake, Texas
Accounting 1700 New Jersey
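
The Union all option, described in the configuration steps that follow, controls whether duplicate rows are kept. As a hedged illustration, if the row IT 1400 San Francisco appeared in both inputs, selecting Union all would keep the row twice in the resulting data entity, while clearing Union all would eliminate the duplicate so that the row appears only once.
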
Adding and configuring a union operator
  1. To add a union operator to the data flow, drag the Union operator from the Operators panel onto the canvas. You can perform the union operation on up to ten data sources.
    The Details tab for the union operator displays in the Properties panel.
  2. On the Details tab, enter a name for this union operation in the Identifier field, or leave the name as-is.
  3. Specify how you want the union operation to be performed using the Match input attribute by field. You can choose from the following two options:
    • Name: The attribute names from the sources are matched to perform the union operation. The attribute names and data types, and the number of attributes across the inputs must be identical. The match is case-sensitive. Review the examples to understand how the union by name operation works.
    • Position: The attributes from the sources are matched based on their positions. The number of attributes and the attribute data types across the inputs must be identical. Review the examples to understand how the union by position operation works.
  4. For Union all, select the checkbox if you want the operation to return all rows from all the connected data entities without eliminating any duplicates.
  5. For Primary input, select the source that should be considered as the key source in the union operation. This primary input determines the attribute names, attribute data types, and attribute order for the resulting attributes from the union operation. By default, the first source added to the data flow is set as the primary input.
  6. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  7. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  8. In the Validation tab, check for warnings or errors that might cause the data flow to fail.

Minus Operator

Use the minus operator to compare two data entities and return the rows that are present in one entity but not present in the other entity.

You can choose to keep or eliminate duplicate rows in the resulting data.

Understanding minus types

You can perform a minus operation on two source operators only. You can choose to perform the minus operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.

Minus by attribute name

Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are subtracted during a minus by name operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1, primary input
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Austin
Accounting 1700 New Jersey
Data Entity 2
Department Warehouse LOCATION-ID
Benefits Denver 1600
IT San Francisco 1400
Resulting Data Entity
Department LOCATION-ID Warehouse
Shipping 1500 Austin
Accounting 1700 New Jersey
Minus by attribute position

Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are subtracted during a minus by position operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Austin
Accounting 1700 New Jersey
Data Entity 2, primary input
Department-Name Location Warehouse-City
Benefits 1600 Denver
IT 1400 San Francisco
Resulting Data Entity
Department-Name Location Warehouse-City
Benefits 1600 Denver
Adding and configuring a minus operator
  1. To add a minus operator to the data flow, drag the Minus operator from the Operators panel onto the canvas.
    The Details tab for the minus operator displays in the Properties panel.
  2. On the Details tab, enter a name for this minus operation in the Identifier field, or leave the name as-is.
  3. Specify how you want the minus operation to be performed using the Match input attributes by field. You can choose from the following two options:
    • Name: The attribute names from the sources are matched to perform the minus operation. The attribute names and data types, and the number of attributes across the inputs must be identical. The match is case-sensitive. Review the examples to understand how the minus by name operation works.
    • Position: The attributes from the sources are matched based on their positions. The number of attributes and the attribute data types across the inputs must be identical. Review the examples to understand how the minus by position operation works.
  4. For Minus all, select the checkbox if you want the operation to return all rows found in one data entity that are not present in the other data entity, without eliminating any duplicates.
  5. For Primary input, select the source to be used as the primary source for the minus operation. The primary input determines the attribute names, attribute data types, and attribute order in the resulting output. By default, the first source that is connected to the minus operator is set as the primary input.
    Primary Input Description
    Left minus Selects all the remaining distinct rows from inbound source 1 after eliminating rows that match inbound source 2.
    Right minus Selects all the remaining distinct rows from inbound source 2 after eliminating rows that match inbound source 1.
  6. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  7. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  8. In the Validation tab, check for warnings or errors that might cause the data flow to fail.

Intersect Operator

Use the intersect operator to compare two or more data entities and return the rows that are present in the connected entities.

You can choose to keep or eliminate duplicate rows in the resulting data.

Understanding intersect types

You can perform an intersect operation on two or more source operators. You can choose to perform the operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.

Intersect by attribute name

Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are intersected by attribute name. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1, primary input
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Austin
Accounting 1700 New Jersey
Data Entity 2
Department Warehouse LOCATION-ID
Benefits Denver 1600
IT San Francisco 1400
Resulting Data Entity
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Intersect by attribute position

Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are intersected by attribute position. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.

Data Entity 1
Department LOCATION-ID Warehouse
IT 1400 San Francisco
Shipping 1500 Austin
Accounting 1700 New Jersey
Data Entity 2, primary input
Department-Name Location Warehouse-City
Benefits 1600 Denver
IT 1400 San Francisco
Resulting Data Entity
Department-Name Location Warehouse-City
IT 1400 San Francisco
Adding and configuring an intersect operator
  1. To add an intersect operator to the data flow, drag the Intersect operator from the Operators panel onto the canvas.
    The Details tab for the intersect operator displays in the Properties panel.
  2. On the Details tab, enter a name for this intersect operation in the Identifier field, or leave the name as-is.
  3. Specify how you want the intersect operation to be performed using the Match input attributes by field. You can choose from the following two options:
    • Name: The attribute names from the sources are matched to perform the intersect operation. The attribute names and data types, and the number of attributes across the inputs must be identical. The match is case-sensitive. Review the examples to understand how the intersect by name operation works.
    • Position: The attributes from the sources are matched based on their positions. The number of attributes and the attribute data types across the inputs must be identical. Review the examples to understand how the intersect by position operation works.
  4. For Intersect all, select the checkbox if you want the operation to return all rows in the resulting output, without eliminating any duplicates.
  5. For Primary input, select the source to be used as the primary source for the intersect operation. The primary input determines the attribute names, attribute data types, and attribute order in the resulting output. By default, the first source that's connected to the intersect operator is set as the primary input.
  6. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  7. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  8. In the Validation tab, check for warnings or errors that might cause the data flow to fail.

Split Operator

Use the split operator to divide one source of input data into two or more output ports based on split conditions that are evaluated in a sequence.

Each split condition has an output port. Data that satisfies a condition is directed to the corresponding output port.

By default a split operator is configured with the Unmatched condition, which is always available in the sequence as the last condition. You cannot add your own condition to the Unmatched condition. You also cannot delete the Unmatched condition.

The operator evaluates the conditions one at a time. After all conditions in the sequence have been evaluated, data that does not meet a condition is directed to the Unmatched output port.

Split example

Suppose you have the data entity BANK, with the attributes BANK_ID and BANK_NAME.

You set up two split conditions. The full sequence, including the Unmatched condition, is as follows:

Condition Output Port Condition
CONDITION1 SPLIT_1.BANK.BANK_ID<102
CONDITION2 SPLIT_1.BANK.BANK_ID<104
UNMATCHED The default UNMATCHED condition directs all data that does not meet the other conditions in the sequence to the UNMATCHED output port.
BANK Data Entity

The data entity has four rows.

BANK_ID BANK_NAME
101 A Bank 101
102 B Bank 102
103 C Bank 103
104 D Bank 104
Condition1 Output, First Matching Condition

CONDITION1 returns one matching row.

BANK_ID BANK_NAME
101 A Bank 101
Condition2 Output, First Matching Condition

CONDITION2 returns two matching rows (from the unmatched rows after CONDITION1).

BANK_ID BANK_NAME
102 B Bank 102
103 C Bank 103
Unmatched Condition Output, First Matching Condition

UNMATCHED condition returns the remaining row.

BANK_ID BANK_NAME
104 D Bank 104
Condition1 Output, All Matching Conditions

CONDITION1 returns one matching row.

BANK_ID BANK_NAME
101 A Bank 101
Condition2 Output, All Matching Conditions

All data is evaluated by CONDITION2, returning three matching rows.

BANK_ID BANK_NAME
101 A Bank 101
102 B Bank 102
103 C Bank 103
Unmatched Condition Output, All Matching Conditions

UNMATCHED condition returns the rows that do not meet CONDITION1 and CONDITION2.

BANK_ID BANK_NAME
104 D Bank 104
Adding and configuring a split operator

When you add a split operator on the canvas, by default the split operator icon is displayed as expanded, showing the Unmatched split condition. The Unmatched condition directs all data that does not meet the other conditions that you add to the sequence.

  1. From the Operators panel, drag a Split operator onto the canvas.
  2. With the split operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Under Match, specify how you want to handle incoming data that meets a split condition. You can choose from the following two options:
    • First matching condition: Data that matches the first condition in the sequence is removed from further processing by subsequent conditions.

      For example, consider a sequence with four conditions, including the last Unmatched condition: Condition1, Condition2, Condition3, and Unmatched.

      For Condition1, the operation does not find matches. For Condition2, matches are found. The operation routes the matched data to the output port that's specified for Condition2, and then evaluates the unmatched data for Condition3. If no matches are found for Condition3, all the data is routed to the output port that's specified for the Unmatched condition. However, if matches are found for Condition3, the operation routes the matched data to the output port for Condition3, and then routes the remaining data to the port for the Unmatched condition.

    • All matching conditions: Data that matches any condition is evaluated for all conditions in the sequence.

      For example, consider a sequence with Condition1 and Condition2. Matches are found after Condition1 is evaluated. The matched data is routed to the output port that's specified for Condition1. All data (matched and unmatched) is then evaluated by Condition2. Data that matches Condition2 is routed to the corresponding output port. Data that does not meet Condition1 and Condition2 is routed to the output port for the Unmatched split condition.

  4. Under Split conditions, click Add condition to add a split condition.
    1. In the panel to add a split condition, edit the Identifier value, or leave it as-is.
    2. In the Condition builder section, you can double-click or drag attributes, parameters, and functions to add to the editor to build a condition. You can also enter a condition expression manually into the editor, and validate the expression.
      Note

      In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
    3. Click Add.
      The first condition you add is inserted before the Unmatched condition. You can also see the sequence of conditions in the expanded operator icon on the canvas.
    4. Repeat the steps to add conditions to form a sequence.
      The second condition is inserted into the sequence between the first and Unmatched conditions. Subsequent conditions are always inserted before the Unmatched condition. The sequence of split conditions is also reflected in the expanded split operator icon on the canvas.
  5. To connect an output port on a split condition to another operator on the canvas:
    1. Expand the split operator icon to display the conditions sequence.
      If the split operator icon is not expanded, you can still connect the operator to the next operator but the link is created from the port of the Unmatched condition.
    2. Hover over the end of a condition and then drag the small circle on the right side of the condition to the next shaping operator or a target.
    3. Connect each condition to the appropriate operator or target.
    Each condition output port can be linked to multiple shaping operators and target operators. If the split operator icon is not expanded on the canvas, you can hover over a link line to see the condition name of that link.
  6. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  7. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Conditional output port menu shows the same number of items as the number of conditions configured on the operator, including the Unmatched condition. By default, the output from the first condition port is shown. To change the output view, select the port of the condition output from the menu.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  8. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Adding a split condition

Use the Condition Builder to visually select elements to create and add a split condition. You can also enter a split condition manually in the editor.

You can add split conditions to an existing sequence of conditions. Split conditions are added to the end of the sequence, before the Unmatched condition. You cannot add your own condition to the Unmatched condition.

The elements that you can use in a split condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.

Incoming displays the attributes from the upstream port. For example:

SPLIT_1.BANK.BANK_NAME='ABC Bank'

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Suppose you create a VARCHAR parameter with the name P_VARCHAR_NAME, and set the default value to ABC BANK. Then you can use the parameter in a split condition as:

SPLIT_1.BANK.BANK_NAME=$P_VARCHAR_NAME

Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.

Suppose you create a VARCHAR parameter with the name P_VARCHAR_LIKE, and set the default value to B%. Then you can use the parameter in a split condition as:

SPLIT_1.BANK.BANK_NAME LIKE $P_VARCHAR_LIKE

Here's a list of functions that are available for you to add when you construct conditions:

Hash Functions
FunctionDescriptionExample
MD5(all data types)Calculates an MD5 checksum of the data type, and returns a string value. MD5(column_name)
SHA1(all data types)Calculates a SHA-1 hash value of the data type, and returns a string value. SHA1(column_name)
SHA2(all data types, bitLength)Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. SHA2(column_name, 256)
ORA_HASH(expr, [max_bucket], [seed_value])

Computes a hash value for expr, and returns a NUMBER value.

expr can be an expression, a column, a literal.

max_bucket is the maximum bucket value returned, between 0 and 4294967295 (default).

seed_value is a value between 0 (default) and 4294967295.

Oracle applies the hash function to the combination of expr and seed_value to produce many different results for the same set of data.

ORA_HASH('1')

ORA_HASH('b', 2)

ORA_HASH(100, 10, 10)

ORA_HASH(EXPRESSION_1.CUSTOMERS.SSN, 2)

Arithmetic Functions
FunctionDescriptionExample
ABS(numeric)Returns the absolute value of the numeric value.ABS(-1)
CEIL(numeric)Returns the smallest integer not less than the numeric value.CEIL(-1.2)
FLOOR(numeric)Returns the largest integer not greater than the numeric value.FLOOR(-1.2)
MOD(numeric1, numeric2)Returns the remainder after numeric1 is divided by numeric2.MOD(8,2)
POWER(numeric1, numeric2)Raises numeric1 to the power of numeric2.POWER(2,3)
ROUND(numeric1, numeric2)Returns numeric1 rounded to numeric2 decimal places.ROUND(2.5,0)
TRUNC(numeric1, numeric2)Returns numeric1 truncated to numeric2 decimal places.TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7

Date and Time Functions
Function Description Example
CURRENT_DATE Returns the current date. CURRENT_DATE returns today's date such as 2023-05-26
CURRENT_TIMESTAMP Returns the current date and time for the session time zone. CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
DATE_ADD(date, number_of_days) Returns the date that's the specified number of days after the specified date. DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale])

Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

DATE_FORMAT(Date '2020-10-11', 'yyyy-MM-dd') returns '2020-10-11'. The first argument is a Date object representing October 11th, 2020.

DATE_FORMAT(Date '2018-junio-17', 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

DAYOFMONTH(date) Returns the date's day in the month. DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date) Returns the date's day in the week. DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date) Returns the date's day in the year. DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) Returns the date's week in the year.

WEEKOFYEAR('2022-07-28') returns 30

WEEKOFYEAR('2022-07-28 13:24:30') returns 30

HOUR(datetime) Returns the datetime's hour value. HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date) Returns the date's last day of the month. LAST_DAY('2020-12-25') returns 31
MINUTE(datetime) Returns the datetime's minute value. MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date) Returns the date's month value. MONTH('2020-06-25') returns 6
QUARTER(date) Returns the quarter of year the date is in. QUARTER('2020-12-25') returns 4
SECOND(datetime) Returns the datetime's second value. SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr]) Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17

TO_TIMESTAMP(expr, format_string[, localeStr]) Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on October 11th, 2020
WEEK(date)

Returns the date's week value.

WEEK('2020-06-25') returns 4
YEAR(date) Returns the date's year value. YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months).

INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
Analytic Functions
Function Description Example
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) Returns the value evaluated at the row that's the first row of the window frame. FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) Returns the value evaluated at the row that is the last row of the window frame. LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause) Returns the rank of the current row with gaps, counting from 1. RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) Returns the unique number of the current row within its partition, counting from 1. ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
String Functions
FunctionsDescriptionExample
CAST(value AS type)Returns the specified value in the specified type.CAST("10" AS INT) returns 10
CONCAT(string, string)Returns the combined values of strings or columns.CONCAT('Oracle','SQL') returns OracleSQL
CONCAT_WS(separator, expression1, expression2, expression3,...) Returns the combined values of strings or columns using the specified separator in between the strings or columns.

A separator is required and it must be a string.

At least one expression must be provided after the separator. For example: CONCAT_WS(',', col1)

CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle

CONCAT_WS(' ', address, city, postal_code) returns 123 MyCity 987654

If a child of the function is an array, then the array is flattened:

CONCAT_WS(',', 1,2,3, to_array(4,5,6), to_array(7,8), 9) returns 1,2,3,4,5,6,7,8,9

INITCAP(string)Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])Returns the (1-based) index of the first occurrence of substring in string.INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)Returns the string with all letters changed to lowercase.LOWER('ORACLE') returns oracle
LENGTH(string)Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.LENGTH('Oracle') returns 6
LTRIM(string)Returns the string with leading spaces removed from the left.LTRIM(' Oracle')
NVL(expr1, expr2)Returns the argument that is not null.NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx])Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement)Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)Returns the string with trailing spaces removed from the right.RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length])Returns the substring starting at position. SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale])Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

Number example: TO_CHAR(123) returns 123

Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string)Returns a string with all letters changed to uppercase.UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.RPAD('XYZ', 6, '+' ) returns 'XYZ+++'
Operator (Comparison) Functions
FunctionDescriptionExample
CASE WHEN condition1 THEN result1 ELSE result2 ENDReturns the value for which a condition is met.CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1 > 0, otherwise returns XYZ
ANDThe logical AND operator. Returns true if both operands are true, otherwise returns false.(x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"
ORThe logical OR operator. Returns true if either operand is true or both are true, otherwise returns false.(x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true"
NOTThe logical NOT operator.
LIKEPerforms string pattern matching, whether string1 matches the pattern in string2.
=Tests for equality. Returns true if expr1 equals expr2, otherwise returns false.x = 10 returns "true" when value of x is 10, else it returns "false"
!=Tests for inequality. Returns true if expr1 does not equal expr2, otherwise returns false.x != 10 returns "false" if value of x is 10, else it returns "true"
>Tests for an expression greater than. Returns true if expr1 is greater than expr2.x > 10 returns "true" if value of x is greater than 10, else it returns "false"
>=Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2.x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false"
<Tests for an expression less than. Returns true if expr1 is less than expr2.x < 10 returns "true" if value of x is less than 10, else it returns "false"
<=Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2.x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false"
||Concatenates two strings.'XYZ' || 'hello' returns 'XYZhello'
BETWEENEvaluates a range.FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007
INTests whether an expression matches a list of values.FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
Unique ID Functions
FunctionDescriptionExample
NUMERIC_ID()Generates a universally unique identifier that is a 64-bit number for each row.NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761
ROWID()Generates monotonically increasing 64-bit numbers.ROWID() returns for example, 0, 1, 2, and so on
UUID()Generates a universally unique identifier that is a 128-bit String for each row.UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d
MONOTONICALLY_INCREASING_ID() Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. MONOTONICALLY_INCREASING_ID() returns for example, 8589934592 and 25769803776
Conditional Functions
FunctionDescriptionExample
COALESCE(value, value [, value]*)Returns the first non-null argument, if it exists, otherwise returns null.COALESCE(NULL, 1, NULL) returns 1
NULLIF(value, value)Returns null if the two values equal each other, otherwise returns the first value.NULLIF('ABC','XYZ') returns ABC
Hierarchical Functions
FunctionDescriptionExample
SCHEMA_OF_JSON(string)Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column)Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*)Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*)Creates a new column as Array type. The input columns must all have the same data type.

TO_Array(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]

Higher-Order Functions

Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.

The supported operators are:

  • Aggregate

  • Expression

  • Filter

  • Join

  • Lookup

  • Split

  • Pivot

FunctionDescriptionExample
TRANSFORM(column, lambda_function)Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)

Only the Expression operator supports ARRAY_SORT.

Takes an array and sorts according to the given function that takes 2 arguments.

The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.

If the function is omitted, the array is sorted in ascending order.

array_sort(to_array(5, 6, 1),
                   (left, right) -> CASE WHEN left < right THEN -1
                                         WHEN left > right THEN 1 ELSE 0 END)

The returned array is:

[1,5,6]
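
As a minimal illustrative sketch that combines a higher-order function with functions documented earlier (reusing the attribute2 and attribute3 values from the TO_ARRAY example), the following expression applies UPPER to each element of an array:

TRANSFORM(TO_ARRAY(EXPRESSION_1.attribute2, EXPRESSION_1.attribute3), x -> UPPER(x))

The returned array is:

["JOHN","FRIEND"]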
Editing split conditions

You can edit any split condition except the Unmatched condition.

  1. On the designer canvas, select the split operator of the condition you want to change.
    The Details tab for the split operator displays in the Properties panel. On the canvas, you can expand and collapse the operator icon to show and hide the existing split conditions.
  2. In the Details tab, under Split conditions, select Edit from the Actions menu (Actions menu) next to the split condition you want to change.
  3. In the Edit split condition panel, edit the Identifier value, or leave it as-is.
  4. In the Condition builder section, you can double-click to select, or drag incoming attributes or functions to the editor to build the condition. You can also edit the condition expression manually in the editor, and validate the condition before saving it.
  5. Click Save changes.
Changing the sequence of split conditions

You can move a split condition up or down in the sequence. Only the Unmatched condition cannot be moved.

  1. On the designer canvas, select the split operator that has the sequence of conditions you want to change.
    The Details tab for the split operator displays in the Properties panel. On the canvas, you can expand and collapse the operator icon to show and hide the existing split conditions.
  2. In the Details tab, under Split conditions, review the sequence of split conditions.
  3. Next to a condition row, from the Actions menu (Actions menu), select the action to move that condition up or down the sequence.
    Notice that the condition position in the expanded split operator icon on the canvas changes accordingly when you move a condition.
Deleting split conditions

You can delete any split condition except the Unmatched condition.

  1. On the designer canvas, select the split operator of the condition you want to delete.
    The Details tab for the split operator displays in the Properties panel. On the canvas, you can expand and collapse the operator icon to show and hide the split conditions.
  2. In the Details tab, under Split conditions, select Delete from the Actions menu (Actions menu) next to the condition you want to delete.
  3. To delete two or more conditions, use the column next to the Sequence column to select the conditions, and then click Delete.
  4. If the output port on a deleted condition was connected to another operator or target, ensure that you review the data flow and make the necessary changes after deleting conditions.

Pivot Operator

The pivot operator lets you take unique row values from one attribute in an input source and pivot the values into multiple attributes in the output.

Using input from multiple rows, the pivot operation performs a transformation based on aggregate function expressions, and the values of an attribute that you specify as the pivot key. The result of a pivot operation is a pivoted or rearranged output of rows and attributes.

The number of rows in the output is based on the selection of attributes to group by.

  • If you specify one or more attributes to group by, the incoming rows that have the same group by attribute value are grouped into one row. For example, if you specify one group by attribute that has four unique values, the incoming data is transformed and grouped into four rows in the output.
  • If you don't specify any attributes to group by, all the incoming data is transformed into a single output row.

The number of attributes in the output is:

  • Based on the number of attributes you select to group by
  • A multiple of the number of values you select in the pivot key
  • A result of the number of attributes that the aggregate function expressions transform

For example, if you select one group by attribute and three pivot key values, and add an expression that transforms two attributes, the number of attributes in the output is:

1 + (3 * 2) = 7

The total number of attributes in the resulting pivoted output is calculated as follows:

Number of group by attributes + (Number of pivot key values * Number of attributes that are transformed from expressions)

The names of new attributes in the output are derived from a pattern you add for target attributes when you create the expressions.

The aggregate functions you use with a pivot operator determine the pivoted values in the output. If no data is found, null is inserted where a pivoted value is expected.

Pivot example

Consider the data entity PRODUCT_SALES, which has the attributes STORE, PRODUCT, and SALES. The data entity has five rows. You want to create a pivot on the attribute PRODUCT with an aggregate SUM function on SALES.

You specify to group pivoted rows by the attribute STORE. Each unique STORE value becomes a row in the resulting output. Input rows with the same group by value are grouped into the same row in the pivoted output. If you don't specify a group by attribute, all input rows are transformed into a single row in the resulting output.

You specify PRODUCT as the pivot key, and select all three values to pivot into new attributes in the resulting output.

The aggregate SUM function expression on SALES is:

SUM(PIVOT_1_1.PRODUCT_SALES.SALES)

The pattern for target attributes is:

%PIVOT_KEY_VALUE%
Data Entity PRODUCT_SALES
STORE | PRODUCT | SALES
AB Store | Television | 2
AB Store | Television | 4
Country-Store | Television | 6
Country-Store | Refrigerator | 8
E-Store | Coffee maker | 10
Pivot Output: Group By STORE, Pivot Key PRODUCT
STORE | TELEVISION | REFRIGERATOR | COFFEE MAKER
AB Store | 6 | - | -
Country-Store | 6 | 8 | -
E-Store | - | - | 10
Pivot Output: Without Group By, Pivot Key PRODUCT
TELEVISION | REFRIGERATOR | COFFEE MAKER
12 | 8 | 10
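
Applying the output attribute formula to the grouped output above: 1 group by attribute (STORE) + (3 pivot key values * 1 attribute transformed by the SUM expression) = 4 attributes, which matches the four columns STORE, TELEVISION, REFRIGERATOR, and COFFEE MAKER.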
Adding and configuring a pivot operator

The pivot operator performs a transformation using one or more aggregate function expressions on one or more values of an attribute that you specify as the pivot key.

You can group all pivoted rows into a single output row, or select group by attributes to create multiple output rows, one row for each unique group by value.

  1. From the Operators panel, drag a Pivot operator onto the canvas.
  2. With the pivot operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. (Optional) For Group by, specify the attributes whose distinct values are used to arrange the pivoted data in grouped rows. By default, all incoming rows are aggregated and grouped into one row unless you specify one or more attributes to group the rows by.
    • Select Attributes and then select one or more attributes to group pivoted rows by the same value. For example, if you select an attribute that has two unique values, the resulting output has two rows of aggregated data.
    • Select Pattern to use a regular expression pattern to select the attributes to group by. Only attributes that match the pattern are used to group pivoted rows by the same value.
  4. For Pivot key, select the attribute whose values can be used as new attributes in the pivoted output. Only attributes that are of type VARCHAR, INTEGER, FLOAT, or BOOLEAN can be used as the pivot key.
  5. Under Pivot key values, click Add key values to select one or more values from the selected pivot key that you want to become pivoted attributes in the output. Up to 50 values can be selected.

    You can use an alias for a selected pivot key value. An alias cannot start with a number. Only the _ special character is allowed.

    To use an alias for a selected pivot key value, select Edit from the Actions menu (Actions menu) for that key value. In the Edit Pivot Key Alias panel, enter the alias for the key value.

    To delete a pivot key value from the selected list, select the checkbox next to it and click Delete. Alternatively, you can select Delete from the Actions menu (Actions menu).

  6. Under Expressions, click Add expression to add the aggregate function expressions. See Adding a pivot expression.
  7. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  8. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  9. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Adding a pivot expression

One or more aggregate function expressions are required for a pivot operator.

  1. With the pivot operator selected on the data flow canvas, under Expressions in the properties panel, click Add expression.

  2. In the Add Expression panel, enter a name for the expression in the Identifier field, or leave the name as-is.

  3. (Optional) To use a pattern to apply an expression to multiple source attributes, select Allow bulk selection.

    For example, suppose you have two DISCOUNT_ attributes in the dataset (DISCOUNT_VALUE and DISCOUNT_RATE) and you want to apply the MAX function to both of them.

    1. Under Source attributes, select Pattern and click Add pattern.

    2. In the Add Source Pattern panel, add a pattern for selecting a group of source attributes that have names beginning with DISCOUNT_. For example, enter DISCOUNT* and click Add. Then select the Data type.

    3. Under Target attributes, use a pattern for the resulting output attribute names.

      By default, the pattern %MACRO_INPUT%_%PIVOT_KEY_VALUE% is already inserted for you. %MACRO_INPUT% corresponds to the names of the source attributes as selected by the pattern you added. %PIVOT_KEY_VALUE% corresponds to the values as selected in the pivot key.

      For example, if %PIVOT_KEY_VALUE% indicates TELEVISION, and %MACRO_INPUT% indicates DISCOUNT_VALUE and DISCOUNT_RATE, then in the output, the pivoted attributes are <pivot_name>.<expression_name>.DISCOUNT_VALUE_TELEVISION and <pivot_name>.<expression_name>.DISCOUNT_RATE_TELEVISION.

    4. You can retain the Use source attribute data types selection. Otherwise, under Expression data type, select the Data type and complete the fields corresponding to the type selected.
  4. If you did not select Allow bulk selection, under Target attributes, use a pattern for the resulting output attribute names.

    By default, the pattern %PIVOT_KEY_VALUE% is already inserted for you. %PIVOT_KEY_VALUE% corresponds to the values as selected in the pivot key.

    For example, if %PIVOT_KEY_VALUE% indicates TELEVISION and TELEPHONE, then in the output, the pivoted attributes are <pivot_name>.<expression_name>.TELEVISION and <pivot_name>.<expression_name>.TELEPHONE.

    Under Expression data type, select the Data type and complete the fields corresponding to the type selected.

  5. In the Expression builder section, double-click or drag incoming attributes, parameters, and aggregate functions to add to the editor to build the expression. You can also manually write the expression yourself, and validate the expression.

    The following table shows the list of aggregate functions that are available for constructing pivot expressions.

    FunctionDescriptionExample
    COUNT(value[, value]*)Returns the number of rows for which one or more supplied expressions are all non-null.COUNT(expr1)
    COUNT(*)Returns the total number of retrieved rows, including rows containing null.COUNT(*)
    MAX(value)Returns the maximum value of the argument.MAX(expr)
    MIN(value)Returns the minimum value of the argument.MIN(expr)
    SUM(numeric)Returns the sum calculated from values of a group.SUM(expr1)

    You can also use higher-order (transform) functions in a pivot expression.

  6. To construct a pivot expression, specify the attribute or attributes and the aggregate function.

    • If you selected Allow bulk selection, then use %MACRO_INPUT% in the expression to indicate the attributes on which the function is to be applied.

      For example, if you used the pattern DISCOUNT* to match source attributes DISCOUNT_RATE and DISCOUNT_VALUE, then you can specify an aggregate function such as SUM(numeric) to apply the function on all the attributes that match the pattern. %MACRO_INPUT% replaces the numeric placeholder in the function:

      SUM(%MACRO_INPUT%)
    • If you did not select Allow bulk selection, then specify the attribute in the function.

      For example, the data entity is PRODUCT_SALES and you want to use an aggregate SUM function on the SALES attribute. You can specify the function such as SUM(numeric), replacing the numeric placeholder in the function with the attribute name:

      SUM(PIVOT_1_1.PRODUCT_SALES.SALES)
    • You can use an expression parameter for the aggregate function name in the pivot expression. An expression parameter has a name, type, and default value.

      For example, the expression parameter P_VARCHAR has type VARCHAR and MIN as the default value. You can specify the aggregate function as:

      $P_VARCHAR(%MACRO_INPUT%)
      $P_VARCHAR(PIVOT_1_1.PRODUCT_SALES.SALES)
  7. In the Add Expression panel, click Add.

Lookup Operator

The lookup operator performs a query and transformation using a lookup condition and input from two sources, a primary input source and a lookup input source.

The query operation uses the condition and a value in the primary input to find rows in the lookup source. The transformation appends attributes from the lookup source to the primary source.

You can specify the actions to take when the lookup query returns multiple matching rows, and when it returns no matching rows. For example, you can specify that the action is to skip non-matching rows, and to return any one matching row when there are multiple matching rows.

The resulting output is a combination of both input sources based on the lookup condition, a value in the primary input, and the preferred actions to take. The primary input determines the order of attributes and rows in the output, with the attributes from the primary input placed before the attributes from the lookup input.

Lookup example

Consider two source data entities in a data flow. Data entity 1 (PAYMENTS) is set as the primary input. Data entity 2 (CUSTOMERS) is set as the lookup input. The lookup condition is set as:

LOOKUP_1_1.PAYMENTS.CUSTOMER_ID = LOOKUP_1_2.CUSTOMERS.CUSTOMER_ID

The resulting lookup output shows how the data from the two inputs are combined and transformed. Attributes from the lookup source are appended to the primary source attributes, with the following behaviors:

  • If the operation does not find a matching record in the lookup source for a primary source value, the primary record is returned with null inserted for the lookup attributes. For example, no matching records are found for the CUSTOMER_ID values 103, 104, and 105. Therefore, in the resulting output, null is populated in the appended attributes CUSTOMER_ID and NAME.
  • If the operation finds multiple matching records for a value in the lookup source, any one matching record is returned.
Data Entity 1, primary input source
PAYMENT_ID | CUSTOMER_ID | AMOUNT
1 | 101 | 2500
2 | 102 | 1110
3 | 103 | 500
4 | 104 | 400
5 | 105 | 150
6 | 102 | 450
Data Entity 2, lookup input source
CUSTOMER_ID | NAME
101 | Peter
102 | Paul
106 | Mary
102 | Pauline
Lookup Output
PAYMENT_ID | CUSTOMER_ID | AMOUNT | CUSTOMER_ID | NAME
1 | 101 | 2500 | 101 | Peter
2 | 102 | 1110 | 102 | Paul
3 | 103 | 500 | null | null
4 | 104 | 400 | null | null
5 | 105 | 150 | null | null
6 | 102 | 450 | 102 | Paul
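
The output above uses the default Return any one matching row action for multiple matches. As an illustrative sketch only (row order can vary), selecting Return all matching rows instead would return payments 2 and 6 twice each, once for Paul and once for Pauline, while the other rows stay unchanged:
PAYMENT_ID | CUSTOMER_ID | AMOUNT | CUSTOMER_ID | NAME
2 | 102 | 1110 | 102 | Paul
2 | 102 | 1110 | 102 | Pauline
6 | 102 | 450 | 102 | Paul
6 | 102 | 450 | 102 | Pauline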
Adding and configuring a lookup operator

A lookup operator uses two input sources on a data flow.

The following procedure assumes you have added and configured two source operators.

  1. From the Operators panel, drag a Lookup operator onto the canvas.
  2. With the lookup operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. For Primary input, select the source that contains a value for matching one or more rows in the lookup source. Matching is based on a lookup condition, which you create in step 5.
    The primary input also determines the order of attributes and rows in the resulting lookup output.
  4. For Lookup input, select the source that the primary source references to match one or more rows.
    Attributes from the lookup input are appended to the primary input attributes in the resulting output.
  5. For Multiple match, select the action to perform when multiple matching rows are found between the primary source and the lookup source. The action you select determines the rows to use in the resulting output.
    • Return any one matching row: Default action. Selects any one matching row.
    • Return the first matching row: Selects the first row from the rows that match.
    • Return the last matching row: Selects the last row from the rows that match.
    • Return error: Displays an error. Does not complete the operation.
    • Return all matching rows: Selects all matching rows.
  6. For No match, select the action to perform when no matching row is found between the primary source and the lookup source. The action you select determines how to handle the primary source row that does not have a matching row in the lookup source.
    • Skip non-matching rows: Does not include non-matching rows in the resulting output.
    • Return non-matching rows: Default action. Includes non-matching rows in the resulting output with null values.
  7. Next to Lookup condition, click Create.
    1. In the Create Lookup Condition panel, Condition Builder section, you can double-click or drag attributes, parameters, and functions to add to the editor to build a condition. You can also enter the condition expression manually in the editor, and validate the expression.
      Note

      In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
    2. Click Create.
  8. (Optional) Click Assign parameter to use parameters so that the lookup condition is not bound to the compiled code when you publish the data flow. See Assigning a Parameter.
  9. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  10. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  11. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Creating a lookup condition

Use the Condition Builder to visually select elements to create a lookup condition. You can also enter a condition manually in the editor.

A lookup condition lets you use a value in a primary input source to look up records in a lookup input source, returning any rows that match. Rows that do not have a match are returned with null values.

The elements that you can use in a lookup condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.

Incoming displays the attributes from the upstream input ports in two separate LOOKUP folders. View the attributes from each port by expanding or collapsing the appropriate LOOKUP folder. For example, if LOOKUP_1_1 is the primary input and LOOKUP_1_2 is the lookup input, lookup conditions based on a value in the primary input attribute ADDRESS_ID could be:

LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = '2001'
LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = LOOKUP_1_2.BANK_CUSTOMER.ADDRESS_ID

Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.

Suppose you want to look up bank customers of a specific bank. You can create a VARCHAR parameter with the name P_LOOK_UP, and set the default value to 2001, which is the specific bank value. Then you can create the lookup condition as:

LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = LOOKUP_1_2.BANK_CUSTOMER.ADDRESS_ID AND LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = $P_LOOK_UP

Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.

Here's a list of functions that are available for you to add when you construct conditions:

Hash Functions
FunctionDescriptionExample
MD5(all data types)Calculates an MD5 checksum of the data type, and returns a string value. MD5(column_name)
SHA1(all data types)Calculates a SHA-1 hash value of the data type, and returns a string value. SHA1(column_name)
SHA2(all data types, bitLength)Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer that can be set to 0 (equivalent to 256), 256, 384, or 512. SHA2(column_name, 256)
ORA_HASH(expr, [max_bucket], [seed_value])

Computes a hash value for expr, and returns a NUMBER value.

expr can be an expression, a column, a literal.

max_bucket is the maximum bucket value returned, between 0 and 4294967295 (default).

seed_value is a value between 0 (default) and 4294967295.

Oracle applies the hash function to the combination of expr and seed_value to produce many different results for the same set of data.

ORA_HASH('1')

ORA_HASH('b', 2)

ORA_HASH(100, 10, 10)

ORA_HASH(EXPRESSION_1.CUSTOMERS.SSN, 2)

Arithmetic Functions
FunctionDescriptionExample
ABS(numeric)Returns the absolute value of the numeric value.ABS(-1)
CEIL(numeric)Returns the smallest integer not less than the numeric value.CEIL(-1.2)
FLOOR(numeric)Returns the largest integer not greater than the numeric value.FLOOR(-1.2)
MOD(numeric1, numeric2)Returns the remainder after numeric1 is divided by numeric2.MOD(8,2)
POWER(numeric1, numeric2)Raises numeric1 to the power of numeric2.POWER(2,3)
ROUND(numeric1, numeric2)Returns numeric1 rounded to numeric2 decimal places.ROUND(2.5,0)
TRUNC(numeric1, numeric2)Returns numeric1 truncated to numeric2 decimal places.TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported format patterns:

  • 0: A digit
  • #: A digit, zero shows as absent
  • .: Placeholder for decimal separator
  • ,: Placeholder for grouping separator
  • E: Separates mantissa and exponent for exponential formats
  • -: Default negative prefix
  • ¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator

TO_NUMBER('5467.12') returns 5467.12

TO_NUMBER('-USD45,677.7', '¤¤##,###.#', 'en-US') returns -45677.7

Date and Time Functions
Function Description Example
CURRENT_DATE Returns the current date. CURRENT_DATE returns today's date such as 2023-05-26
CURRENT_TIMESTAMP Returns the current date and time for the session time zone. CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
DATE_ADD(date, number_of_days) Returns the date that's the specified number of days after the specified date. DATE_ADD('2017-07-30', 1) returns 2017-07-31
DATE_FORMAT(expr, format[, locale])

Formats an expr of Date, based on the format and optional locale provided. Default locale is en-US. Supported language tags.

Supported date format patterns:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

DATE_FORMAT(Date '2020-10-11', 'yyyy-MM-dd') returns '2020-10-11'. The first argument is a Date object representing October 11th, 2020.

DATE_FORMAT(Date '2018-junio-17', 'yyyy/MMMM/dd', 'es-ES') returns '2018/junio/17'

DAYOFMONTH(date) Returns the date's day in the month. DAYOFMONTH('2020-12-25') returns 25
DAYOFWEEK(date) Returns the date's day in the week. DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date) Returns the date's day in the year. DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date) Returns the date's week in the year.

WEEKOFYEAR('2022-07-28') returns 30

WEEKOFYEAR('2022-07-28 13:24:30') returns 30

HOUR(datetime) Returns the datetime's hour value. HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date) Returns the date's last day of the month. LAST_DAY('2020-12-25') returns 31
MINUTE(datetime) Returns the datetime's minute value. MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date) Returns the date's month value. MONTH('2020-06-25') returns 6
QUARTER(date) Returns the quarter of year the date is in. QUARTER('2020-12-25') returns 4
SECOND(datetime) Returns the datetime's second value. SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr]) Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31

TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17

TO_TIMESTAMP(expr, format_string[, localeStr]) Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM, Oct 11th, 2020
WEEK(date)

Returns the date's week value.

WEEK('2020-06-25') returns 4
YEAR(date) Returns the date's year value. YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months) Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.

ADD_MONTHS('2017-07-30', 1) returns 2017-08-30

ADD_MONTHS('2017-07-30 09:07:21', 1) returns 2017-08-30

MONTHS_BETWEEN(start_date_expr, end_date_expr)

Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.

MONTHS_BETWEEN('2022-01-01', '2022-01-31') returns 1

MONTHS_BETWEEN('2022-07-28', '2020-07-25') returns 24

MONTHS_BETWEEN('2022-07-28 13:24:30', '2020-07-25 13:24:30') returns 24

FROM_UTC_TIMESTAMP(time_stamp, time_zone)

Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0
TO_UTC_TIMESTAMP(time_stamp, time_zone)

Converts a date, timestamp or string in the specified time zone to a UTC timestamp.

For string, use a format such as: yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02).

TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
FROM_UNIXTIME(unix_time[, fmt])

Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.

Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.

If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss

FROM_UNIXTIME(1255033470) returns '2009-10-08 13:24:30'

FROM_UNIXTIME(1637258854) returns '2021-11-18 10:07:34'

Default time zone is PST in the examples

UNIX_TIMESTAMP([time_expr[, fmt]])

Converts the current or specified time to a Unix timestamp in seconds.

time_expr is a date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS

If time_expr is not provided, the current time is converted.

If time_expr is a string and fmt is omitted, the default is yyyy-MM-dd HH:mm:ss

UNIX_TIMESTAMP('1970-01-01 00:00:00', 'yyyy-MM-dd HH:mm:ss') returns 28800

Default time zone is PST in this example

INTERVAL 'year' YEAR[(year_precision)]

Returns a period of time in years.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '1' YEAR returns an interval of 1 year

INTERVAL '200' YEAR(3) returns an interval of 200 years

INTERVAL 'year month' YEAR[(year_precision)] TO MONTH

Returns a period of time in years and months. Use to store a period of time using year and month fields.

year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)

INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]

Returns a period of time in months.

month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months).

INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of days, hours, minutes, and seconds.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of days, hours, and minutes.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]

Returns a period of time in terms of days and hours.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]

Returns a period of time in terms of days.

day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.

INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of hours, minutes, and seconds.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]

Returns a period of time in terms of hours and minutes.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]

Returns a period of time in terms of hours.

hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.

INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]

Returns a period of time in terms of minutes.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of minutes and seconds.

minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.

INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
INTERVAL 'second' SECOND[(fractional_seconds_precision)]

Returns a period of time in terms of seconds.

fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3.

INTERVAL '15.678' SECOND returns an interval of 15.678 seconds
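Date and time functions can be nested in a single expression. For example, the following sketch, which uses only functions from this table, formats the date that is one week from today as a string:

DATE_FORMAT(DATE_ADD(CURRENT_DATE, 7), 'yyyy-MM-dd')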
Analytic Functions
Function Description Example
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) Returns the value evaluated at the row that's the first row of the window frame. FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) Returns the value evaluated at the row that is the last row of the window frame. LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause) Returns the rank of the current row with gaps, counting from 1. RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) Returns the unique number of the current row within its partition, counting from 1. ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
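For example, an analytic function can be combined with a CASE expression (see Operator (Comparison) Functions) to flag duplicate rows within a group. The following sketch uses the sample BANK attributes and is illustrative only:

CASE WHEN ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) = 1 THEN 'KEEP' ELSE 'DUPLICATE' END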
String Functions
Function Description Example
CAST(value AS type)Returns the specified value in the specified type.CAST("10" AS INT) returns 10
CONCAT(string, string)Returns the combined values of strings or columns.CONCAT('Oracle','SQL') returns OracleSQL
CONCAT_WS(separator, expression1, expression2, expression3,...) Returns the combined values of strings or columns using the specified separator in between the strings or columns.

A separator is required and it must be a string.

At least one expression must be provided after the separator. For example: CONCAT_WS(',' col1)

CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle

CONCAT_WS(' ', address, city, postal_code) returns 123 MyCity 987654

If a child of the function is an array, then the array is flattened:

CONCAT_WS(',', 1,2,3, to_array(4,5,6), to_array(7,8), 9) returns 1,2,3,4,5,6,7,8,9

INITCAP(string)Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])Returns the (1-based) index of the first occurrence of substring in string.INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)Returns the string with all letters changed to lowercase.LOWER('ORACLE') returns oracle
LENGTH(string)Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.LENGTH('Oracle') returns 6
LTRIM(string)Returns the string with leading spaces removed from the left.LTRIM(' Oracle')
NVL(expr1, expr2)Returns the argument that is not null.NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID())
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx])Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.

REGEXP_SUBSTR('https://www.oracle.com/products', 'https://([[:alnum:]]+\.?){3,4}/?') returns https://www.oracle.com

REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22
REPLACE(string, search, replacement)Replaces all occurrences of search with replacement.

If search is not found in string, then string is returned unchanged.

If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.

REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)Returns the string with trailing spaces removed from the right.RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length])Returns the substring starting at position. SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale])Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.

In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:

  • yy: two digit year
  • yyyy: four digit year
  • M: Numeric month, such as 1 for January
  • MM: Numeric month, such as 01 for January
  • MMM: Abbreviated month, such as Jan
  • MMMM: Full month, such as January
  • d: Numeric day of the month, such as 1 for June 1
  • dd: Numeric day of the month, such as 01 for June 1
  • DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
  • F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
  • EEE or E: Abbreviated named day of the week, such as Sun for Sunday
  • EEEE: Named day of the week, such as Sunday
  • HH: 24 hour format from 00 to 23
  • H: 24 hour format from 0 to 23
  • hh: 12 hour format from 01 to 12
  • h: 12 hour format from 1 to 12
  • mm: minutes from 00 to 59
  • ss: seconds from 00 to 59
  • SSS: milliseconds from 000 to 999
  • a: AM or PM
  • z: time zone such as PDT

Number example: TO_CHAR(123) returns 123

Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.

UPPER(string)Returns a string with all letters changed to uppercase.UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.RPAD('XYZ', 6, '+' ) returns 'XYZ+++'
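String functions can also be nested. For example, the following sketch builds a fixed-width, uppercase code from an attribute; the attribute name is hypothetical:

LPAD(UPPER(EXPRESSION_1.CUSTOMERS.COUNTRY_CODE), 5, '0')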
Operator (Comparison) Functions
Function Description Example
CASE WHEN condition1 THEN result1 ELSE result2 ENDReturns the value for which a condition is met.CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1> 0, otherwise returns XYZ
ANDThe logical AND operator. Returns true if both operands are true, otherwise returns false.(x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false"
ORThe logical OR operator. Returns true if either operand is true or both are true, otherwise returns false.(x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true"
NOTThe logical NOT operator.
LIKEPerforms string pattern matching, whether string1 matches the pattern in string2.
=Tests for equality. Returns true if expr1 equals expr2, otherwise returns false.x = 10 returns "true" when value of x is 10, else it returns "false"
!=Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false.x != 10 returns "false" if value of x is 10, else it returns "true"
>Tests for an expression greater than. Returns true if expr1 is greater than expr2.x > 10 returns "true" if value of x is greater than 10, else it returns "false"
>=Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2.x >= 10 returns "true" if value of x is greater than or equal to 10, else it returns "false"
<Tests for an expression less than. Returns true if expr1 is less than expr2.x < 10 returns "true" if value of x is less than 10, else it returns "false"
<=Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2.x <= 10 returns "true" if value of x is less than or equal to 10, else it returns "false"
||Concatenates two strings.'XYZ' || 'hello' returns 'XYZhello'
BETWEENEvaluates a range.FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007
INTests whether an expression matches a list of values.FILTER_2.ORDERS.ORDER_ID IN (1003, 1007)
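These operators can be combined in a single condition. For example, the following filter sketch keeps rows whose name matches a pattern while excluding two specific IDs; ORDER_NAME is a hypothetical attribute:

FILTER_2.ORDERS.ORDER_NAME LIKE 'BANK%' AND NOT (FILTER_2.ORDERS.ORDER_ID IN (1003, 1007))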
Unique ID Functions
Function Description Example
NUMERIC_ID()Generates a universally unique identifier that is a 64-bit number for each row.NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761
ROWID()Generates monotonically increasing 64-bit numbers.ROWID() returns for example, 0, 1, 2, and so on
UUID()Generates a universally unique identifier that is a 128-bit String for each row.UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d
MONOTONICALLY_INCREASING_ID() Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. MONOTONICALLY_INCREASING_ID() returns for example, 8589934592 and 25769803776
Conditional Functions
Function Description Example
COALESCE(value, value [, value]*)Returns the first non-null argument, if it exists, otherwise returns null.COALESCE(NULL, 1, NULL) returns 1
NULLIF(value, value)Returns null if the two values equal each other, otherwise returns the first value.NULLIF('ABC','XYZ') returns ABC
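For example, the following sketch substitutes an empty string when an optional attribute is null; the attribute name is hypothetical:

COALESCE(EXPRESSION_1.CUSTOMERS.MIDDLE_NAME, '')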
Hierarchical Functions
Function Description Example
SCHEMA_OF_JSON(string)Parses a JSON string and infers the schema in DDL format.

SCHEMA_OF_JSON('[{\"Zipcode\":704,\"ZipCodeType\":\"STANDARD\",\"City\":\"ORACLECITY\",\"State\":\"OC\"}]') returns 'ARRAY<STRUCT<City:string,State:string,ZipCodeType:string,Zipcode:bigint>>'

SCHEMA_OF_JSON('[{\"col\":0}]') returns 'ARRAY<STRUCT<col: BIGINT>>'

FROM_JSON(column, string)

Parses a column containing a JSON string into one of the following types, with the specified schema.

  • Map, with String as the key type
  • Struct
  • Array

FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}

FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}

TO_JSON(column)Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.

TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}

TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}

TO_STRUCT(string,column[,string,column]*)Creates a new column of Struct type. The input columns must be grouped as key-value pairs.

TO_STRUCT('Ename',Expression_1.attribute1) returns {100}

TO_STRUCT('Id',Expression_1.attribute1, 'Name', Expression_1.attribute2) returns {100, "John"}

TO_ARRAY(column[,column]*)Creates a new column as Array type. The input columns must all have the same data type.

TO_Array(Expression_1.attribute1) returns [100]

TO_ARRAY(EXPRESSION_1.attribute2,EXPRESSION_1.attribute3) returns ["John","Friend"]
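Hierarchical functions can be combined to build and then serialize a nested value. For example, the following sketch, which reuses the sample address attributes, constructs a Struct and converts it to a JSON string such as {"block":1,"unit":1}:

TO_JSON(TO_STRUCT('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit))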

Higher-Order Functions

Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.

The supported operators are:

  • Aggregate

  • Expression

  • Filter

  • Join

  • Lookup

  • Split

  • Pivot

Function Description Example
TRANSFORM(column, lambda_function)Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda function, and the keys have the type of the column map keys. For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)

Only the Expression operator supports ARRAY_SORT.

Takes an array and sorts it according to the given function, which takes 2 arguments.

The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.

If the function is omitted, the array is sorted in ascending order.

array_sort(to_array(5, 6, 1),
                   (left, right) -> CASE WHEN left < right THEN -1
                                         WHEN left > right THEN 1 ELSE 0 END)

The returned array is:

[1,5,6]
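Higher-order functions can also be nested. For example, in an expression operator, the following sketch sorts an array in ascending order (by omitting the sort function) and then increments each element, returning [2, 6, 7]:

TRANSFORM(ARRAY_SORT(TO_ARRAY(5, 6, 1)), x -> x + 1)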

Function Operator

Use the function operator to invoke Oracle Cloud Infrastructure Functions from within Data Integration.

Note

For complex data structures such as Map, Array and Composite (Struct), only the JSON serialization format is currently supported.

Before you use OCI Functions from within a data flow in Data Integration, ensure that you understand the dependencies, and have completed the prerequisite tasks.

Before you begin

The function you want to invoke must be deployed in OCI Functions. The function can be written in any language.

Use the following task checklist to ensure that you have the required setup and information for using the function operator with OCI Functions.

TaskRequirement
Set up to use and access OCI Functions

This topic assumes that the tenancy and development environment are already set up for deploying functions to OCI Functions.

See Preparing for Functions.

Only functions that are deployed to OCI Functions can be used with the function operator in Data Integration.

Create policies to control the invocation and management of functions.

This topic assumes you or the tenancy administrator have already created the necessary Oracle Cloud Infrastructure policies. See Creating Policies to Control Access to Network and Function-Related Resources.

In a production environment, you might want to restrict users to invoking functions in a specific application, or to invoking a specific function only.

For example, to restrict users to invoking functions in a specific workspace, enter the policy statements in the following format:

allow any-user to use fn-invocation in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>'}

allow any-user to read fn-function in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>'}

To restrict users to invoking a specific function in a workspace:

allow any-user to use fn-invocation in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>', target.function.id='<function-ocid>'}

allow any-user to read fn-function in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>', target.function.id='<function-ocid>'}

See Controlling Access to Invoke and Manage Functions.

Create an application in OCI Functions.

An application is a logical grouping of functions.

In an application, you specify one to three subnets in which to run functions. Functions running in one application are isolated from functions running in another application.

See Creating Applications (in Functions).

Deploy a function in OCI Functions.

For the function in OCI Functions to work with the Data Integration function operator in a data flow, the function must not read or write any identifier with a space character.

When you deploy a function to OCI Functions using the Fn Project CLI, the function is built as a Docker image and pushed to a specified Docker registry.

See Creating and Deploying Functions.

Collect the information needed to use the deployed function with the function operator.

When you add and configure a function operator in a Data Integration data flow, you need to know the following:

  • The compartment that has the OCI Functions application in which the function you want to use is deployed
  • The application in OCI Functions that has the deployed function
  • The name of the deployed function
  • The fields that are defined in the function
Adding and configuring a function operator

A function operator lets you use a function deployed in OCI Functions to process data from an input source. Primitive and complex data types are supported.

Specify the input shape of the function, and the input and output attributes for the function operator to read and write. Then manually map the source attributes to the input attributes.

Currently, only the JSON serialization format is supported for complex data structures such as Map, Array, and Composite (Struct).

The following procedure assumes you have added and configured a source operator with the data entity you want to use with the function.

  1. From the Operators panel, drag a Function operator onto the canvas.
  2. With the function operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. Select the OCI Function to use for this operator. See Selecting a function.
  4. Select the Serialization format and define the function input and output data.
  5. Specify the Input attributes and Output attributes for the operator, and the Function configuration values for the function. See Adding a function operator property.

    The Input attributes define the input shape for the function operator, which is one or more incoming attributes for the function to work on. Input attributes are mapped to the incoming attributes from the data entity source.

    The Function configuration values define the input shape of the function, which is one or more fields that are defined in the function. Define a function field with a name and a value. The value of a field matches an input attribute. You can parameterize a function configuration value.

    The Output attributes define the output shape for the function operator, which is one or more attributes for the function to write output to. Add an output attribute for each function field that you want in the output. Subsequent downstream operators can use the output attributes.

    Note

    Do not use a space character in the identifier name for an input attribute, an output attribute, or a function field. Also, names such as "Element", "Key", and "Value" are not permitted.
  6. (Optional) Under Function properties, expand Function configuration to review the function configuration values. You can change or parameterize a value.

    BATCH_SIZE is a pre-defined function configuration that represents the number of rows that a function processes at a time. The default is 10,000 rows.

    For a function configuration value, you can change the value or assign a parameter.

  7. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  8. In the Map tab, drag and drop an attribute from the data entity source to the corresponding input attribute on the function to create a manual mapping.
  9. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The function output attributes are appended to the data set.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  10. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Selecting a function

The function you want to use with a function operator must be deployed in an application in Oracle Cloud Infrastructure Functions.

  1. On the canvas, select the function operator.
  2. Under the Details tab of the Properties panel, for OCI Function, click Select.
  3. In the panel, select the Compartment that has the OCI Functions application in which the function you want to use has been deployed.
  4. Select the Application in OCI Functions that has the deployed function.
  5. In the OCI Function section, select the function.
  6. Click OK.
Adding a function operator property

After selecting a deployed function, specify the properties that define the input data for the function to process, the fields in the function configuration, and the output data that the function returns.

Note

Do not use a space character in the identifier name for an input attribute, an output attribute, or a function field. Also, names such as "Element", "Key", and "Value" are not permitted.
  1. Under Function properties, click Add property.
  2. In the panel, select the Type of property you want to specify.
    • Input attributes: Input shape for the operator. Specify one or more attributes to represent the incoming attributes for the function to process. Input attributes are mapped to the incoming attributes from the data entity source.
    • Function configuration: Specify one or more function fields that define the input shape of the function. A field has a name and a value. The value of a field specifies an input attribute.
    • Output attributes: Output shape for the operator. Specify zero or more attributes to represent the output from the function after the data is processed. Add an output attribute for each function field that you want in the output. Subsequent downstream operators can use the output attributes.
  3. Enter a name for the property in the Identifier field. Do not use a space character in the name for an input attribute, an output attribute, or a function field.
  4. Select the Data type of the property.
  5. For a primitive data type, depending on the Type and Data type of the property you're specifying, specify the following fields, and other applicable fields.

    • Length: For an input or output attribute, enter a length. For example, you specify a length for a numeric or varchar data type.
    • Value: For a function field, enter a value that matches an input attribute.
  6. For a Map complex data type, specify the key-value pair for a map element. Click Add data type to select the key data type and value data type.

    • Key: Select from primitive data types only.
    • Value: Select from primitive and complex data types.
  7. For an Array complex data type, click Add data type to specify the Element data type. You can select from primitive and complex data types.

  8. For a Composite (Struct) data type, specify the schema child or schema children. To add a schema child, click the + symbol next to the parent schema name.

    In the Add Field panel for a schema child, enter the Identifier, and select a Data type. You can select from primitive and complex data types. Click Add to add the schema child.

    Click the + symbol again if you want to add more than one schema child to the parent schema.

    To edit or delete a schema child, use the Actions menu (Actions menu) at the end of the row.

    You cannot delete the parent schema element.

  9. In the Add Property panel, click Add.
  10. Repeat the steps to add the properties you need to define the function input, function fields, and function output.

Flatten Operator

Use the flatten operator to transform hierarchical data into a simplified format for use with other relational data. The flattening process is also known as denormalization or unnesting.

The hierarchical file formats that you can denormalize are:

  • JSON and multi-line JSON
  • Avro
  • Parquet

Currently, the supported hierarchical data type that you can unnest is Array. You can select only one Array node in a hierarchical dataset to flatten. The data structure is flattened from the root to the selected Array node, and presented in a simplified format. If the dataset has other Array nodes, you have the option to convert those into strings.

You can configure the attributes or fields to include in the flattened output by setting Projection preferences.

Understanding projection preferences

Projection preferences are settings for the format, number, and type of projected attributes to include in the output after flattening the incoming data.

Data Integration provides four projection preference settings that are selected by default, resulting in a projected output that has the following:

  • Index arrays
  • All attributes up to the attribute that is selected for flattening
  • Null values for the attributes that are missing from the subsequent arrays
  • Parent name lineage in the attribute names

To understand the projection preferences that you can configure, use the following topics with the JSON data structure example.

JSON data structure example
id
vehicle[]
    make
    model
    insurance{}
        company
        policy_number
    maintenance[]
        date
        description[]
        dealerdetails[]
            servicestation1[]
                dealeraddress[]
                    city
                    state
                    contact
                dealername
            servicestation2[]
                dealeraddress[]
                    city
                    state
                    contact
                dealername
Create and project array indexes

The projection preference setting Create and project array indexes lets you control whether to include index attributes for arrays in the flattened output.

The index attribute represents the index of an array. Suppose that the array is ["a","b","c"], then the index for "a" is 0, the index for "b" is 1, and the index for "c" is 2.

By default, Data Integration creates an attribute with the suffix _INDEX for the array node that you select to flatten data to. The data type of an array index attribute is Integer.

Index attributes are also created for the parent array nodes of the selected array node. The flattening operation impacts all parent array nodes in the hierarchical structure from the root to the selected node. If an impacted parent array node has sibling arrays, index attributes are not created for those sibling array nodes.

In the JSON data structure example, if you select the array dealeraddress that's under servicestation2 for flattening, Data Integration creates five array index attributes: one for the selected array node, and four for the parent nodes impacted by the flattening operation.

After flattening, the attributes in the simplified structure are as follows:

id
vehicle_INDEX
vehicle_make
vehicle_model
vehicle_insurance
    company
    policy_number
vehicle_maintenance_INDEX
vehicle_maintenance_date
vehicle_maintenance_description
vehicle_maintenance_dealerdetails_INDEX
vehicle_maintenance_servicestation1
    dealeraddress
        city
        state
        contact
    dealername
vehicle_maintenance_servicestation2_INDEX
vehicle_maintenance_servicestation2_dealeraddress_INDEX
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
vehicle_maintenance_servicestation2_dealeraddress_dealername

An array index attribute is not created for servicestation1 because the array node is a sibling of the impacted parent node servicestation2. Similarly, an index attribute is not created for description because the array node is a sibling of the impacted parent node dealerdetails.

If you do not select the projection preference Create and project array indexes, the attributes in the simplified structure are as follows:

id
vehicle_make
vehicle_model
vehicle_insurance
    company
    policy_number
vehicle_maintenance_date
vehicle_maintenance_description
vehicle_maintenance_servicestation1
    dealeraddress
        city
        state
        contact
    dealername
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
vehicle_maintenance_servicestation2_dealeraddress_dealername
Retain all attributes up to the flattened array

The projection preference setting Retain all attributes up to the flattened array lets you control whether to include in the flattened output those attributes that are not part of the flattened arrays.

By default, Data Integration shows all attributes from the root to the selected array, including sibling arrays not impacted by flattening.

In the JSON data structure example, if you do not select the projection preference Retain all attributes up to the flattened array, the attributes in the simplified structure are as follows:

id
vehicle_INDEX
vehicle_maintenance_INDEX
vehicle_maintenance_dealerdetails_INDEX
vehicle_maintenance_servicestation2_INDEX
vehicle_maintenance_servicestation2_dealeraddress_INDEX
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
Produce a single row with null values for missing parent array

The projection preference setting Produce a single row with null values for missing parent array lets you control whether to skip rows that do not have the attributes impacted by flattening.

The effect of the Produce a single row with null values for missing parent array setting is seen only in the Data tab. By default, Data Integration shows null values for the attributes that are missing from the subsequent arrays.

For example, the following is a projected output with null values:

|Company1|Redmond |[...]|true |[...]|0 |1 |123.34 |0 |Laptop |20 |
|Company1|Redmond |[...]|true |[...]|0 |1 |123.34 |1 |Charger |2 |
|Company1|Redmond |[...]|true |[...]|1 |2 |323.34 |0 |Mouse |2 |
|Company1|Redmond |[...]|true |[...]|1 |2 |323.34 |1 |Keyboard |1 |
|Company2|Seattle |[...]|false |[...]|0 |4 |123.34 |0 |Laptop |20 |
|Company2|Seattle |[...]|false |[...]|0 |4 |123.34 |1 |Charger |3 |
|Company2|Seattle |[...]|false |[...]|1 |5 |343.24 |0 |Chair |4 |
|Company2|Seattle |[...]|false |[...]|1 |5 |343.24 |1 |Lamp |2 |
|Company3|Bellevue|null |true |[...]|0 |4 |123.34 |null |null |null |
|Company4|Kirkland|null |null |null |null |null |null |null |null |null |

If you do not select the option, then the rows are skipped and the projected output is:

|Company1|Redmond |[...]|true |[...]|0 |1 |123.34 |0 |Laptop |20 |
|Company1|Redmond |[...]|true |[...]|0 |1 |123.34 |1 |Charger |2 |
|Company1|Redmond |[...]|true |[...]|1 |2 |323.34 |0 |Mouse |2 |
|Company1|Redmond |[...]|true |[...]|1 |2 |323.34 |1 |Keyboard |1 |
|Company2|Seattle |[...]|false |[...]|0 |4 |123.34 |0 |Laptop |20 |
|Company2|Seattle |[...]|false |[...]|0 |4 |123.34 |1 |Charger |3 |
|Company2|Seattle |[...]|false |[...]|1 |5 |343.24 |0 |Chair |4 |
|Company2|Seattle |[...]|false |[...]|1 |5 |343.24 |1 |Lamp |2 |
Retain parent name lineage in attribute names

The projection preference setting Retain parent name lineage in attribute names lets you control whether to include the parent names in the child attribute names.

By default, Data Integration creates child node names with their parent node names.

In the JSON data structure example, the attributes with parent names in the simplified structure are as follows (assuming sibling arrays not impacted by flattening are excluded in the output):

id
vehicle_INDEX
vehicle_maintenance_INDEX
vehicle_maintenance_dealerdetails_INDEX
vehicle_maintenance_servicestation2_INDEX
vehicle_maintenance_servicestation2_dealeraddress_INDEX
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact

If you do not select the projection preference Retain parent name lineage in attribute names, the attributes are as follows:

id
vehicle_INDEX
maintenance_INDEX
dealerdetails_INDEX
servicestation2_INDEX
dealeraddress_INDEX
city
state
contact
Adding and configuring a flatten operator

The following procedure assumes you have added a source operator and configured the operator to an Object Storage source with complex data types, such as a JSON file.

  1. From the Operators panel, drag the Flatten operator onto the canvas, and connect the operator to the source operator.

    The Details tab for the flatten operator displays in the Properties panel.

    The attributes table shows the tree data structure of the incoming data. By default, all nodes are expanded. Each node has a data type and a path such as VARCHAR and parent1.parent2.node1. For example: vehicle.insurance.company

    In the search field, you can search for top-level nodes only.

  2. With the flatten operator in focus, use the Details tab to do the following:
    1. Enter a name for the operator in the Identifier field, or leave the name as-is.
    2. Under Projection preferences, review the settings. By default, all settings are selected. Clear the checkbox of the settings that you do not want configured for this operation.
    3. In the attributes tables, select the checkbox next to the array node that you want to flatten the data structure to. Then select Flatten data to selected attribute.

      You can select one checkbox only.

      After flattening, the path of the complex attribute you selected and flattened is displayed above the table, next to Flatten by. For example: FLATTEN_1.MYFILE_JSON.vehicle.maintenance.dealerdetails.servicestation2.dealeraddress

    4. (Optional) Where applicable in the flattened structure, select the actions icon next to an array node in the table, and select Convert array into string. See also Array Conversions.
    5. (Optional) To revert the flattening you selected in step c, select Clear that is next to the Flatten by path. See Removing a Flattening Path.
  3. Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  4. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  5. In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Converting other arrays into strings

After you have selected one complex attribute for flattening, there might be other Array nodes in the flattened structure.

With the other Array nodes in a flattened structure, you can convert an Array of primitive data types or an Array of Structs into a String. Currently, the supported format of a conversion String is JSON.

Depending on the data, the data type of a conversion String is VARCHAR or BLOB. You can change a conversion String of VARCHAR to CLOB, and conversely from CLOB to VARCHAR.

The following procedure assumes you have added a flatten operator and selected a complex attribute to create a flattened data structure.

  1. With the flatten operator in focus on the data flow canvas, go to the Details tab in the Properties panel.
  2. In the attributes table, verify that the data structure is flattened.

    Next to Flatten by is the path of the complex attribute you have selected for flattening. For example: FLATTEN_1.MYFILE_JSON.vehicle.maintenance.dealerdetails.servicestation2.dealeraddress

    The actions icon (three dots) is displayed for Array nodes that are available in the flattened structure for converting.

  3. Where applicable, select the actions icon next to an Array node and then select Convert array into string.

    Examples of converted attributes:

    Array type Converted attribute

    Array of strings

    "goods": [
        "laptop",
        "charger",
        "chair"
      ]

    ["laptop","charger","chair"]

    Array of numbers

    "numbers": [
        234,
        455,
        98
      ]

    [234,455,98] 
Viewing, changing, and removing array conversions

The following procedure assumes you have selected an Array node to be converted into a String, if applicable.

  1. With the flatten operator in focus on the data flow canvas, go to the Details tab in the Properties panel.
  2. Select View array conversions.
  3. In the Array conversions panel that displays, review the complex attributes that have been converted into a String.
  4. To change the data type of a conversion String, select an update action from the actions icon. Depending on the data type, select Update data type to CLOB or Update data type to VARCHAR.
  5. Do any of the following to remove conversions:
    • To remove a single conversion, select the checkbox next to a conversion, then select Remove. Alternatively, you can select Remove from the actions icon for a conversion.
    • To remove all conversions, select the checkbox at the top of the list (next to the header Array conversions), then select Remove.

    A selected conversion is removed and the previously converted attribute is reverted to its original hierarchy structure and data type.

Removing a flattening path

When configuring a flatten operator on a data flow, you can revert a flattening by clearing the Flatten by path.

  1. With the flatten operator in focus on the data flow canvas, go to the Details tab in the Properties panel.
  2. Above the attributes table, next to the Flatten by path, select Clear.
  3. In the Clear flattening dialog that appears, confirm the complex attribute name that you want to unflatten, then select Clear to proceed.

    By removing flattening, the original data structure before flattening is restored:

    • If created, any attributes with the suffix _INDEX are removed.
    • If parent names were removed from child node names, the parent-child name lineage hierarchy is restored.
    • If some arrays were converted into strings, the array conversions are removed and the previously converted attributes are reverted to their original hierarchy structure and data type.

Table Function Operator

Use the table function operator to add table functions that return data in the form of tables.

Currently, only built-in table functions can be selected from a list. The table functions that are provided by the table function operator include cube, cross tabulation analysis, frequent items, rollup, and Spark SQL. Each built-in table function has predefined fields that you configure to return the desired collection of data. Complex logic can be consolidated into a single function that returns specific sets of data. Parameterization of the fields allows for flexibility in adjusting the logic.

A table function operator can be connected to any other operator within a data flow. Thus, a table function operator can be used as a data source, a mid-stream operator, or a target.

Adding a table function operator
  1. From the Operators panel, drag and drop a Table function operator onto the canvas.
  2. With the operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
  3. For Transformation type, select the table function that you want to use for this operator.
  4. Under Configure parameters, perform the following steps:
    1. Configure the table function's core logic by specifying the function details. Only specific function input keys and their values can be configured, as shown on the Details tab for the selected table function.
    2. (Optional) To parameterize an input key-value pair, click Assign parameter to use parameters. See Assigning a Parameter.
    3. After completing the input fields, click Calculate output shape.

      Data Integration generates the output fields using the core logic details, and materializes the input/output shapes. Use the Attributes tab to view the input/output shapes.

  5. On the Attributes tab, select whether to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.

    You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.

    • Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
    • Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
      Note

      You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
    • Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
    • Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.

    A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.

  6. In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.

    The Data Profile does not appear for complex data attributes. To view the hierarchy of a complex data attribute in a simplified structure, click the complex data shown, for example [...] or {…}.

    You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.

    For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.

  7. In the Validation tab, check for warnings or errors that might cause your data flow to fail.
Understanding table functions

Here's a list of the built-in table functions that are supported by the table function operator.

Cache

Persists and caches a dataset according to the given memory storage level.

Returns a new dataset that is backed by the cached resilient distributed dataset (RDD).

Parameter Description
level

The memory storage to use:

MEMORY_ONLY: Store as deserialized Java objects in the JVM. If memory is insufficient, some RDD partitions are not cached, but are recomputed as and when needed.

MEMORY_AND_DISK: Store as deserialized Java objects in the JVM. If memory is insufficient, some RDD partitions are stored on disk, and read from disk as and when needed.

MEMORY_ONLY_SER: Store as serialized Java objects (one byte array per partition).

MEMORY_AND_DISK_SER: Store as serialized Java objects. Partitions that don't fit in memory are stored on disk.

DISK_ONLY: Store the RDD partitions only on disk.

MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc: Similar to other levels but each partition is replicated on two cluster nodes.
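
These storage levels correspond to Spark persistence levels. As an illustrative sketch only (not the operator's internal implementation), caching a dataset with MEMORY_AND_DISK in PySpark, assuming an existing DataFrame df, might look like this:

from pyspark import StorageLevel

# Persist with MEMORY_AND_DISK; partitions that don't fit in memory spill to disk.
cached_df = df.persist(StorageLevel.MEMORY_AND_DISK)
cached_df.count()  # An action triggers computation and materializes the cache.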

Cross tabulation analysis

Computes a pair-wise frequency table or contingency table from distinct values of the given two columns.

The number of distinct values for each column must be less than 1e4.

Returns a dataframe that contains the contingency table. In the contingency table:

  • The name of the first column is col1_col2, where col1 and col2 are the names of the first and second columns before transformation. The first column of each row contains the distinct values of col1.
  • The other column names are the distinct values of col2.
  • The counts are returned as the long type.
  • Pairs without occurrences are given zero as the count.
  • The maximum number of non-zero pairs is 1e6.
  • Null elements are replaced by the string "null".
Parameter Description
col1

Name of the first column. See the note after this table for limitations.

Distinct items from col1 make the first item of each row in the returned dataframe.

col2

Name of the second column. See the note after this table for limitations.

Distinct items from col2 make the column names in the returned dataframe.

Note

Only the following characters are allowed in a column name:

  • lower and upper case letters a-z A-Z
  • numbers 0-9
  • underscore _
Example

Suppose col1 is age and col2 is salary.

Before transformation:

+---+------+
|age|salary|
+---+------+
|1  |100   |
|1  |200   |
|2  |100   |
|2  |100   |
|2  |300   |
|3  |200   |
|3  |300   |
+---+------+

After transformation:

+----------+---+---+---+
|age_salary|100|200|300|
+----------+---+---+---+
|2         |2  |0  |1  |
|1         |1  |1  |0  |
|3         |0  |1  |1  |
+----------+---+---+---+
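
For comparison, the same pair-wise frequency computation can be sketched in PySpark, assuming a DataFrame df that holds the before-transformation rows (shown only to illustrate the semantics, not the operator itself):

# The result's first column is named age_salary; counts are of type long.
contingency_df = df.stat.crosstab("age", "salary")
contingency_df.show()
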
Cube

Generates a multi-dimensional cube of all possible combinations using the given columns list and the calculations in the given aggregate function expressions.

The following aggregate functions are supported in the expressions:

  • AVG
  • COUNT
  • MEAN
  • MAX
  • MIN
  • SUM

Returns a dataframe after the cube transformation.

Parameter Description
cubeColumns

The list of columns, separated by commas, by which to generate the multi-dimensional cube.

aggExpressions

The aggregate function expressions to perform on the columns.

For example: salary -> avg, age -> max

Example

Before transformation:

+----------+-----+-------+------+---+
|department|group|region |salary|age|
+----------+-----+-------+------+---+
|Eng       |A    |local  |1000  |10 |
|Art       |B    |foreign|2000  |11 |
|Sport     |C    |local  |1500  |13 |
|Eng       |A    |foreign|800   |19 |
|Art       |A    |foreign|3000  |30 |
|Eng       |E    |foreign|1000  |21 |
|Art       |D    |local  |1000  |32 |
|Sport     |C    |local  |1000  |28 |
+----------+-----+-------+------+---+

If cubeColumns is department, region and aggExpressions is salary -> avg, age -> max (compute the average salary and maximum age), the cube after transformation is:

+----------+-------+-----------------+--------+
|department|region |avg(salary)      |max(age)|
+----------+-------+-----------------+--------+
|Eng       |foreign|900.0            |21      |
|Sport     |null   |1250.0           |28      |
|Art       |null   |2000.0           |32      |
|Eng       |null   |933.3333333333334|21      |
|null      |local  |1125.0           |32      |
|Art       |foreign|2500.0           |30      |
|null      |foreign|1700.0           |30      |
|Eng       |local  |1000.0           |10      |
|null      |null   |1412.5           |32      |
|Sport     |local  |1250.0           |28      |
|Art       |local  |1000.0           |32      |
+----------+-------+-----------------+--------+
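
To illustrate the semantics, a minimal PySpark sketch of the same cube, assuming a DataFrame df containing the before-transformation rows:

from pyspark.sql import functions as F

# All combinations of department and region, plus null (subtotal and grand total) rows.
cube_df = df.cube("department", "region").agg(F.avg("salary"), F.max("age"))
cube_df.show()
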
Data generation

Generates synthetic data using the given file format and content. The supported file formats are:

  • CSV
  • JSON

Returns a data entity. The synthetic data source can be used in pipeline unit tests where real data is not used or moved.

Parameter Description
format

The type of file. Supported values are:

  • CSV
  • JSON
content

The file content for the given format.

Example

A CSV content example:

co1,co2,co3
1,B,3
A,2,C

Data generation result:

+---+---+---+
|co1|co2|co3|
+---+---+---+
|1  |B  |3  |
|A  |2  |C  |
+---+---+---+
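
Conceptually, the operator materializes the content as a small in-memory dataset. A rough PySpark equivalent of the CSV example, assuming an existing SparkSession spark (the operator itself only needs the format and content fields):

# Build the same two-row dataset inline; values are treated as strings here.
synthetic_df = spark.createDataFrame(
    [("1", "B", "3"), ("A", "2", "C")],
    ["co1", "co2", "co3"],
)
synthetic_df.show()
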
Deduplication

Finds duplicates in the given column or columns list and returns a new dataset with the duplicate rows removed.

Parameter Description
columns

A column name or a comma-separated list of column names.

Example

Before transformation:

+---+---+
|a  |b  |
+---+---+
|1  |2  |
|3  |4  |
|1  |3  |
|2  |4  |
+---+---+

If columns is a, the dataset after transformation is:

+---+---+
|a  |b  |
+---+---+
|1  |2  |
|3  |4  |
|2  |4  |
+---+---+
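
The behavior matches dropping duplicates by a column subset in Spark. An illustrative sketch of the example in PySpark, assuming a DataFrame df with columns a and b:

# Keep one row per distinct value of column a; which duplicate survives is not guaranteed.
deduped_df = df.dropDuplicates(["a"])
deduped_df.show()
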
Frequent items

Finds frequent items in the given column or columns list using the given minimum frequency.

The frequent element count algorithm proposed by Karp et al. is used to find frequent items. False positives are possible.

Returns a dataframe with an array of frequent items for each column.

Parameter Description
freqItems

A column name or a comma-separated list of column names for which you want to find frequent items.

support

Minimum frequency for an item to be considered frequent.

The value must be greater than 1e-4 (decimal value of 0.0001).

For example, 0.6 represents 60%. That is, an item that appears in at least 60% of the rows is a frequent item.

Example

Before transformation:

+-------+--------------+
|user_id|favorite_fruit|
+-------+--------------+
|      1|         apple|
|      2|        banana|
|      3|         apple|
|      4|        orange|
|      5|        banana|
|      6|        banana|
|      7|         apple|
+-------+--------------+

If freqItems is favorite_fruit and support is 0.6, the dataframe returned after transformation is:

+------------------------+
|favorite_fruit_freqItems|
+------------------------+
|                [banana]|
+------------------------+
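
As another illustration of the semantics, the same computation in PySpark, assuming a DataFrame df with the favorite_fruit column:

# Items appearing in at least 60% of rows; false positives are possible.
freq_df = df.freqItems(["favorite_fruit"], support=0.6)
freq_df.show()
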
Na Drop

Removes rows that contain null or NaN values in the given column or columns list.

Returns a dataframe that excludes the rows.

Parameter Description
how

How to determine if a row is to be dropped. Supported values are:

  • any: Drop a row if any of the specified columns contains a null or NaN value.
  • all: Drop a row only if all the specified columns are null or NaN for that row.

The how parameter is ignored if the minNonNulls parameter is specified.

cols

A column name or a comma-separated list of column names.

minNonNulls

The minimum number of non-null and non-NaN values that a row must contain to be kept.

Rows that contain fewer than the specified minimum are dropped.

The minNonNulls parameter overrides the how parameter.

Example

Before transformation:

+----+-------+----+
|id  |name   |age |
+----+-------+----+
|1   |Alice  |25  |
|2   |null   |28  |
|null|Bob    |null|
|4   |Charlie|30  |
+----+-------+----+

If how is any and cols is name, the dataframe returned after transformation is:

+----+-------+----+
|id  |name   |age |
+----+-------+----+
|1   |Alice  |25  |
|null|Bob    |null|
|4   |Charlie|30  |
+----+-------+----+

When you use the how parameter with the value all and don't specify cols, a row is dropped only if all its values are null. For example, before transformation:

+----+--------+----+
|  id|    name| age|
+----+--------+----+
|   1|John Doe|  30|
|   2|    null|null|
|   3|Jane Doe|  25|
|   4|Jake Doe|null|
|null|    null|null|
+----+--------+----+

After transformation:

+---+--------+----+
| id|    name| age|
+---+--------+----+
|  1|John Doe|  30|
|  2|    null|null|
|  3|Jane Doe|  25|
|  4|Jake Doe|null|
+---+--------+----+
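
For readers who think in Spark terms, both variants of the examples map to null-dropping on a DataFrame df (a sketch of the semantics, not the operator itself):

# how="any" with cols="name": drop rows where the name column is null or NaN.
dropped_any_df = df.na.drop(how="any", subset=["name"])

# minNonNulls=2: keep only rows with at least two non-null, non-NaN values.
dropped_thresh_df = df.na.drop(thresh=2)
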
Na Fill

Replaces null values and returns a dataframe that contains the replacement values.

Parameter Description
replacement

The key-value map to use for replacing null values. The key is a column name. The value is a replacement value.

The key points to the replacement value. For example: name -> Tom means replace null values in the name column with Tom.

A replacement value is cast to the column's data type. A replacement value must be one of the following types:

  • Int
  • Long
  • Float
  • Double
  • String
  • Boolean
Example

Before transformation:

+----+-------+----+
|id  |name   |age |
+----+-------+----+
|1   |Alice  |25  |
|2   |null   |28  |
|null|Bob    |null|
|4   |Charlie|30  |
+----+-------+----+

If the key-value map is as follows:

id -> 3

name -> unknown

age -> 10

After transformation:

+---+-------+---+
|id |name   |age|
+---+-------+---+
|1  |Alice  |25 |
|2  |unknown|28 |
|3  |Bob    |10 |
|4  |Charlie|30 |
+---+-------+---+
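
A minimal PySpark sketch of the example's replacement map, assuming a DataFrame df with the id, name, and age columns (illustration only):

# Replace nulls per column using the key-value map from the example.
filled_df = df.na.fill({"id": 3, "name": "unknown", "age": 10})
filled_df.show()
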
Na Replace

Replaces a value with another value in the given column or columns by matching keys in the given key and replacement value map.

Returns a new dataframe that contains the replaced transformation.

Parameter Description
cols

A column name or a comma-separated list of column names on which to apply replacement values.

If cols is *, then replacement is applied on all String, Numeric, or Boolean columns.

replacement

The replacement key-value map to use. The key is the value to replace, and the value is the replacement value. The map values can be null.

The key points to the replacement value. For example: Alice -> Tom means replace Alice with Tom.

The key and replacement value pair must have the same type. Only the following types are supported:

  • Boolean
  • Double
  • String
Example

Before transformation:

+----+-------+----+
|id  |name   |age |
+----+-------+----+
|1   |Alice  |25  |
|2   |null   |28  |
|null|Bob    |null|
|4   |Charlie|30  |
+----+-------+----+

If cols is name, and replacement is Alice -> Tom, the dataframe after transformation is:

+----+-------+----+
|id  |name   |age |
+----+-------+----+
|1   |Tom    |25  |
|2   |null   |28  |
|null|Bob    |null|
|4   |Charlie|30  |
+----+-------+----+
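
The same replacement, sketched in PySpark for illustration, assuming a DataFrame df with a name column:

# In the name column only, replace the value Alice with Tom.
replaced_df = df.na.replace("Alice", "Tom", subset=["name"])
replaced_df.show()
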
Repartition

Divides a dataset by the given columns into the specified number of partitions.

Returns a new dataset that's hash partitioned into exactly the specified number of partitions.

Parameter Description
partitionColumns

A column name or comma-separated list of column names by which the dataset is partitioned.

numberOfPartitions

The number of partitions to create.
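
As an illustrative sketch, hash partitioning in PySpark, assuming a DataFrame df and a hypothetical partition column named customer_id:

# Hash partition by customer_id into exactly 8 partitions.
repartitioned_df = df.repartition(8, "customer_id")
print(repartitioned_df.rdd.getNumPartitions())  # 8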

Rollup

Generates a multi-dimensional rollup of possible combinations using the given columns list and the calculations in the given aggregate function expressions.

The following aggregate functions are supported in the expressions:

  • AVG
  • COUNT
  • MEAN
  • MAX
  • MIN
  • SUM

Returns a dataframe after the rollup transformation.

Parameter Description
rollupColumns

The list of columns, separated by commas, by which to generate the multi-dimensional rollup.

aggExpressions

The aggregate function expressions to perform on the columns.

For example: salary -> avg, age -> max

Example

Before transformation:

+----------+-----+-------+------+---+
|department|group|region |salary|age|
+----------+-----+-------+------+---+
|Eng       |A    |local  |1000  |10 |
|Art       |B    |foreign|2000  |11 |
|Sport     |C    |local  |1500  |13 |
|Eng       |A    |foreign|800   |19 |
|Art       |A    |foreign|3000  |30 |
|Eng       |E    |foreign|1000  |21 |
|Art       |D    |local  |1000  |32 |
|Sport     |C    |local  |1000  |28 |
+----------+-----+-------+------+---+

If rollupColumns is department, region and aggExpressions is salary -> avg, age -> max (compute the average salary and maximum age), the rollup after transformation is:

+----------+-------+-----------------+--------+
|department|region |avg(salary)      |max(age)|
+----------+-------+-----------------+--------+
|Eng       |foreign|900.0            |21      |
|Sport     |null   |1250.0           |28      |
|Art       |null   |2000.0           |32      |
|Eng       |null   |933.3333333333334|21      |
|Art       |foreign|2500.0           |30      |
|Eng       |local  |1000.0           |10      |
|null      |null   |1412.5           |32      |
|Sport     |local  |1250.0           |28      |
|Art       |local  |1000.0           |32      |
+----------+-------+-----------------+--------+
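
For comparison with the cube example, the rollup semantics can be sketched in PySpark, assuming the same DataFrame df (illustration only):

from pyspark.sql import functions as F

# Hierarchical subtotals: (department, region), (department), and a grand total.
rollup_df = df.rollup("department", "region").agg(F.avg("salary"), F.max("age"))
rollup_df.show()
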
Spark SQL

Runs Spark SQL queries on incoming data by first creating temporary views using the given table names.

Parameter Description
SQL

The SQL statement or script to run.

Example: select * from table1

tableName

A table name or comma-separated list of table names for which Spark creates temporary views.

Example: table1, table2
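
Conceptually, the parameters behave like registering the incoming data under the given names and then running the statement. A hedged PySpark sketch, assuming a SparkSession spark and an incoming DataFrame df:

# Register the incoming data as table1, then run the SQL statement against it.
df.createOrReplaceTempView("table1")
result_df = spark.sql("select * from table1")
result_df.show()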

Stratified sampling

Generates a stratified sample without replacement based on the given sampling fraction for each stratum.

Returns a new dataframe that represents the stratified sample.

Parameter Description
strata_col

The column that defines the strata.

fractions

The sampling fraction for each stratum, from 0.0 to 1.0.

For example, 0.1 returns 10% of the rows, and 1.0 returns 100% of the rows.

If a fraction is not specified, zero is assumed.

sample_size

If fractions is not specified, specify a portion of the dataset to be sampled, from 0.0 to 1.0.

seed

Specify any number as the seed to return the same sample every time, keeping the results consistent.

Example

Before transformation:

+---+-----+
|key|value|
+---+-----+
|  1|    1|
|  1|    2|
|  2|    1|
|  2|    1|
|  2|    3|
|  3|    2|
|  3|    3|
+---+-----+

If strata_col is key, and fractions is the following:

1 -> 1.0, 3 -> 0.5

After transformation:

+---+-----+
|key|value|
+---+-----+
|  1|    1|
|  1|    2|
|  3|    2|
+---+-----+
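
To illustrate the semantics, the same stratified sample in PySpark, assuming a DataFrame df with the key column (the seed value 42 is an arbitrary choice):

# Keep 100% of stratum 1 and 50% of stratum 3; unspecified strata default to 0.
sampled_df = df.sampleBy("key", fractions={1: 1.0, 3: 0.5}, seed=42)
sampled_df.show()
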
Summary

Computes the given statistics for numeric and string columns.

If no statistics are provided, all the following are computed:

  • COUNT
  • MEAN
  • STDDEV
  • MIN
  • approximate quartiles (percentiles at 25%, 50%, and 75%)
  • MAX
Parameter Description
statistics

A comma-separated list of statistics.

Supported values are:

  • COUNT
  • MEAN
  • STDDEV
  • MIN
  • one or more arbitrary approximate percentiles specified as a percentage (for example, 75%)
  • MAX

Example: count, mean, stddev, min, 27%, 41%, 95%, max

Example

Before transformation:

+----------+------+-----------------+--------+
|department|gender|avg(salary)      |max(age)|
+----------+------+-----------------+--------+
|Eng       |female|900.0            |21      |
|Sport     |null  |1250.0           |28      |
|Art       |null  |2000.0           |32      |
|Eng       |null  |933.3333333333334|21      |
|Art       |female|2500.0           |30      |
|Eng       |male  |1000.0           |10      |
|null      |null  |1412.5           |32      |
|Sport     |male  |1250.0           |28      |
|Art       |male  |1000.0           |32      |
+----------+------+-----------------+--------+

Suppose statistics is count, mean, stddev, min, 27%, 41%, 95%, max.

After transformation:

+-------+----------+-----+------+----------------+----------------+
|summary|department|group|gender|salary          |age             |
+-------+----------+-----+------+----------------+----------------+
|count  |8         |8    |8     |8               |8               |
|mean   |null      |null |null  |1412.5          |20.5            |
|stddev |null      |null |null  |749.166203188585|8.76682056718072|
|min    |Art       |A    |female|800             |10              |
|27%    |null      |null |null  |1000            |13              |
|41%    |null      |null |null  |1000            |19              |
|95%    |null      |null |null  |3000            |32              |
|max    |Sport     |E    |male  |3000            |32              |
+-------+----------+-----+------+----------------+----------------+
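
As a final illustration, the equivalent statistics computation in PySpark, assuming a DataFrame df that holds the input rows (a sketch of the semantics, not the operator itself):

# Percentiles are passed as strings such as "27%"; omit arguments to compute the defaults.
summary_df = df.summary("count", "mean", "stddev", "min", "27%", "41%", "95%", "max")
summary_df.show()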