Selecting the Source for a Data Loader Task

Select the data asset, connection, and schema that has the source data for loading. Then select one data entity or multiple data entities in the schema, depending on the load type you have specified for the data loader task.

When selecting multiple data entities from a file storage source type (such as Object Storage), you can use a file pattern to add entities that match the pattern, and use the logical entity qualifier to group matching entities into one or more pattern groups. Each pattern group is treated as a logical entity during runtime.

Selecting a Source Data Asset, Connection, and Schema

To configure the source data for a data loader task, you begin by selecting a data asset, connection, and schema.

You can parameterize the resources after you make the selections.

On the Create data loader task page, go to the Source step.
To create and select the data asset for this source, click Create next to Data asset to create a data asset.
To select a source data entity from an existing data asset, first use the resource menus to select the data asset, connection, and schema (or bucket) only in the order as displayed. Note the following:
- The Data asset resources available for selection depend on the type of source you selected on the Basic information step. Click View all next to a resource if you don't find what you want from the resource menu.
- If applicable for a data asset (for example, Object Storage), select a compartment, and then select the bucket that contains data entities.
- For BICC Oracle Fusion Applications, select the BICC offering as the schema. The BICC offering that you select provides the BICC view objects (VO) for data extraction.
- For a database connection, you can click Add to create and add the connection.
(Optional) You can assign a parameter to the data asset, connection, and schema after you have made selections for the resources. Do one of the following for a resource on the source:
- Click Parameterize next to the resource to assign a parameter to that resource. Upon parameterizing, Data Integration adds a parameter of the appropriate type and sets the default parameter value to the value that's currently configured for that resource.
- If available, click Reuse target <resource> parameter to use the same parameter that has been added to the resource of the same type on the target. For more information, see Reusing Parameters for Source or Target Resources.
Note that when the data asset type is Object Storage, you can parameterize the bucket (schema) but not the compartment that contains the bucket.

Depending on the data asset type of the source, you might have other options to select.

Data asset type Option

Data asset type	Option
Object Storage, S3, HDFS	Under File settings, select the File type, Compression type, and Encoding, if applicable. The compression types that are available for selection depends on the selected file type. For CSV, the other settings you can configure are: If all the files have the first row as a header row, select Data has header. If the values in the data rows span multiple lines, select Multi-line. Specify the Escape character that escapes other characters found in data values. For example: `\` Select the Delimiter character that separates data fields. For example: COLON (`:`), COMMA (`,`), PIPE (`\|`), SEMICOLON (`;`), or TAB (`\t`) Specify the Quote character that treats other characters as literal characters. For example: `"` If a column delimiter is included at the end of a data row in all files, select Trailing delimiter. For EXCEL, the settings you can configure are: By default, Data Integration treats the first row in a file as a header row. If the first row in a file is not a header row, select No for Has header. For Select entity by, you can choose the criteria as Sheet name, Sheet index, or Table name. Then enter a Value for the worksheet name, worksheet index, or table name. Sheet index is zero-based. For Sheet name or Sheet index, enter the area of the file to use as the Data range for selection. If you don't enter a data range value, the default is the data range A1, which corresponds to the entire sheet. If the file has a header row, enter a value that starts from the header row, for example, `A1:K56`. For JSON files: Select Use custom schema to paste or upload a custom sample schema that's used to infer the entity shape. When this checkbox is selected, schema drift is no longer applicable in the source entity. If you select Upload, drop a custom schema file in the box provided, or click Select a file to select the schema file to upload. If you select Paste in schema, copy the schema text file content and paste it in the box provided.
Oracle Fusion Applications using Business Intelligence Cloud Connector (BICC)	Choose how the source data is extracted. Select the Full extract strategy to extract and load all data, or extract and load all data from the specified Initial extract date. Select the Incremental extract strategy to extract and load only data that's created or modified after a specific date. Select Managed to use the last date that's tracked and managed by Data Integration. You don't explicitly specify a date. Important The Managed option fetches an extra 24 hours of data into the past. Data Integration adds the parameter `SYS.LAST_LOAD_DATE` to track the date of the last load or task run, and stores the last successful load date as a watermark across consecutive runs. When the task is run, you're allowed to change the value of the last date that's stored in the system. Select Custom to provide the date yourself in the Last extract date field. Specify a different BICC external configuration storage to use for the extracted data. Otherwise, the default storage that's configured within the BICC Oracle Fusion Applications data asset is used. Under Settings: Select Enable BI broker if you want the BICC extractor to read input rows directly from the Fusion Applications database instead of a regular extraction path. Important BI broker mode is a performance optimization option for advanced users. BI broker mode is a BICC feature supported only for some data stores. Extraction fails for data stores that aren't supported. With the BI broker option, extract performance might improve because queries are executed directly against the data stores in the underlying database. The performance improvement depends on the data store, current workload on the BI server, current workload on database, and other factors. In some situations, enabling the BI broker mode changes the source column data types. If you don't select Enable BI broker, the data is extracted through the BI server. For more information, see Broker Extract Mode in Configure Offerings to Extract of the Creating a Business Intelligence Cloud Extract guide. The BICC guide is part of the Applications Common book list in the Oracle Fusion Cloud Applications Suite documentation. Select the type of BICC column properties from the BICC source to include as attributes in the target. All: All columns of the BICC PVO are included. However, columns with names that start with ExtnAttribute or columns that have the same name and label don't appear. Default only: Only the columns enabled for extract by default in the BICC PVO are included. It displays all columns including the Flex columns. Default and primary keys: Default and primary columns of the BICC PVO are included. It displays all columns including the Flex columns. Primary keys only: Only the primary columns of the BICC PVO are included. It displays all columns including the Flex columns.

Object Storage, S3, HDFS

Under File settings, select the File type, Compression type, and Encoding, if applicable. The compression types that are available for selection depends on the selected file type.

For CSV, the other settings you can configure are:

If all the files have the first row as a header row, select Data has header.
If the values in the data rows span multiple lines, select Multi-line.
Specify the Escape character that escapes other characters found in data values. For example: \
Select the Delimiter character that separates data fields. For example: COLON (:), COMMA (,), PIPE (|), SEMICOLON (;), or TAB (\t)
Specify the Quote character that treats other characters as literal characters. For example: "
If a column delimiter is included at the end of a data row in all files, select Trailing delimiter.

For EXCEL, the settings you can configure are:

By default, Data Integration treats the first row in a file as a header row. If the first row in a file is not a header row, select No for Has header.
For Select entity by, you can choose the criteria as Sheet name, Sheet index, or Table name. Then enter a Value for the worksheet name, worksheet index, or table name. Sheet index is zero-based.
For Sheet name or Sheet index, enter the area of the file to use as the Data range for selection. If you don't enter a data range value, the default is the data range A1, which corresponds to the entire sheet. If the file has a header row, enter a value that starts from the header row, for example, A1:K56.

For JSON files:

Select Use custom schema to paste or upload a custom sample schema that's used to infer the entity shape. When this checkbox is selected, schema drift is no longer applicable in the source entity.
If you select Upload, drop a custom schema file in the box provided, or click Select a file to select the schema file to upload.
If you select Paste in schema, copy the schema text file content and paste it in the box provided.

Oracle Fusion Applications using Business Intelligence Cloud Connector (BICC)

Choose how the source data is extracted.

Select the Full extract strategy to extract and load all data, or extract and load all data from the specified Initial extract date.

Select the Incremental extract strategy to extract and load only data that's created or modified after a specific date.

Select Managed to use the last date that's tracked and managed by Data Integration. You don't explicitly specify a date.
Important

The Managed option fetches an extra 24 hours of data into the past.

Data Integration adds the parameter SYS.LAST_LOAD_DATE to track the date of the last load or task run, and stores the last successful load date as a watermark across consecutive runs. When the task is run, you're allowed to change the value of the last date that's stored in the system.
Select Custom to provide the date yourself in the Last extract date field.

Specify a different BICC external configuration storage to use for the extracted data. Otherwise, the default storage that's configured within the BICC Oracle Fusion Applications data asset is used.

Under Settings:

Select Enable BI broker if you want the BICC extractor to read input rows directly from the Fusion Applications database instead of a regular extraction path.

Important

BI broker mode is a performance optimization option for advanced users.
BI broker mode is a BICC feature supported only for some data stores. Extraction fails for data stores that aren't supported.
With the BI broker option, extract performance might improve because queries are executed directly against the data stores in the underlying database. The performance improvement depends on the data store, current workload on the BI server, current workload on database, and other factors.
In some situations, enabling the BI broker mode changes the source column data types.

If you don't select Enable BI broker, the data is extracted through the BI server. For more information, see Broker Extract Mode in Configure Offerings to Extract of the Creating a Business Intelligence Cloud Extract guide. The BICC guide is part of the Applications Common book list in the Oracle Fusion Cloud Applications Suite documentation.

Select the type of BICC column properties from the BICC source to include as attributes in the target.

All: All columns of the BICC PVO are included. However, columns with names that start with ExtnAttribute or columns that have the same name and label don't appear.
Default only: Only the columns enabled for extract by default in the BICC PVO are included. It displays all columns including the Flex columns.
Default and primary keys: Default and primary columns of the BICC PVO are included. It displays all columns including the Flex columns.
Primary keys only: Only the primary columns of the BICC PVO are included. It displays all columns including the Flex columns.

After selecting the data asset, connection, schema (or bucket), and applicable source options, the data entities that are available for selection are displayed in the Available data entities table on the Data entities tab.

Selecting a Single Source Data Entity

From the data entities table, select the data entity you want to use as the source. Data from the selected data entity loads to the target when the task is run.

You can parameterize the source data entity after you select a data entity from the list of available entities.

Go to the Source step, Data entities tab.
Do one of the following options to select a data entity:
- In the Available data entities table, select one data entity by clicking the checkbox that's next to the entity name. Then click Set as source. The name of the data entity you selected is displayed next to Selected data entity.
  
  To filter the list of available entities, enter a name or a pattern in the field and press Enter. You can enter a partial name or a pattern using special characters such as *. For example, enter ADDRESS_* to find ADDRESS_EU1, ADDRESS_EU2, and ADDRESS_EU3.
- If applicable, click Enter custom SQL and click Add SQL.
  
  In the editor panel that appears, enter a single SQL statement that defines the data to use as the source and click Validate.
  
  Note: In SQL statements, use single quotes around parameters. For example: select * from ORDERS where Created_On >= '${SYS.LAST_LOAD_DATE}'
  
  If validation is successful, click Add.
  
  The label SQL_ENTITY<nnnnnnnnn> appears, for example, SQL_ENTITY123456789. To see or edit the statement, click Edit.
(Optional) You can assign a parameter to the source data entity after you have made a selection.
- Click Parameterize next to the resource to assign a parameter to that resource. Upon parameterizing, Data Integration adds a parameter of the appropriate type and sets the default parameter value to the value that's currently configured for that resource.
- If available, click Reuse target data entity parameter to use the target entity parameter as the parameter for this source data entity. For more information, see Reusing Parameters for Source or Target Resources.
To further configure the data source and loader task, click the Settings tab, if applicable. Depending on the source type, the settings that you can configure are:
- Allow pushdown or turn off pushdown: By default, some data processing is offloaded to the source system. To apply processing or transformations outside the source system, clear the checkbox.
- Allow schema drift or lock the schema definitions: By default, schema definition changes in the specified data entity are automatically detected and picked up (design time and runtime). To use a fixed shape of the specified data entity even when the underlying shape has changed, clear the checkbox.
  For a JSON file, schema drift is disabled by default and can't be enabled if a custom schema is used to infer the entity shape. If you want schema drift to be available and enabled, edit the JSON source in the data flow or data loader task and clear the Use custom schema checkbox.
- Fetch file metadata as attributes: By default, the file name, file size, and other file metadata are included as attributes in the source data.
  Special characters (including space) in the file names are not supported.
  Clear the checkbox if you don't want to use file metadata as attributes.
- Incremental load: Select the checkbox to identify and load only the data that's created or modified since the last time the load process was run.
  
  (Relational database source only) For Watermark column, select the column that's used to mark the rows that have been incrementally loaded. Only DATE, TIMESTAMP, and DATETIME columns can be used as a watermark column.

Selecting Multiple Source Data Entities

From the list of available data entities, select the data entities you want to use as the source. You can select the data entities individually, select all available entities, or use a file pattern to select entities as a group. Data from the selected source data entities loads to the mapped targets when the task is run.

Data Integration creates rules for the data entities that you include as the source. Rules are added when you make individual data entity selections or when you use a file pattern (with or without a group name). Grouped data entities are treated as a logical entity during runtime.

Go to the Source step, Data entities tab.

The Data entities tab has two tables. One table shows the data entities that are available for selection. The other table shows the data entities that are selected for loading to the target.

In the Available data entities table, select the data entities you want to include in the source for this data loader task.

Depending on the source type, there might be more than one way to select the data entities you want. Use the appropriate step or steps as needed.

Method	Step
Select data entities manually	Step 3
Select all data entities	Step 4
Use a file pattern without a group name	Step 5
Use a file pattern with a group name	Step 6 (for file storage source types only, for example Object Storage)
Use a file pattern with a logical entity identifier and prefix group names	Step 7 (for file storage source types only, for example Object Storage)
Revert selections	The Selected source data entities table shows the data entities you have selected to be included in the source. To revert selections, see Removing Entities from the Selected Source Data Entities List.

To select multiple data entities manually:
- Click the add icon (plus sign) for each data entity that you want to include as the source.
- Select the checkbox for each data entity, and then click Add as source to include the selections as the source.
To select all available data entities:
- To select all the available data entities that are shown on the current page:
  
  Select the topmost checkbox (next to Name). Then click Add to source.
- To select all the available data entities across all pages in the table:
  
  Select the topmost checkbox (next to Name). Then click Select all available data entities, and click Add to source.
To use a file pattern without providing a group name:
1. In the Filter by name and pattern field, enter a file pattern and press Enter.
2. Click Add to source.
3. Click Add.
(File storage source types) To use a file pattern with a group name:
1. In the Available data entities table, Filter by name and pattern field, enter a file pattern (for example, MYSRC_BANK_C*.csv) and press Enter.
2. Click Add to source.
3. Select Add as a group using a file pattern.
  
  Select this option to group files into a pattern group. The files in a group must have the same number of columns and the column names must also match.
4. In the Group name field, enter a name for this pattern group.
5. Click Preview groups.
  Wait for a clickable group name to return in the preview table. This might take some time if there are many files that match.
  1. To verify that you have the files you want in the pattern group, click the group name.
  2. In the View pattern group details panel, verify the details and click Close.
6. In the Add to source panel, click Add.
  
  The data entities that match the file pattern are included as one group in the Selected source data entities table. Future incoming data entities that match the pattern are also included in the group.
(File storage source types) To use a file pattern with a logical entity identifier and a prefix for the group names:
1. In the Available data entities table, Filter by name and pattern field, enter a file pattern using a logical entity identifier (for example, MYSRC_BANK_{logicalentity:B|D}*.csv) and press Enter.
2. In the Add to source panel, select Add as a group using a file pattern.
  
  Select this option to group files into a pattern group. The files in a group must have the same number of columns and the column names must also match.
3. (Optional) In the Prefix for pattern group name field, enter a name to use as the prefix for the logical groups. For example: MYSRC_
4. Click Preview groups.
  Wait for clickable group names to return in the preview table. This might take some time if there are many files that match the pattern groups.
  1. To verify that you have the files you want in each pattern group, click a group name. For example: MYSRC_B
  2. In the View pattern group details panel, verify the details and click Close.
5. In the Add to source panel, click Add.
  
  The data entities that match the file pattern and logical entity identifier are included as groups in the Selected source data entities table. Future incoming data entities that match the pattern groups are also included in the groups. If the same entity matches multiple groups, it's included in all those groups.
To further configure the data sources and data loader task, select the Settings tab, if applicable. Depending on the source type, the settings that you can configure are:
- Allow pushdown or turn off pushdown: By default, some data processing is offloaded to the source system. To apply processing or transformations outside the source system, clear the checkbox.
- Fetch file metadata as attributes: By default, the file name, file size, and other file metadata are included as attributes in the source data.
  Special characters (including space) in the file names are not supported.
  Clear the checkbox if you don't want to use file metadata as attributes.
- Incremental load: Select the checkbox to identify and load only the data that's created or modified since the last time the load process was run.
  
  (Relational database source only) For Watermark column, select the column that's used to mark the rows that have been incrementally loaded. Only DATE, TIMESTAMP, and DATETIME columns can be used as a watermark column.
  
  All the entities in the Selected source data entities table must have the column with the same name that's used as the watermark column.

Removing Entities from the Selected Source Data Entities List

When you remove data entities from the Selected source data entities list, those data entities are no longer included in the source for the data loader task.

Go to the Source step, Data entities tab.
You can remove data entities from the Selected source data entities list in the following ways:
- Click the remove icon (negative sign) for the data entities that you want to remove from the selected list.
  
  Data Integration created a rule when you selected multiple data entities by using a pattern without a group. If you remove individual entities that were added to the Selected source data entities list by a pattern rule, exclude rules are created.
- Select the checkboxes for the data entities (that are not groups), and then click Remove to remove those entities from the source.
- Filter the selected list by entering a name or a pattern in the field and pressing Enter. You can enter a partial name or a pattern using special characters such as *.
  
  The matching entities might span multiple pages in the table.
  - Exclude all the selected data entities that are shown on the current page:
    
    Select the topmost checkbox (next to Name). Then click Remove.
  - Exclude all the selected data entities across all pages in the table:
    
    Select the topmost checkbox (next to Name). Click Select all included data entities, then click Remove.
- Exclude a group of data entities by group name.
  - Click a group name. In the panel, review the list of data entities that were added in that group.
  - Then click the remove icon (negative sign) for the group that you want to exclude.
  Note
  
  If you exclude a group that's created by a file pattern, other pattern groups that were created as a result of the file pattern are also excluded.
- Exclude different types of data entities (for example, by group and by selection).
  - Select the checkboxes for group names and data entity names, and then click Remove.
  - In the Remove entity panel, review the list of data entities that are impacted. For example, all pattern groups related to the group you select to remove are impacted. Then click Remove.
- Remove a rule to exclude those data entities impacted by that rule. See Viewing, Editing, and Removing Rules.
The data entities you removed are now available for selection again from the Available data entities list.

Using File Patterns and Groups

When selecting multiple data entities from a file storage source type (for example, Object Storage) to use as the source for a data loader task, you can use a file pattern to group and add existing files that match the pattern. Future incoming files that match the pattern are also included in the group.

In the file pattern, you can also use the logicalentity qualifier to group matching entities into one or more pattern groups. Each pattern group is treated as a logical entity during runtime.

Data entities that match multiple pattern groups are included in all those groups.

Consider the following filenames of data entities that are available for selection:

SRC_BANK_A_01.csv
SRC_BANK_B_01.csv
SRC_BANK_C_01.csv
SRC_BANK_C_02.csv
MYSRC_BANK_A_01.csv
MYSRC_BANK_B_01.csv
MYSRC_BANK_C_01.csv
MYSRC_BANK_C_02.csv
MYSRC_BANK_D_01.csv
MYSRC_BANK_D_02.csv

When you use the file pattern SRC*.csv, Data Integration creates a pattern rule and adds the following files to the source:

SRC_BANK_A_01.csv
SRC_BANK_B_01.csv
SRC_BANK_C_01.csv
SRC_BANK_C_02.csv

When you use the file pattern MYSRC_BANK_C*.csv and provide the group name MYSRC, Data Integration creates a group rule. At runtime, the group name consolidates all the files matching the pattern into one source entity named MYSRC. For example, the following files are consolidated:

MYSRC_BANK_C_01.csv
MYSRC_BANK_C_02.csv

Any future incoming files that match the pattern are added to the group. For example:

MYSRC_BANK_C_03.csv
MYSRC_BANK_C_04.csv

When you use the file pattern with the logicalentity qualifier, MYSRC_BANK_{logicalentity:B|D}*.csv, and you provide the group name prefix MYNEWSRC_, Data Integration creates a group rule, and adds two pattern groups that consolidate the following matching files:

For pattern group MYNEWSRC_B:
MYSRC_BANK_B_01.csv

For pattern group MYNEWSRC_D:
MYSRC_BANK_D_01.csv
MYSRC_BANK_D_02.csv

Viewing the List of Files Included in a Group

Data Integration creates groups in the Select source data entities list when you use a file pattern to select multiple files (for example, from Object Storage) as a group for inclusion in the source for a data loader task.

Go to the Source step, Data entities tab.
In the Selected source data entities list, click a group name.
In the View pattern group details panel, you can view the pattern used to create the group, and the list of data entities that match the pattern.

Viewing, Editing, and Removing Rules (Multiple Source Data Entities)

Data Integration adds rules when you select multiple data entities to be included in the source for a data loader task.

A rule is added when you made individual data entity selections or when applicable, you included the entities by a pattern or group. The number of rules is shown above the Selected source data entities table, in parentheses next to View rules. For example, View rules (3).

Before removing a group rule, ensure that you review the list of data entities impacted by the rule removal. See Viewing the List of Files Included in a Group.

Go to the Source step, Data entities tab.
To display the rules created for a data loader task, select View rules.
In the Rules panel, review the criteria that were created when you added or removed source data entities.
To remove a rule that's not a pattern group, you can use one of two ways:
- Select the checkbox for a rule, then click Remove.
- Select the actions menu for a rule, then click Remove.
Note

When you remove the rule, Data Integration removes from the Selected source data entities table those data entities impacted by the original rule (included by selection or by pattern). To add the entities back to the selected list, select and add them again from the Available data entities table.
To edit a pattern group rule:
1. Select Edit from the actions menu for the pattern group rule.
2. In the Edit rule panel, make the changes as needed, and click Save changes.
To remove a pattern group rule:
1. Select Remove from the actions menu for the pattern group rule.
2. In the Remove rule panel, review the impact of removing this named pattern group rule and click Remove.
Note

If the named pattern group rule is a logical entity file pattern group, the other pattern groups that were created as a result of the logical entity file pattern are also removed.

Oracle Cloud Infrastructure Documentation

Selecting the Source for a Data Loader Task