Selecting the Source for a Data Loader Task
Select the data asset, connection, and schema that has the source data for loading. Then select one data entity or multiple data entities in the schema, depending on the load type you have specified for the data loader task.
When selecting multiple data entities from a file storage source type (such as Object Storage), you can use a file pattern to add entities that match the pattern, and use the logical entity qualifier to group matching entities into one or more pattern groups. Each pattern group is treated as a logical entity during runtime.
To configure the source data for a data loader task, you begin by selecting a data asset, connection, and schema.
You can parameterize the resources after you make the selections.
From the data entities table, select the data entity you want to use as the source. Data from the selected data entity loads to the target when the task is run.
You can parameterize the source data entity after you select a data entity from the list of available entities.
- Go to the Source step, Data entities tab.
- Do one of the following options to select a data entity:
  - In the Available data entities table, select one data entity by clicking the checkbox that's next to the entity name. Then click Set as source. The name of the data entity you selected is displayed next to Selected data entity.
    To filter the list of available entities, enter a name or a pattern in the field and press Enter. You can enter a partial name or a pattern using special characters such as *. For example, enter ADDRESS_* to find ADDRESS_EU1, ADDRESS_EU2, and ADDRESS_EU3.
  - If applicable, click Enter custom SQL and click Add SQL.
    In the editor panel that appears, enter a single SQL statement that defines the data to use as the source and click Validate. If validation is successful, click Add.
    The label SQL_ENTITY<nnnnnnnnn> appears, for example, SQL_ENTITY123456789. To see or edit the statement, click Edit.
- (Optional) You can assign a parameter to the source data entity after you have made a selection.
  - Click Parameterize next to the resource to assign a parameter to that resource. Upon parameterizing, Data Integration adds a parameter of the appropriate type and sets the default parameter value to the value that's currently configured for that resource.
  - If available, click Reuse target data entity parameter to use the target entity parameter as the parameter for this source data entity. For more information, see Reusing Parameters for Source or Target Resources.
- To further configure the data source and loader task, click the Settings tab, if applicable. Depending on the source type, the settings that you can configure are:
  - Allow pushdown or turn off pushdown: By default, some data processing is offloaded to the source system. To apply processing or transformations outside the source system, clear the checkbox.
  - Allow schema drift or lock the schema definitions: By default, schema definition changes in the specified data entity are automatically detected and picked up (design time and runtime). To use a fixed shape of the specified data entity even when the underlying shape has changed, clear the checkbox.
    For a JSON file, schema drift is disabled by default and can't be enabled if a custom schema is used to infer the entity shape. If you want schema drift to be available and enabled, edit the JSON source in the data flow or data loader task and clear the Use custom schema checkbox.
  - Fetch file metadata as attributes: By default, the file name, file size, and other file metadata are included as attributes in the source data. Clear the checkbox if you don't want to use file metadata as attributes.
  - Incremental load: Select the checkbox to identify and load only the data that's created or modified since the last time the load process was run.
    (Relational database source only) For Watermark column, select the column that's used to mark the rows that have been incrementally loaded. Only DATE, TIMESTAMP, and DATETIME columns can be used as a watermark column.
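The Incremental load setting depends on a watermark column to decide which rows are new since the last run. The following Python sketch illustrates the general watermark technique only; it is not how Data Integration implements the feature. The function name incremental_rows, the column name updated_at, and the sample rows are all hypothetical.

```python
from datetime import datetime


def incremental_rows(rows, watermark_column, last_watermark):
    """Return the rows newer than last_watermark and the updated watermark.

    Illustrative only: a real loader would push this filter down to the
    source database instead of filtering rows in memory.
    """
    if last_watermark is None:
        # First run: nothing has been loaded yet, so take every row.
        selected = list(rows)
    else:
        selected = [r for r in rows if r[watermark_column] > last_watermark]
    # The new watermark is the highest value seen in this batch; if the
    # batch is empty, the previous watermark is kept.
    new_watermark = max(
        (r[watermark_column] for r in selected), default=last_watermark
    )
    return selected, new_watermark


# Hypothetical source rows with a timestamp column used as the watermark.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]

# First run loads everything and records the watermark.
first_batch, watermark = incremental_rows(rows, "updated_at", None)

# A new row arrives; the second run loads only that row.
rows.append({"id": 4, "updated_at": datetime(2024, 1, 12)})
second_batch, watermark_after = incremental_rows(rows, "updated_at", watermark)
```

The comparison requires an ordered, time-like column, which is consistent with the restriction to DATE, TIMESTAMP, and DATETIME columns.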
From the list of available data entities, select the data entities you want to use as the source. You can select the data entities individually, select all available entities, or use a file pattern to select entities as a group. Data from the selected source data entities loads to the mapped targets when the task is run.
Data Integration creates rules for the data entities that you include as the source. Rules are added when you make individual data entity selections or when you use a file pattern (with or without a group name). Grouped data entities are treated as a logical entity during runtime.
When you remove data entities from the Selected source data entities list, those data entities are no longer included in the source for the data loader task.
When selecting multiple data entities from a file storage source type (for example, Object Storage) to use as the source for a data loader task, you can use a file pattern to group and add existing files that match the pattern. Future incoming files that match the pattern are also included in the group.
In the file pattern, you can also use the logicalentity qualifier to group matching entities into one or more pattern groups. Each pattern group is treated as a logical entity during runtime.
Data entities that match multiple pattern groups are included in all those groups.
Consider the following filenames of data entities that are available for selection:
SRC_BANK_A_01.csv
SRC_BANK_B_01.csv
SRC_BANK_C_01.csv
SRC_BANK_C_02.csv
MYSRC_BANK_A_01.csv
MYSRC_BANK_B_01.csv
MYSRC_BANK_C_01.csv
MYSRC_BANK_C_02.csv
MYSRC_BANK_D_01.csv
MYSRC_BANK_D_02.csv
When you use the file pattern SRC*.csv, Data Integration creates a pattern rule and adds the following files to the source:
SRC_BANK_A_01.csv
SRC_BANK_B_01.csv
SRC_BANK_C_01.csv
SRC_BANK_C_02.csv
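To see why SRC*.csv selects only these four files, the match can be reproduced locally. This is a minimal sketch that assumes the file pattern behaves like a shell-style wildcard anchored at the start of the file name; the helper match_pattern is hypothetical and is not part of Data Integration.

```python
from fnmatch import fnmatchcase

# File names from the example above.
available = [
    "SRC_BANK_A_01.csv",
    "SRC_BANK_B_01.csv",
    "SRC_BANK_C_01.csv",
    "SRC_BANK_C_02.csv",
    "MYSRC_BANK_A_01.csv",
    "MYSRC_BANK_B_01.csv",
    "MYSRC_BANK_C_01.csv",
    "MYSRC_BANK_C_02.csv",
    "MYSRC_BANK_D_01.csv",
    "MYSRC_BANK_D_02.csv",
]


def match_pattern(names, pattern):
    """Return the names matching a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# The pattern must match the whole name, so names starting with MYSRC
# are not selected by SRC*.csv.
matched = match_pattern(available, "SRC*.csv")
print(matched)
```

Because the rule stores the pattern rather than a fixed file list, running the same match against a later listing would also pick up new files that fit the pattern.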
When you use the file pattern MYSRC_BANK_C*.csv and provide the group name MYSRC, Data Integration creates a group rule. At runtime, the group name consolidates all the files matching the pattern into one source entity named MYSRC. For example, the following files are consolidated:
MYSRC_BANK_C_01.csv
MYSRC_BANK_C_02.csv
Any future incoming files that match the pattern are added to the group. For example:
MYSRC_BANK_C_03.csv
MYSRC_BANK_C_04.csv
When you use the file pattern with the logicalentity qualifier, MYSRC_BANK_{logicalentity:B|D}*.csv, and you provide the group name prefix MYNEWSRC_, Data Integration creates a group rule and adds two pattern groups that consolidate the following matching files:
For pattern group MYNEWSRC_B:
MYSRC_BANK_B_01.csv
For pattern group MYNEWSRC_D:
MYSRC_BANK_D_01.csv
MYSRC_BANK_D_02.csv
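The grouping in this example can be approximated with a short Python sketch. It assumes that the logicalentity qualifier lists alternatives separated by |, that * matches any run of characters, and that each group name is the prefix followed by the matched alternative. These semantics are inferred from the example above; the helper names are hypothetical and do not come from Data Integration.

```python
import re
from collections import defaultdict

# Matches a qualifier such as {logicalentity:B|D} inside a file pattern.
QUALIFIER = re.compile(r"\{logicalentity:([^}]*)\}")


def _glob_to_regex(text):
    """Translate literal text with * wildcards into a regex fragment."""
    return "".join(".*" if ch == "*" else re.escape(ch) for ch in text)


def pattern_to_regex(pattern):
    """Compile a file pattern with one logicalentity qualifier (assumed)."""
    found = QUALIFIER.search(pattern)
    if found is None:
        raise ValueError("pattern has no logicalentity qualifier")
    alternatives = "|".join(re.escape(a) for a in found.group(1).split("|"))
    return re.compile(
        _glob_to_regex(pattern[: found.start()])
        + f"(?P<logicalentity>{alternatives})"
        + _glob_to_regex(pattern[found.end():])
    )


def group_files(names, pattern, prefix):
    """Group matching names under prefix + matched qualifier value."""
    regex = pattern_to_regex(pattern)
    groups = defaultdict(list)
    for name in names:
        found = regex.fullmatch(name)
        if found:
            groups[prefix + found.group("logicalentity")].append(name)
    return dict(groups)


names = [
    "SRC_BANK_B_01.csv",
    "MYSRC_BANK_A_01.csv",
    "MYSRC_BANK_B_01.csv",
    "MYSRC_BANK_C_01.csv",
    "MYSRC_BANK_D_01.csv",
    "MYSRC_BANK_D_02.csv",
]
groups = group_files(names, "MYSRC_BANK_{logicalentity:B|D}*.csv", "MYNEWSRC_")
print(groups)
```

Under these assumptions, files for banks A and C fall outside both groups because neither letter is listed in the qualifier.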
Data Integration creates groups in the Selected source data entities list when you use a file pattern to select multiple files (for example, from Object Storage) as a group for inclusion in the source for a data loader task.
- Go to the Source step, Data entities tab.
- In the Selected source data entities list, click a group name.
- In the View pattern group details panel, you can view the pattern used to create the group, and the list of data entities that match the pattern.
Data Integration adds rules when you select multiple data entities to be included in the source for a data loader task.
A rule is added when you make individual data entity selections or, when applicable, when you include entities by a pattern or group. The number of rules is shown above the Selected source data entities table, in parentheses next to View rules. For example, View rules (3).
Before removing a group rule, ensure that you review the list of data entities impacted by the rule removal. See Viewing the List of Files Included in a Group.