Creating an OCI Data Flow Task

An OCI Data Flow task in Data Integration is associated with an existing application that's created in OCI Data Flow.

Before you create an OCI Data Flow task, ensure that you have the required policies, setup, and information you need for working with OCI Data Flow tasks in Data Integration, as described in Required Policies and Setup.

Create an OCI Data Flow task in a project or folder. Data Integration includes one default project to get you started. To create another project or folder, see Projects and Folders.

By default, Data Integration allows simultaneous (parallel) runs of the same task. To disallow concurrent task runs that are initiated manually, select the Disable simultaneous execution of the task checkbox when you create the task. When simultaneous task runs are disallowed, a run request for the task fails if the task already has a run in progress, that is, a run in a non-terminal state.
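The concurrency rule above can be pictured with a small in-memory model. This is only a sketch and doesn't call Data Integration: the `Task` class, its method names, and the status strings are hypothetical, and the model doesn't distinguish manually initiated runs from other runs.

```python
from dataclasses import dataclass, field

# Hypothetical status names, chosen for illustration only.
TERMINAL_STATES = {"SUCCESS", "ERROR", "TERMINATED"}


@dataclass
class Task:
    name: str
    disable_simultaneous_execution: bool = False
    runs: list = field(default_factory=list)  # one status string per run

    def request_run(self) -> int:
        """Start a run and return its index, or raise if a run is still active."""
        active = [status for status in self.runs if status not in TERMINAL_STATES]
        if self.disable_simultaneous_execution and active:
            # Mirrors the documented behavior: the request fails while
            # another run of the same task is in a non-terminal state.
            raise RuntimeError(f"{self.name}: a task run is already in progress")
        self.runs.append("RUNNING")
        return len(self.runs) - 1

    def finish_run(self, index: int, status: str = "SUCCESS") -> None:
        """Move a run to a terminal state."""
        self.runs[index] = status
```

With the checkbox selected, a second `request_run()` raises until `finish_run()` moves the first run to a terminal state. With the default setting, both runs are accepted.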

Note

Flexible shape usage considerations:

  • When you use a flexible shape (such as VM.Standard.E4.Flex) for the driver node, executor node, or both, customize the number of OCPUs and the amount of memory that you need.

  • A flexible shape provides a specific range of OCPU values that you can use for that shape.

  • The number of OCPUs that you use for a shape determines the range of memory values that you can allocate.

  • If you parameterize the driver or executor shape, OCPUs and memory must be configured for the shape. The values of OCPUs and memory are used only when the shape parameter value is a flexible shape. The OCPUs and memory values are ignored if a non-flexible shape is configured in the parameter.
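The following sketch illustrates how the flexible shape considerations fit together: the shape bounds the OCPU count, the OCPU count bounds the memory, and both values are ignored for a non-flexible shape. The numeric limits, the `FLEX_SHAPES` table, and the `resolve_shape` function are assumptions made for illustration. Look up the real ranges for the shape that you use.

```python
# Hypothetical limits for illustration only; real ranges depend on the shape.
FLEX_SHAPES = {
    "VM.Standard.E4.Flex": {"ocpus": (1, 64), "gb_per_ocpu": (1, 64)},
}


def resolve_shape(shape, ocpus=None, memory_gb=None):
    """Return the settings that would take effect for a driver or executor shape."""
    limits = FLEX_SHAPES.get(shape)
    if limits is None:
        # Non-flexible shape: any OCPUs and memory values are ignored.
        return {"shape": shape}
    if ocpus is None or memory_gb is None:
        raise ValueError("a flexible shape needs both OCPUs and memory")
    ocpu_min, ocpu_max = limits["ocpus"]
    if not ocpu_min <= ocpus <= ocpu_max:
        raise ValueError(f"OCPUs must be between {ocpu_min} and {ocpu_max}")
    # The chosen OCPU count determines the acceptable memory range.
    per_min, per_max = limits["gb_per_ocpu"]
    mem_min, mem_max = ocpus * per_min, ocpus * per_max
    if not mem_min <= memory_gb <= mem_max:
        raise ValueError(f"memory must be between {mem_min} and {mem_max} GB")
    return {"shape": shape, "ocpus": ocpus, "memory_gb": memory_gb}
```

For example, under these assumed limits, `resolve_shape("VM.Standard.E4.Flex", 2, 500)` raises an error because 2 OCPUs allow at most 128 GB, whereas a shape name that isn't in the table is returned unchanged and any OCPUs or memory arguments are dropped.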

    1. Open the project or folder in which you want to create the task.

      For the steps to open the details page of a project or folder, see Viewing the Details of a Project or Viewing the Details of a Folder.

    2. On the project or folder details page, click Tasks.
    3. In the Tasks section, click Create task and select OCI Data Flow.
    4. On the Create OCI Data Flow task page, enter a name and an optional description.

      The identifier is a system-generated value based on the name. You can change the value, but after you create and save the task, you can't update the identifier.

    5. Select the Disable simultaneous execution of the task checkbox if you want to disallow concurrent runs of this task.
    6. (Optional) For Project or folder, click Select and select a different project or folder to save the task in.
    7. To save the task for the first time, click one of the following buttons:
      • Create: Creates and saves the task. You can continue to create and edit the task.

      • Create and close: Creates and saves the task, closes the page, and returns you to the tasks list on the project or folder details page.

    8. Save periodically while you work by clicking one of the following buttons:
      • Save: Commits changes since the last save. You can continue editing after saving.

      • Save and close: Commits changes, closes the page, and returns you to the tasks list on the project or folder details page.

      • Save as: Commits changes (since the last save) and saves to a copy instead of overwriting the current task. You can provide a name for the copy and select a different project or folder for the copy, or save the copy in the same project or folder as the existing task.

    9. In the OCI Data Flow Application section, click Select and select the OCI Data Flow application that this task runs by following these steps:
      1. On the Select an OCI Data Flow application page, select the compartment that contains the application that you want to associate with the task.
      2. In the Applications list, select the application.
      3. Click Select.

        You're returned to the Create OCI Data Flow task page.

    10. In the Configure properties section, click Configure to configure the properties for the selected application.

      The Configuration page appears.

      1. Specify the following property values directly or parameterize the properties (with default values). If you don't explicitly configure the application properties in this step, the default values that are defined in the OCI Data Flow application are used.
        • Driver shape: Select the type of cluster node to use for the Spark driver host.

          If a flexible shape is selected, select the number of OCPUs and the amount of memory that can be allocated to the selected shape. The acceptable values for OCPUs depend on the selected shape. The acceptable values for memory depend on the selected OCPUs value.

        • Executor shape: Select the type of cluster node to use for each Spark executor host.

          If a flexible shape is selected, select the number of OCPUs and the amount of memory that can be allocated to the selected shape. The acceptable values for OCPUs depend on the selected shape. The acceptable values for memory depend on the selected OCPUs value.

        • Number of executors: Enter the number of Spark executor cluster nodes to launch when the OCI Data Flow application is run.

        • Arguments: Enter a comma-separated list of the arguments to pass to the main class of the Java, Python, or Scala application.

      2. To assign parameters to the property values:
        1. Click Assign parameter next to a property.

          If you parameterize the driver or executor shape, OCPUs and memory must be configured for the shape. The values of OCPUs and memory are used only when the shape parameter value is a flexible shape. The OCPUs and memory values are ignored if a non-flexible shape is configured in the parameter.

        2. On the Assign parameters page, perform one of the following actions:

          • Select a parameter from the list. Only parameters of the same property type appear in the list for selection.
          • Click Add parameter. In the Add parameter panel, enter a name (identifier) and an optional description. Then depending on the property type, either select the default value or enter the default value for the property, and click Add. The parameter that's added is automatically selected on the Assign parameters page.
        3. Click Assign.

          If you parameterize the OCPUs and memory values for a flexible shape, Data Integration displays an error message when you specify a value that's not in the acceptable range of values for that property. Edit the parameter and enter one of the acceptable values.

      3. (Optional) For Spark configuration properties, add a key-value pair for a property. Click Another property if you need to add more key-value pairs.

        The Spark configuration properties that you can add might depend on the Spark version of the selected OCI Data Flow application. See Supported Spark Properties.

      4. When you have finished configuring OCI Data Flow application properties and Spark properties, click Done.

        You're returned to the Create OCI Data Flow task page.

        In the Configure properties section, the number of parameters that you have assigned is shown in parentheses next to View parameters.

    11. (Optional) Click View parameters to review the assigned parameters, edit a default parameter value, or delete a parameter.

      On the View parameters page, edit a default value or delete a parameter by using the Actions menu of the parameter. When you delete a parameter, the value that's assigned to the parameter becomes the default value of that property.

    12. (Optional) In the Validate task section, click Validate to check the property configurations.
    13. When you finish configuring the task, click Create and close or Save and close.
    Publish the OCI Data Flow task to an application in Data Integration before you run the task or schedule the task for running. For publishing information, see Publishing to a Data Integration Application.
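The Configure properties step states that any property you don't set on the task falls back to the default defined in the OCI Data Flow application. A minimal sketch of that fallback follows. The function name and the property keys (`driver_shape`, `executor_shape`, `num_executors`) are hypothetical and aren't taken from the Data Integration API.

```python
def effective_properties(app_defaults: dict, task_values: dict) -> dict:
    """Task-level values win; unset properties keep the application's defaults."""
    resolved = dict(app_defaults)
    # A value of None stands for "not configured on the task".
    resolved.update({key: value for key, value in task_values.items() if value is not None})
    return resolved


# Hypothetical defaults defined on the OCI Data Flow application.
app_defaults = {
    "driver_shape": "VM.Standard.E4.Flex",
    "executor_shape": "VM.Standard.E4.Flex",
    "num_executors": 1,
}

# The task overrides only the number of executors.
task_values = {"driver_shape": None, "num_executors": 4}

print(effective_properties(app_defaults, task_values))
```

In this example only the number of executors changes. Both shapes keep the application's defaults because the task leaves them unset.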
  • Use the oci data-integration task create-task-from-dataflow-task command and required parameters to create an OCI Data Flow task:

    oci data-integration task create-task-from-dataflow-task [OPTIONS]

    For a complete list of flags and variable options for CLI commands, see the Command Line Reference.

  • Run the CreateTask operation with the appropriate resource subtype to create an OCI Data Flow task.
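If you script the CLI path, one low-risk starting point is to print the command's built-in help and read the required parameters from it instead of guessing at option names. The sketch below assumes that the OCI CLI is installed and on the PATH. The helper names `help_command` and `show_help` are hypothetical, and the only option used is `--help`.

```python
import shutil
import subprocess

# The command exactly as it's named in this topic.
COMMAND = ["oci", "data-integration", "task", "create-task-from-dataflow-task"]


def help_command() -> list:
    """Build the argument list that prints the command's help text."""
    return COMMAND + ["--help"]


def show_help():
    """Return the help text, or None if the oci executable isn't on the PATH."""
    if shutil.which("oci") is None:
        return None
    result = subprocess.run(help_command(), capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    text = show_help()
    print(text if text is not None else "OCI CLI not found on PATH")
```

Passing the arguments as a list avoids shell quoting problems when you later add real option values from the Command Line Reference.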