Before you Begin with Data Flow

Before you begin using Data Flow, you must have:

- An Oracle Cloud Infrastructure account. You can use a trial account to demo Data Flow.
- A Service Administrator role for your Oracle Cloud services. When the service is activated, Oracle sends the credentials and URL to the chosen Account Administrator, who then creates an account for each user who needs access to the service.
- A supported browser, such as:
  - Microsoft Internet Explorer 11.x+
  - Mozilla Firefox ESR 38+
  - Google Chrome 42+
- A Spark application uploaded to Object Storage. Don't package it in a compressed format such as .zip or .gzip.
- Data for processing loaded into Oracle Cloud Infrastructure Object Storage. Data can also be read from external data sources or other clouds, but Data Flow optimizes performance and security for data stored in Oracle Cloud Infrastructure Object Storage.

Data Flow supports these application types:
- Java
- Scala
- SparkSQL
- PySpark (Python 3 only)
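Because the application artifact must not be uploaded in a compressed format, a quick pre-upload check can catch mistakes early. The following Python sketch is a hypothetical helper: `check_artifact` is not part of any Oracle SDK or CLI, and the suffix-to-type mapping (including `.sql` for SparkSQL and `.gz` as a gzip suffix) is an assumption based on the application types listed above.

```python
from pathlib import PurePosixPath

# Hypothetical: suffixes treated as compressed archives (assumption based on
# the "no .zip or .gzip" rule; not an official Data Flow check).
COMPRESSED_SUFFIXES = {".zip", ".gzip", ".gz"}

# Hypothetical: typical artifact suffixes for the supported application types.
APP_SUFFIXES = {
    ".jar": "Java or Scala",
    ".py": "PySpark",
    ".sql": "SparkSQL",
}


def check_artifact(path: str) -> str:
    """Return the likely application type for an artifact name, or raise ValueError."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in COMPRESSED_SUFFIXES:
        raise ValueError(f"{path}: upload the application unzipped, not as {suffix}")
    if suffix not in APP_SUFFIXES:
        raise ValueError(f"{path}: unrecognized application file type {suffix!r}")
    return APP_SUFFIXES[suffix]
```

The sketch checks the file name rather than the file contents on purpose: a JAR is itself a ZIP archive, so a check on the ZIP magic bytes would wrongly reject Java and Scala applications.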

This table shows the Spark versions supported by Data Flow.

Supported Spark Versions
Spark Version	Hadoop	Java	Python	Scala	oci-hdfs	oci-java-sdk	Spark Documentation
Spark 3.5.0	3.3.4	17.0.10	3.11.5	2.12.18	3.3.4.1.4.2	3.34.1	Spark Release 3.5.0 Guide
Spark 3.2.1	3.3.1	11.0.14	3.8.13	2.12.15	3.3.1.0.3.2	2.45.0	Spark Release 3.2.1 Guide
Spark 3.0.2	3.2.0	1.8.0_321	3.6.8	2.12.10	3.2.1.3	1.25.2	Spark Release 3.0.2 Guide
Spark 2.4.4	2.9.2	1.8.0_162	3.6.8	2.11.12	2.9.2.6	1.25.0	Spark Release 2.4.4 Guide

This table is for reference only, and isn't meant to be comprehensive.
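When you develop a PySpark application locally, or build a Java or Scala JAR, it helps to target the same Python and Scala versions that Data Flow runs. The following Python sketch encodes a subset of the columns in the table above; `SUPPORTED_RUNTIMES`, `runtimes_for_python`, and `scala_binary` are hypothetical names, and because the table isn't comprehensive, treat the lookup as a convenience rather than an authoritative compatibility check.

```python
from __future__ import annotations

from typing import NamedTuple


class SparkRuntime(NamedTuple):
    hadoop: str
    java: str
    python: str
    scala: str


# Values copied from the Supported Spark Versions table (not comprehensive).
SUPPORTED_RUNTIMES = {
    "3.5.0": SparkRuntime("3.3.4", "17.0.10", "3.11.5", "2.12.18"),
    "3.2.1": SparkRuntime("3.3.1", "11.0.14", "3.8.13", "2.12.15"),
    "3.0.2": SparkRuntime("3.2.0", "1.8.0_321", "3.6.8", "2.12.10"),
    "2.4.4": SparkRuntime("2.9.2", "1.8.0_162", "3.6.8", "2.11.12"),
}


def _major_minor(version: str) -> tuple[int, int]:
    """Reduce a dotted version string such as '3.11.5' to (3, 11)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def runtimes_for_python(local_python: str) -> list[str]:
    """Spark versions whose Python major.minor matches the local interpreter."""
    wanted = _major_minor(local_python)
    return [
        spark
        for spark, runtime in SUPPORTED_RUNTIMES.items()
        if _major_minor(runtime.python) == wanted
    ]


def scala_binary(spark_version: str) -> str:
    """Scala binary version (for example '2.12') to compile a JAR against."""
    return ".".join(SUPPORTED_RUNTIMES[spark_version].scala.split(".")[:2])
```

For example, `runtimes_for_python("3.6.8")` returns `["3.0.2", "2.4.4"]`, and `scala_binary("2.4.4")` returns `"2.11"`.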

Note

Avoid entering confidential information when assigning descriptions, tags, or friendly names to your cloud resources through the Oracle Cloud Infrastructure Console, API, or CLI. This applies when creating or editing an application in Data Flow.

Oracle Cloud Infrastructure Documentation