Migrate Data Flow to Spark 3.5.0
Follow these steps to migrate Data Flow to Spark 3.5.0.
To use Data Flow with Delta Lake 3.1.0 and to integrate with Conda Pack, you must use at least version 3.5.0 of Spark with Data Flow.
Follow the instructions in the Spark 3.5.0 Migration Guide to upgrade to Spark 3.5.0.
Further to the supported versions information in Before you Begin with Data Flow, the following library versions are the minimum supported by Data Flow with Spark 3.5.0 and with Spark 3.2.1.
Note
Build applications using the versions listed for Spark 3.2.1 before migrating to Spark 3.5.0.
Library | Spark 3.5.0 | Spark 3.2.1 |
---|---|---|
Python | 3.11.5 | 3.8.13 |
Java | 17.0.10 | 11 |
Hadoop | 3.3.4 | 3.3.1 |
Scala | 2.12.18 | 2.12.15 |
oci-hdfs | 3.3.4.1.4.2 | 3.3.1.0.3.2 |
oci-java-sdk | 3.34.1 | 2.45.0 |
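As a rough aid when checking a build against the table above, the following sketch compares dotted version strings component by component. The names `MINIMUMS_SPARK_3_5_0`, `parse_version`, and `meets_minimum` are hypothetical helpers and not part of Data Flow or any OCI SDK. The version values are copied from the Spark 3.5.0 column, and the sketch assumes purely numeric versions with no suffixes such as `-SNAPSHOT`.

```python
from typing import Tuple

# Minimum versions copied from the Spark 3.5.0 column of the table above.
MINIMUMS_SPARK_3_5_0 = {
    "Python": "3.11.5",
    "Java": "17.0.10",
    "Hadoop": "3.3.4",
    "Scala": "2.12.18",
    "oci-hdfs": "3.3.4.1.4.2",
    "oci-java-sdk": "3.34.1",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '3.3.4.1.4.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(actual: str, minimum: str) -> bool:
    """Return True when `actual` is at least `minimum`.

    Tuples compare element by element, so 2.12.18 ranks above 2.12.15
    even though a plain string comparison could get multi-digit parts wrong.
    """
    return parse_version(actual) >= parse_version(minimum)
```

For example, `meets_minimum("3.8.13", MINIMUMS_SPARK_3_5_0["Python"])` evaluates to `False`, which flags a Python runtime that still matches the Spark 3.2.1 column.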
Note
By default, the OCI Java SDK uses the ApacheConnector. Switch to the Jersey HttpUrlConnector with the following settings:
spark.executorEnv.OCI_JAVASDK_JERSEY_CLIENT_DEFAULT_CONNECTOR_ENABLED=true
spark.driverEnv.OCI_JAVASDK_JERSEY_CLIENT_DEFAULT_CONNECTOR_ENABLED=true
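If the two settings above are assembled programmatically, a minimal sketch might look like the following. The property names are taken verbatim from the settings above; the helper names `jersey_connector_conf` and `as_conf_args` are hypothetical. Rendering the properties as `--conf key=value` pairs is an assumption about how the configuration is passed along, so adapt it to however your application's Spark configuration is actually set.

```python
from typing import Dict, List

# Environment variable name taken from the settings above.
ENV_VAR = "OCI_JAVASDK_JERSEY_CLIENT_DEFAULT_CONNECTOR_ENABLED"


def jersey_connector_conf() -> Dict[str, str]:
    """Return the two Spark properties that enable the Jersey connector."""
    return {
        f"spark.executorEnv.{ENV_VAR}": "true",
        f"spark.driverEnv.{ENV_VAR}": "true",
    }


def as_conf_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as a flat list of '--conf', 'key=value' arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args
```

Keeping the properties in one dictionary makes it harder to set the executor value and forget the driver value, since both are generated from the same variable name.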