PySpark 2.4 and 3.0 conda environments now support Resource Principal
- Services: Data Science
- Release Date: July 16, 2021
The Data Science service provides seamless integration with the Data Flow service, so you can develop your Spark applications in PySpark using the Data Science service. Data Flow and PySpark can access data in Oracle Object Storage through the HDFS connector, which previously required an instance principal (basically API keys) to make the connection. With these updated PySpark conda environments, you can now connect your PySpark applications to Object Storage using a resource principal.
What is a resource principal? In principle, it is similar to an instance principal in that both are used for authentication. However, resources are not instances: they are serverless. An example of a resource is an Object Storage bucket. While the goal of instance and resource principals is the same, the implementation is a little different. An instance principal requires developer-provided credentials. A resource principal is a set of rules that provides authentication without being tied to a developer's credentials.
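As a rough illustration of connecting PySpark to Object Storage with a resource principal, the sketch below builds the Spark settings and the `oci://` path that a notebook session might use. The helper names (`oci_uri`, `resource_principal_conf`) are hypothetical. The property names and authenticator class reflect the OCI HDFS connector as we understand it and should be checked against the connector documentation for your conda environment. The bucket, namespace, and region values are placeholders.

```python
def oci_uri(bucket: str, namespace: str, path: str = "") -> str:
    """Build an Object Storage URI in the oci://bucket@namespace/path form."""
    return f"oci://{bucket}@{namespace}/{path.lstrip('/')}"


def resource_principal_conf(region: str) -> dict:
    """Return Spark settings that select resource principal authentication.

    Assumption: the "spark.hadoop." prefix forwards each key to the Hadoop
    configuration read by the HDFS connector, and the authenticator class
    below is available in the installed connector version.
    """
    return {
        "spark.hadoop.fs.oci.client.custom.authenticator":
            "com.oracle.bmc.hdfs.auth.ResourcePrincipalsCustomAuthenticator",
        "spark.hadoop.fs.oci.client.hostname":
            f"https://objectstorage.{region}.oraclecloud.com",
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own bucket, namespace, and region.
    for key, value in resource_principal_conf("us-ashburn-1").items():
        print(f"{key} = {value}")
    print(oci_uri("my-bucket", "my-namespace", "data/example.csv"))
```

In a notebook session, each key and value pair could be passed to `SparkSession.builder.config(key, value)` before the session is created, and the resulting URI could then be given to a reader such as `spark.read.csv`. No API key or configuration file appears in the settings, which is the practical difference from the earlier setup.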