Using Apache Spark

Apache Spark is a distributed data processing engine for big data workloads.

The Thrift JDBC/ODBC server corresponds to HiveServer2 in built-in Hive. You can test the JDBC server with the beeline script that ships with either Spark or Hive. To connect to the Spark Thrift Server from any machine in a Big Data Service cluster, use the spark-beeline command.
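For example, a minimal connection sketch (the host name, port, and credentials are placeholders; 10000 is the conventional Thrift server port, but your cluster configuration may differ):

    spark-beeline
    beeline> !connect jdbc:hive2://<thrift-server-host>:10000 <username> <password>

Once connected, you can run standard SQL statements against the Thrift server.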

Spark Configuration Properties

The following Spark configuration properties are included in Big Data Service 3.1.1 or later.

Configuration   Property                        Description
spark3-env      spark_history_secure_opts       Spark History Server Java options if security is enabled
spark3-env      spark_history_log_opts          Spark History Server logging Java options
spark3-env      spark_thrift_log_opts           Spark Thrift Server logging Java options
spark3-env      spark_library_path              Paths containing shared libraries for Spark
spark3-env      spark_dist_classpath            Paths containing Hadoop libraries for Spark
spark3-env      spark_thrift_remotejmx_opts     Spark Thrift Server Java options if remote JMX is enabled
spark3-env      spark_history_remotejmx_opts    Spark History Server Java options if remote JMX is enabled
livy2-env       livy_server_opts                Livy Server Java options
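For example, the remote JMX properties typically carry standard JVM JMX flags. The following value is an illustrative sketch only, shown wrapped for readability (the port is a placeholder, and these are not defaults shipped with Big Data Service):

    spark_history_remotejmx_opts = -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.port=<jmx-port>
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false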

Group Permission to Download Policies

You can use a user group to grant users permission to download Ranger policies, which they need to run SQL queries through Spark jobs.

In a Big Data Service HA cluster with the Ranger-Spark plugin enabled, a user must have permission to download Ranger policies to run SQL queries through Spark jobs. To grant this permission, include the user in the policy.download.auth.users and tag.download.auth.users lists. For more information, see Spark Job Might Fail With a 401 Error While Trying to Download the Ranger-Spark Policies.
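For example, the lists in the Ranger-Spark repository configuration might look like the following (the user names shown are illustrative):

    policy.download.auth.users = spark,hive,<username>
    tag.download.auth.users = spark,hive,<username>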

Instead of listing many users individually, you can set the policy.download.auth.groups parameter to a user group in the Spark-Ranger repository in the Ranger UI. All users in that group can then download Ranger policies. This feature is supported in ODH version 2.0.10 or later.

Example:

  1. Access the Ranger UI.
  2. Select Edit on the Spark repository.
  3. Navigate to the Add New Configurations section.
  4. Add or update policy.download.auth.groups with the user group.

    Example:

    policy.download.auth.groups = spark,testgroup

  5. Select Save.
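To confirm that members of the group can download policies, you can call the Ranger policy download REST endpoint as one of those users. This is a sketch only: the host, port, credentials, and Spark service name are placeholders, and the endpoint path can differ across Ranger versions:

    curl -u <username>:<password> \
        "https://<ranger-admin-host>:6182/service/plugins/secure/policies/download/<spark-service-name>"

A 200 response returning policy JSON means the download succeeded; a 401 response means the user or group still lacks download permission.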

Spark-Ranger Plugin Extension

The Spark-Ranger plugin extension can't be overridden at runtime in ODH version 2.0.10 or later.
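In practice, this means a job-level override is ignored. For example, assuming the plugin is wired in through spark.sql.extensions (an assumption for illustration; com.example.MyCustomExtension and my_job.py are hypothetical), a submit-time setting such as the following does not replace the cluster-configured Ranger extension:

    spark-submit --conf spark.sql.extensions=com.example.MyCustomExtension my_job.py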

Note

Fine-grained access control can't be fully enforced through the Spark-Ranger plugin in non-Spark Thrift Server use cases. Ranger Admin is expected to grant the required file access permissions to data in HDFS through HDFS Ranger policies.
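For example, a query run outside the Spark Thrift Server succeeds only if the user also holds HDFS-level access to the table's files. The database, table, and path below are illustrative:

    # Succeeds only if an HDFS Ranger policy grants the user read/execute
    # access on the table's location, for example:
    # /warehouse/tablespace/managed/hive/sales_db.db/orders
    spark-sql -e "SELECT COUNT(*) FROM sales_db.orders"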