Using Apache Spark

Apache Spark is a distributed data processing engine for big data workloads.

The Thrift JDBC/ODBC server corresponds to HiveServer2 in built-in Hive. You can test the JDBC server with the beeline script that ships with either Spark or Hive. To connect to the Spark Thrift Server from any machine in a Big Data Service cluster, use the spark-beeline command.
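For example, a minimal connection sketch (the host name, port, and credentials are placeholders; 10000 is the conventional Thrift server port, but your cluster configuration may differ):

    spark-beeline
    beeline> !connect jdbc:hive2://<thrift-server-host>:10000 <username> <password>

Once connected, you can run standard SQL statements against the Thrift server.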

Spark Configuration Properties

The following Spark configuration properties are included in Big Data Service 3.1.1 or later.

Configuration   Property                        Description
spark3-env      spark_history_secure_opts       Spark History Server Java options if security is enabled
spark3-env      spark_history_log_opts          Spark History Server logging Java options
spark3-env      spark_thrift_log_opts           Spark Thrift Server logging Java options
spark3-env      spark_library_path              Paths containing shared libraries for Spark
spark3-env      spark_dist_classpath            Paths containing Hadoop libraries for Spark
spark3-env      spark_thrift_remotejmx_opts     Spark Thrift Server Java options if remote JMX is enabled
spark3-env      spark_history_remotejmx_opts    Spark History Server Java options if remote JMX is enabled
livy2-env       livy_server_opts                Livy Server Java options
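For example, the remote JMX properties typically carry standard JVM JMX flags. The following value is an illustrative sketch only, shown wrapped for readability (the port is a placeholder, and these are not defaults shipped with Big Data Service):

    spark_history_remotejmx_opts = -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.port=<jmx-port>
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false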

Group Permission to Download Policies

You can use a user group to grant users permission to download Ranger policies, which they need to run SQL queries through Spark jobs.

In a Big Data Service HA cluster with the Ranger-Spark plugin enabled, a user must have permission to download Ranger policies to run SQL queries through Spark jobs. To grant this permission, include the user in the policy.download.auth.users and tag.download.auth.users lists. For more information, see Spark Job Might Fail With a 401 Error While Trying to Download the Ranger-Spark Policies.
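For example, the lists in the Ranger-Spark repository configuration might look like the following (the user names shown are illustrative):

    policy.download.auth.users = spark,hive,<username>
    tag.download.auth.users = spark,hive,<username>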

Instead of listing many users individually, you can set the policy.download.auth.groups parameter to a user group in the Spark-Ranger repository in the Ranger UI. All users in that group can then download Ranger policies. This feature is supported in ODH version 2.0.10 or later.

Example:

  1. Access the Ranger UI.
  2. Select Edit on the Spark repository.
  3. Navigate to the Add New Configurations section.
  4. Add or update policy.download.auth.groups with the user group.

    Example:

    policy.download.auth.groups = spark,testgroup

  5. Select Save.
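To confirm that members of the group can download policies, you can call the Ranger policy download REST endpoint as one of those users. This is a sketch only: the host, port, credentials, and Spark service name are placeholders, and the endpoint path can differ across Ranger versions:

    curl -u <username>:<password> \
        "https://<ranger-admin-host>:6182/service/plugins/secure/policies/download/<spark-service-name>"

A 200 response returning policy JSON means the download succeeded; a 401 response means the user or group still lacks download permission.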

Spark-Ranger Plugin Extension

The Spark-Ranger plugin extension can't be overridden at runtime in ODH version 2.0.10 or later.
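In practice, this means a job-level override is ignored. For example, assuming the plugin is wired in through spark.sql.extensions (an assumption for illustration; com.example.MyCustomExtension and my_job.py are hypothetical), a submit-time setting such as the following does not replace the cluster-configured Ranger extension:

    spark-submit --conf spark.sql.extensions=com.example.MyCustomExtension my_job.py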

Note

Fine-grained access control can't be fully enforced through the Spark-Ranger plugin in non-Spark Thrift Server use cases. Ranger Admin is expected to grant the required file access permissions to data in HDFS through HDFS Ranger policies.
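For example, a query run outside the Spark Thrift Server succeeds only if the user also holds HDFS-level access to the table's files. The database, table, and path below are illustrative:

    # Succeeds only if an HDFS Ranger policy grants the user read/execute
    # access on the table's location, for example:
    # /warehouse/tablespace/managed/hive/sales_db.db/orders
    spark-sql -e "SELECT COUNT(*) FROM sales_db.orders"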