Using Apache Spark
Apache Spark is a distributed data processing engine for big data workloads.
The Thrift JDBC/ODBC server corresponds to HiveServer2 in built-in Hive. You can test the JDBC server with the beeline script that comes with either Spark or Hive. To connect to the Spark Thrift Server from any machine in a Big Data Service cluster, use the spark-beeline command.
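For example, a minimal connection from a cluster node might look like the following sketch; the host name and port are placeholders, so substitute the values configured for the Spark Thrift Server in your cluster.

```
# Placeholder host and port -- use the Spark Thrift Server host name and
# port configured for your cluster.
spark-beeline -u "jdbc:hive2://<thrift-server-host>:<port>/default"
```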
Spark Configuration Properties
The following Spark configuration properties are included in Big Data Service 3.1.1 or later.
| Configuration | Property | Description |
|---|---|---|
| spark3-env | spark_history_secure_opts | Spark History Server Java options if security is enabled |
| spark3-env | spark_history_log_opts | Spark History Server logging Java options |
| spark3-env | spark_thrift_log_opts | Spark Thrift Server logging Java options |
| spark3-env | spark_library_path | Paths containing shared libraries for Spark |
| spark3-env | spark_dist_classpath | Paths containing Hadoop libraries for Spark |
| spark3-env | spark_thrift_remotejmx_opts | Spark Thrift Server Java options if remote JMX is enabled |
| spark3-env | spark_history_remotejmx_opts | Spark History Server Java options if remote JMX is enabled |
| livy2-env | livy_server_opts | Livy Server Java options |
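These properties typically hold standard JVM options. As an illustration, a remote JMX property such as spark_history_remotejmx_opts might carry options along the following lines; the port and flags shown are examples only, not values shipped with Big Data Service.

```
# Illustrative value for spark_history_remotejmx_opts; the port is a
# placeholder and the flags are examples, not product defaults.
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=<jmx-port>
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
```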
Group Permission to Download Policies
You can use a user group to grant users the permission to download Ranger policies that's required to run SQL queries through Spark jobs.
In a Big Data Service HA cluster with the Ranger-Spark plugin enabled, you must have access to download Ranger policies to run any SQL queries through Spark jobs. To grant permission to download Ranger policies, the user must be included in the policy.download.auth.users and tag.download.auth.users lists. For more information, see Spark Job Might Fail With a 401 Error While Trying to Download the Ranger-Spark Policies.
Instead of specifying many users, you can configure the policy.download.auth.groups parameter with a user group in the Spark-Ranger repository in the Ranger UI. This allows all users in that group to download Ranger policies. This feature is supported in ODH version 2.0.10 or later.
Example:
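The following sketch assumes a hypothetical user group named spark_sql_users; substitute the group whose members should be allowed to run Spark SQL jobs.

```
# In the Ranger UI, edit the Spark-Ranger repository (service) configuration
# and set the group-based download property. The group name is a placeholder.
policy.download.auth.groups = spark_sql_users
```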
Spark-Ranger Plugin Extension
The Spark-Ranger plugin extension can't be overridden at runtime in ODH version 2.0.10 or later.
Fine-grained access control can't be fully enforced through the Spark-Ranger plugin in non-Spark Thrift Server use cases. The Ranger administrator is expected to grant the required file access permissions to data in HDFS through HDFS Ranger policies.
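For example, an HDFS Ranger policy along the following lines grants the file-level access a Spark job needs to read table data directly from HDFS. All names and paths are placeholders, not values from a real cluster.

```
# Hypothetical HDFS Ranger policy, created in the Ranger Admin UI under the
# HDFS service. All names and paths below are placeholders.
Policy name : spark-warehouse-read
Resource    : /warehouse/tablespace/managed/hive/<database>.db
Group       : spark_sql_users
Permissions : Read, Execute
```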