Using Apache Livy

Apache Livy provides a REST interface for submitting Spark jobs to a cluster. In Big Data Service clusters with version 3.0.7 or later, Apache Livy is installed by default and can be managed using Apache Ambari under the Spark3 service.

In Big Data Service clusters with version 3.0.7 or later, the Apache Livy server runs on port 8998 on the first utility node (un0) of the cluster. Apache Livy logs are in the /var/log/livy directory on the same node. You can manage the Apache Livy server configuration from Apache Ambari.
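Because Livy is driven over HTTP, any client that can reach port 8998 on un0 can submit work. The following is a minimal Python sketch, using only the standard library, of submitting a Spark batch job with Livy's POST /batches endpoint and checking it with GET /batches/{batchId}/state. It assumes an unsecured (non-Kerberos) cluster, since a secured cluster requires authentication that this sketch does not handle, and that un0 is reachable from where you run it. The helper names (livy_endpoint, build_batch_request, submit_batch, batch_state) are hypothetical, as are any host names or paths you pass to them.

```python
import json
import urllib.request


def livy_endpoint(host, port=8998):
    """Return the base URL of a Livy server (8998 is the port used on un0)."""
    return f"http://{host}:{port}"


def build_batch_request(file, class_name=None, args=None, conf=None):
    """Build the JSON body for Livy's POST /batches endpoint."""
    body = {"file": file}  # JAR or Python file that Spark should run
    if class_name:
        body["className"] = class_name  # main class, for JAR applications
    if args:
        body["args"] = list(args)  # command-line arguments for the application
    if conf:
        body["conf"] = dict(conf)  # extra Spark configuration properties
    return body


def submit_batch(base_url, body, timeout=30):
    """POST a batch to Livy and return its parsed JSON description (includes "id" and "state")."""
    req = urllib.request.Request(
        f"{base_url}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def batch_state(base_url, batch_id, timeout=30):
    """Return the current state string of a batch, for example "running" or "success"."""
    url = f"{base_url}/batches/{batch_id}/state"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["state"]
```

For example, submit_batch(livy_endpoint("<un0-host>"), build_batch_request("hdfs:///path/to/app.jar", class_name="com.example.Main")) returns Livy's description of the new batch; you can pass its id to batch_state until it reports a final state such as success or dead.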

In Big Data Service clusters earlier than version 3.0.7, you must build Apache Livy with Spark 3 before you can use it.

  1. Download the Apache Livy source code to your local machine:
    https://github.com/apache/incubator-livy
  2. Build Apache Livy.
    mvn clean package -B -V -e -Pspark-3.0 -Pthriftserver -DskipTests -DskipITs -Dmaven.javadoc.skip=true

    Note: If the build fails in the python-api module, replace python-api/pom.xml with the pom.xml from https://gist.github.com/gamberooni/30d86b92d09b014aa623f1b66e9183a0#file-python-api-pom-xml, and then rerun the build.

  3. After the build succeeds, copy the Apache Livy zip file from assembly/target/ to the first utility node (un0) of your cluster and extract it there.
  4. Edit the livy.conf file and set the following property:
    vi livy-home/conf/livy.conf
    livy.repl.enable-hive-context = true
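If you script step 4 instead of editing the file by hand (for example, to repeat it after redeploying Livy), the change can be made idempotent. The following is a minimal Python sketch that assumes livy.conf holds one key = value entry per line and that commented-out template lines should be left untouched; set_livy_property is a hypothetical helper, not part of Apache Livy.

```python
from pathlib import Path


def set_livy_property(conf_path, key, value):
    """Set `key = value` in livy.conf: replace an active entry for `key`, or append one.

    Assumes a simple `key = value` line format; commented lines (starting with #) are skipped.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{key} = {value}"
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Only match uncommented entries whose key is exactly `key`.
        if not stripped.startswith("#") and stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n")
```

Running set_livy_property("livy-home/conf/livy.conf", "livy.repl.enable-hive-context", "true") twice leaves a single active entry, so the script is safe to rerun.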