In this section we will configure Spark 1.3.1 on the Hortonworks Sandbox with HDP 2.2.

Log in to your Sandbox as root and add the repository that provides Spark 1.3.1 using the following command:

wget -nv -O /etc/yum.repos.d/HDP-TP.repo
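Before installing, it can help to confirm that the repo file was actually written. The sketch below is an optional check, not part of the original steps. `repo_file_ok` is a hypothetical helper name, and the check assumes the file uses the usual yum layout with `[section]` headers.

```shell
#!/usr/bin/env bash
# Optional sanity check after the wget step (run as root on the Sandbox).

# Hypothetical helper: succeeds if the file exists, is non-empty and has
# at least one line starting with "[" (a yum repo section header).
repo_file_ok() {
  local f="$1"
  [ -s "$f" ] && grep -q '^\[' "$f"
}

if repo_file_ok /etc/yum.repos.d/HDP-TP.repo; then
  # List enabled repositories; the new HDP entry should appear here.
  yum repolist enabled
else
  echo "HDP-TP.repo is missing or empty; re-run the wget command" >&2
fi
```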

Install the Spark package with the following command:

yum install spark_2_2_4_4_16-master

Installing Spark and all its dependencies will take a few minutes.

Let’s also install PySpark with the following command:

yum install spark-python
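To confirm that both installs succeeded, a small check like the following can be run. It is a sketch that assumes the names passed to `yum install` above can be queried with `rpm -q --whatprovides`, which matches a package's own name (every RPM provides it) as well as any virtual name it declares. `check_installed` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Optional check that the packages from the two yum commands are installed.

# Hypothetical helper: reports each name and returns non-zero if any is
# missing. "rpm -q --whatprovides" matches a package name or a virtual name.
check_installed() {
  local missing=0 pkg
  for pkg in "$@"; do
    if rpm -q --whatprovides "$pkg" >/dev/null 2>&1; then
      echo "installed: $pkg"
    else
      echo "missing:   $pkg" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_installed spark_2_2_4_4_16-master spark-python ||
  echo "re-run the yum install commands above" >&2
```

If nothing is reported as missing, both the Spark package and the Python bindings are in place.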

Use the hdp-select command to point the Spark history server and client to the version we just installed:

hdp-select set spark-historyserver
hdp-select set spark-client
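To check that both components now point at the same build, the output of `hdp-select status` for each can be compared. This sketch assumes `hdp-select status <component>` prints a single line ending in the version, such as `spark-client - <version>`. `parse_status_version` is a hypothetical helper that keeps only that last field.

```shell
#!/usr/bin/env bash
# Optional check that client and history server point at the same build.

# Hypothetical helper: keeps only the last field of lines such as
# "spark-client - <version>" (assumed hdp-select status output format).
parse_status_version() {
  awk '{ print $NF }'
}

client=$(hdp-select status spark-client 2>/dev/null | parse_status_version)
server=$(hdp-select status spark-historyserver 2>/dev/null | parse_status_version)

if [ -n "$client" ] && [ "$client" = "$server" ]; then
  echo "spark-client and spark-historyserver both use $client"
else
  echo "mismatch: client='$client' historyserver='$server'" >&2
fi
```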

Let’s check that everything works by running a sample application.

First, switch to the spark user that was created during the RPM install, and then cd to the $SPARK_HOME directory:

su spark
cd /usr/hdp/current/spark-client
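Because the submit command relies on the glob `lib/spark-examples*.jar`, a quick optional check that it resolves to exactly one jar can save a confusing error. `find_examples_jar` is a hypothetical helper, and the sketch assumes you are running it as the spark user from the steps above.

```shell
#!/usr/bin/env bash
# Optional pre-flight check before running spark-submit.

# Hypothetical helper: prints the single jar matching
# <dir>/lib/spark-examples*.jar, or fails if there are zero or several.
find_examples_jar() {
  local dir="${1:-.}"
  set -- "$dir"/lib/spark-examples*.jar   # glob expands into $1, $2, ...
  if [ "$#" -eq 1 ] && [ -f "$1" ]; then
    echo "$1"
  else
    echo "expected exactly one spark-examples jar in $dir/lib" >&2
    return 1
  fi
}

# The sample should be submitted as the spark user.
[ "$(id -un)" = spark ] || echo "warning: not running as the spark user" >&2

if jar=$(find_examples_jar /usr/hdp/current/spark-client); then
  echo "examples jar: $jar"
fi
```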

Now let’s run the SparkPi sample, which calculates an approximation of Pi:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    --num-executors 3 \
    --driver-memory 512m \
    --executor-memory 512m \
    --executor-cores 1 \
    lib/spark-examples*.jar 10
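The trailing `10` is the number of slices (partitions). As I understand the SparkPi example shipped with Spark 1.3, each slice contributes 100000 random points in the square [-1, 1] x [-1, 1], and the fraction of points landing inside the unit circle, multiplied by 4, approximates Pi. The awk sketch below reproduces that Monte Carlo estimate on a single machine to show what the cluster job computes. The `estimate_pi` name, the fixed seed, and the five-decimal output format are illustrative assumptions, not part of Spark.

```shell
#!/usr/bin/env bash
# Single-machine sketch of the Monte Carlo estimate that SparkPi distributes.
# Assumption: 100000 points per slice, as in the Spark 1.3 SparkPi example.
# Arguments: number of slices (default 2) and a random seed (default 42).
estimate_pi() {
  local slices="${1:-2}" seed="${2:-42}"
  awk -v slices="$slices" -v seed="$seed" 'BEGIN {
    srand(seed)                     # fixed seed for repeatable output
    n = 100000 * slices             # total number of random points
    inside = 0
    for (i = 0; i < n; i++) {
      x = rand() * 2 - 1            # uniform coordinate in [-1, 1)
      y = rand() * 2 - 1
      if (x * x + y * y < 1) inside++   # point lies inside the unit circle
    }
    printf "Pi is roughly %.5f\n", 4.0 * inside / n
  }'
}

estimate_pi 10   # same slice count as the spark-submit command above
```

With 10 slices (1,000,000 points) the estimate is typically within a few thousandths of Pi. SparkPi runs the same loop in parallel across the executors requested with `--num-executors`; in yarn-client mode the driver prints its result to the console on a line beginning with `Pi is roughly`.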

Saptak Sen

If you enjoyed this post, you should check out my book: Starting with Spark.
