Installing Apache Spark 1.2.0 on HDP 2.2

After SSH'ing into your Sandbox, use su to log in as root. Now let’s get the bits using the command


Then uncompress the archive with

tar xvfz spark-

Move the resulting folder to /opt, renaming it to just spark

mv spark- /opt/spark

Also, for good measure, create a symbolic link at /usr/hdp/current/spark-client that points to /opt/spark

ln -s /opt/spark /usr/hdp/current/spark-client
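To confirm the link landed where you expect, something like the following sketch can help. Note that check_link is a hypothetical helper written for this post, not part of HDP or Spark, and it assumes a readlink that supports -f (as GNU coreutils on the Sandbox does).

```shell
# Sketch: confirm that a symlink resolves to the expected directory.
# check_link is a hypothetical helper, not something shipped with HDP or Spark.
check_link() {
  link="$1"
  expected="$2"
  # The first argument must be a symlink, and both paths must resolve to the same place.
  if [ -L "$link" ] && [ "$(readlink -f "$link")" = "$(readlink -f "$expected")" ]; then
    echo "ok: $link -> $expected"
  else
    echo "mismatch: $link does not point to $expected" >&2
    return 1
  fi
}

# Usage on the Sandbox:
# check_link /usr/hdp/current/spark-client /opt/spark
```

If the check reports a mismatch, one possible cause is that /usr/hdp/current/spark-client already existed as a directory before ln ran, in which case the link is created inside that directory rather than in its place.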

Create a bash script in /etc/profile.d with the following lines

export SPARK_HOME=/usr/hdp/current/spark-client
export YARN_CONF_DIR=/etc/hadoop/conf
export PATH=/usr/hdp/current/spark-client/bin:$PATH

Then run the script with

source /etc/profile.d/
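A quick sanity check after sourcing the script might look like the sketch below. The function name check_spark_env is made up for illustration; it only looks at the two variables exported above and at whether spark-submit can be found on the PATH.

```shell
# Sketch: report whether the variables from the profile script are usable.
# check_spark_env is a hypothetical helper name chosen for this example.
check_spark_env() {
  status=0
  for name in SPARK_HOME YARN_CONF_DIR; do
    # Read the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory" >&2
      status=1
    else
      echo "$name=$value"
    fi
  done
  # spark-submit should resolve through the PATH entry added by the script.
  if ! command -v spark-submit >/dev/null 2>&1; then
    echo "spark-submit not found on PATH" >&2
    status=1
  fi
  return $status
}

# Usage, in the same shell where the profile script was sourced:
# check_spark_env
```

A non-zero return usually means the script was not sourced in the current shell, or that one of the paths does not exist yet.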

Create the file $SPARK_HOME/conf/spark-defaults.conf and add the following settings:

spark.driver.extraJavaOptions -Dhdp.version=–2041 -Dhdp.version=–2041

Before we can test-run a sample, use your browser to navigate to http://<hostname>:8080 for the Ambari interface. Use admin as both the username and the password to log in.

After you log in, ensure that the necessary services such as DataNode, NodeManager, ResourceManager, NameNode, HiveServer2, etc. are running. If any of them are stopped, start them and wait for them to come up before moving on to the next step.

Now we can validate our installation by running the sample Pi calculator:

cd /opt/spark
./bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10

You can check the value of the result by navigating to the URL above. Make sure you replace <hostname> with the actual DNS name of your Sandbox VM. Also ensure that you have opened port 8088 either on Azure or in your virtualization software.
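If you would rather stay on the command line, the result can also be pulled out of the application logs. In yarn-cluster mode the driver runs inside a YARN container, so the line "Pi is roughly ..." that SparkPi prints ends up in the container logs rather than on your terminal. The sketch below assumes log aggregation is enabled on the Sandbox so that yarn logs can fetch them; extract_pi is a hypothetical helper name, and the application id is whatever spark-submit reported for your run.

```shell
# Sketch: pull the estimate out of SparkPi's output.
# extract_pi is a hypothetical helper; it reads log text on stdin and
# prints the number that follows "Pi is roughly".
extract_pi() {
  sed -n 's/^.*Pi is roughly \([0-9.]*\).*$/\1/p'
}

# Usage on the Sandbox (substitute the application id printed by spark-submit):
# yarn logs -applicationId <application id> | extract_pi
```

If nothing is printed, the job may still be running, or the logs may not have been aggregated yet.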