In this section we will install and configure Spark 1.3.1 on the Hortonworks Sandbox with HDP 2.2.
Log in to your Sandbox as root and add the repository that contains Spark 1.3.1 with the following command:
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.4/hdp.repo -O /etc/yum.repos.d/HDP-TP.repo
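Before installing, you may want to confirm that the download actually produced a usable repository file. The sketch below is a hypothetical helper, check_repo_file, which is not part of HDP or yum. It assumes only the standard yum .repo key=value format and checks two things: the file exists and is non-empty, and it declares at least one baseurl.

```shell
# Hypothetical helper (not an HDP or yum tool): sanity-check a yum .repo file.
# Returns 0 if the file exists, is non-empty and defines a baseurl; 1 otherwise.
check_repo_file() {
    repo_file="$1"
    # -s is true only for a file that exists and has a size greater than zero
    if [ ! -s "$repo_file" ]; then
        echo "missing or empty: $repo_file" >&2
        return 1
    fi
    # each yum repo section points at its package source with a baseurl= line
    # (optional whitespace around the key and the '=' is tolerated)
    if ! grep -Eq '^[[:space:]]*baseurl[[:space:]]*=' "$repo_file"; then
        echo "no baseurl found in: $repo_file" >&2
        return 1
    fi
    return 0
}

# Usage: check_repo_file /etc/yum.repos.d/HDP-TP.repo && yum repolist
```

If the check passes, yum repolist should list the newly added HDP repository alongside the Sandbox's existing ones.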
Install the Spark package with the following command:
yum install spark_2_2_4_4_16-master
Installing Spark and all of its dependencies will take a few minutes.
Let’s also install PySpark with the following command:
yum install spark-python
Use the hdp-select command to point the Spark history server and client at the version we just installed:
hdp-select set spark-historyserver 2.2.4.4-16
hdp-select set spark-client 2.2.4.4-16
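hdp-select manages the links under /usr/hdp/current. After the two commands above, we assume spark-client and spark-historyserver should resolve somewhere inside the /usr/hdp/2.2.4.4-16/ tree. The sketch below is a hypothetical helper, check_hdp_link, that follows a link and confirms this. It deliberately checks only for the version directory, so it does not depend on the exact subdirectory the link targets.

```shell
# Hypothetical helper: confirm that an hdp-select managed link such as
# /usr/hdp/current/spark-client resolves into the expected /<version>/ directory.
check_hdp_link() {
    link="$1"
    version="$2"
    # readlink -f follows every symlink in the chain and prints the final path
    target=$(readlink -f "$link") || return 1
    case "$target" in
        */"$version"/*)
            echo "$link -> $target"
            return 0 ;;
        *)
            echo "$link points at $target, not version $version" >&2
            return 1 ;;
    esac
}

# Usage:
#   check_hdp_link /usr/hdp/current/spark-client 2.2.4.4-16
#   check_hdp_link /usr/hdp/current/spark-historyserver 2.2.4.4-16
```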
Let’s check that everything works by running a sample application.
First, let’s switch to the spark user that was created during the RPM install, and then cd to the $SPARK_HOME directory.
su spark
cd /usr/hdp/current/spark-client
Now let’s run the SparkPi sample, which estimates the value of Pi:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
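To see what the job computes, here is a local, single-process sketch of the Monte Carlo idea behind SparkPi as we understand its source. It draws roughly 100000 × slices random points in the square [-1, 1) × [-1, 1), counts those that fall inside the unit circle, and multiplies the hit rate by 4. The trailing 10 in the command above is the number of slices. This is not the Spark code: estimate_pi is a hypothetical awk function with no cluster or partitioning, and its seed parameter exists only to make runs repeatable.

```shell
# Local sketch of the Monte Carlo estimate that SparkPi distributes across executors.
# $1 = number of slices (default 2); $2 = random seed (default 42)
estimate_pi() {
    slices="${1:-2}"
    seed="${2:-42}"
    awk -v slices="$slices" -v seed="$seed" 'BEGIN {
        srand(seed)
        n = 100000 * slices            # total number of random points
        inside = 0
        for (i = 0; i < n; i++) {
            # pick a point uniformly in the square [-1, 1) x [-1, 1)
            x = rand() * 2 - 1
            y = rand() * 2 - 1
            if (x * x + y * y < 1) inside++    # point lies inside the unit circle
        }
        # circle area / square area = pi / 4, so scale the hit rate by 4
        printf "Pi is roughly %.5f\n", 4 * inside / n
    }'
}

# Usage: estimate_pi 10
```

With more slices the estimate should tighten, because the error shrinks roughly with the square root of the number of points. On the cluster, Spark spreads that sampling across the three executors requested with --num-executors.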
Saptak Sen
If you enjoyed this post, you should check out my book: Starting with Spark.