Installing Apache Spark 1.2.0 on HDP 2.2
After SSH'ing into your Sandbox, use
su to log in as root.
Now let’s get the bits using the command
Then uncompress the archive with
tar xvfz spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041.tgz
Move the resulting folder to
/opt, renaming it to spark:
mv spark-1.2.0.2.2.0.0-82-bin-2.6.0.2.2.0.0-2041 /opt/spark
Also, for good measure, create a symbolic link so that Spark is reachable at the standard HDP client path:
ln -s /opt/spark /usr/hdp/current/spark-client
Create a bash script called spark.sh in the
/etc/profile.d directory with the following lines:
export SPARK_HOME=/usr/hdp/current/spark-client
export YARN_CONF_DIR=/etc/hadoop/conf
export PATH=/usr/hdp/current/spark-client/bin:$PATH
Then load it into your current session with
source /etc/profile.d/spark.sh
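A quick way to sanity-check the script is to source a copy and inspect the variables it sets. This sketch writes the file to the current directory instead of /etc/profile.d so it can be tried without root:

```shell
# Write the profile script (the real location is /etc/profile.d/spark.sh;
# a local copy is used here so no root access is needed).
cat > ./spark.sh <<'EOF'
export SPARK_HOME=/usr/hdp/current/spark-client
export YARN_CONF_DIR=/etc/hadoop/conf
export PATH=/usr/hdp/current/spark-client/bin:$PATH
EOF

# Source it and confirm the environment is in place.
. ./spark.sh
echo "$SPARK_HOME"    # -> /usr/hdp/current/spark-client
echo "$YARN_CONF_DIR" # -> /etc/hadoop/conf
```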
Create a file
$SPARK_HOME/conf/spark-defaults.conf and add the following settings:
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
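The same file can be created from the shell with a heredoc. Both the driver and the YARN application master need the hdp.version flag; without it, HDP classpath entries containing ${hdp.version} fail to resolve when running on YARN. The sketch below writes to a scratch directory so it can be tried anywhere; on the Sandbox the target would be $SPARK_HOME/conf/spark-defaults.conf:

```shell
# Scratch directory standing in for $SPARK_HOME/conf on the Sandbox.
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/spark-defaults.conf" <<'EOF'
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
EOF

# Count the lines carrying the version flag.
grep -c 'hdp.version' "$CONF_DIR/spark-defaults.conf"   # -> 2
```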
Now, before we can run a test sample, use your browser to navigate to
http://<hostname>:8080 for the Ambari interface. Use
admin as both the username and the password to log in.
After you log in, ensure that the necessary services (DataNode, NodeManager, ResourceManager, NameNode, HiveServer2, etc.) are running. If any are stopped, start them and wait for them to come up before the next step.
Now we can test the validity of our installation by running the sample Pi calculator
cd /opt/spark
./bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
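For reference, SparkPi estimates π by Monte Carlo sampling: it scatters random points over the unit square and multiplies the fraction that lands inside the quarter circle by 4. The same idea in miniature, in plain awk with no cluster involved:

```shell
# Monte Carlo estimate of pi with 200000 samples; SparkPi distributes
# this same computation across the YARN executors.
PI_EST=$(awk 'BEGIN {
  srand(1); n = 200000; hits = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1.0) hits++   # point fell inside the quarter circle
  }
  printf "%.4f", 4 * hits / n
}')
echo "Pi is roughly $PI_EST"
```

The driver log of the real job ends with a similar "Pi is roughly ..." line.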
You can check the value of the result by navigating to the YARN ResourceManager UI at
http://sandbox.hortonworks.com:8088 and opening the finished application's logs. Make sure you replace
sandbox.hortonworks.com with the actual DNS name of your Sandbox VM. Also ensure that you have opened port 8088 either on Azure or in your virtualization software.