Apache Spark Tutorial with Hortonworks Data Platform

Apache Spark is a fast, in-memory data processing engine with an elegant development API that lets data workers efficiently run algorithms requiring iterative access to datasets, such as machine learning algorithms. Running Spark on YARN enables deep integration with Hadoop and other YARN-enabled workloads in the enterprise.

Below, we explore the basic concepts of Apache Spark and the first few steps needed to get started.

Introduction
Configuring Hortonworks Sandbox on Azure
Installing Apache Spark 1.3.1 on HDP 2.2.4.2
Installing Apache Spark 1.2.0 on HDP 2.2
Basics of programming Apache Spark
A short primer on Scala
Exploring Spark with Scala
Using Hive and ORC with Apache Spark
Installing and configuring Zeppelin
Using IPython Notebook with Apache Spark

Saptak Sen

If you enjoyed this post, you should check out my book: Starting with Spark.
