Apache Spark is a fast, in-memory data processing engine with an elegant development API that lets data workers efficiently run algorithms that need repeated, iterative access to the same dataset, such as machine learning algorithms. Running Spark on YARN enables deep integration with Hadoop and other YARN-enabled workloads in the enterprise.

Below, we are going to explore the basic concepts of Apache Spark and the first few necessary steps to get started.


Saptak Sen

If you enjoyed this post, you should check out my book: Starting with Spark.
