I wrote a book! It's called Starting with Spark. You should read it.

Saptak Sen

Senior Technical Product Manager @ Hortonworks. Current: Hortonworks. Previous: Microsoft Corporation, Myzus, AAL Infotech.

Tags: saptak

Analyzing Data in IPython Notebook with Apache Spark

In this tutorial, we are going to explore how to use IPython Notebook with Apache Spark to analyze text.

Tags: spark, hadoop

A Lap around Apache Spark 1.3.1 with HDP 2.3

This guide walks you through many of the newer features of Apache Spark 1.3.1 running on YARN in HDP 2.3.

Tags: spark, hadoop

Using IPython Notebook with Apache Spark

In this tutorial, we are going to configure IPython Notebook to work with Apache Spark on YARN in a few steps.

Tags: spark, hadoop

Installing and configuring Zeppelin

git clone https://github.com/apache/incubator-zeppelin.git
mv incubator-zeppelin/ /opt/
cd /opt/incubator-zeppelin/

Tags: spark, hadoop

Using Hive and ORC with Apache Spark

In this tutorial, we will explore how you can access and analyze data stored in Hive from Spark.

Tags: hive, hadoop, spark

Exploring Spark with Scala

In this section, we are going to walk through the process of using Scala and Apache Spark to interactively analyze data on an Apache Hadoop cluster.

Tags: spark, hadoop

A short primer on Scala

Scala is a relatively new language based on the JVM. The main difference between Scala and other object-oriented languages is that everything in Scala is an objec...

Tags: scala, hadoop

Installing Apache Spark 1.2.0 on HDP 2.2

After SSH’ing into your Sandbox, use su to log in as root. Now let’s get the bits using the command

Tags: spark, hadoop