I wrote a book! It's called Starting with Spark. You should read it.

Data Structures in Python

In this post we will explore data structures in Python. The important thing about data structures is that the way you organize data determines which operations are efficient.
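The idea that organization decides efficiency can be illustrated with a small sketch. It is not taken from the post, and the function names `linear_contains` and `sorted_contains` are hypothetical. Both functions answer the same question ("is this value present?"), but an unordered list has to be scanned element by element. A list kept in sorted order lets binary search discard half of the remaining candidates with each comparison.

```python
def linear_contains(items, target):
    """Scan the list from the front; return (found, comparisons made)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons


def sorted_contains(sorted_items, target):
    """Binary search on a sorted list; return (found, loop comparisons made)."""
    lo, hi = 0, len(sorted_items)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid      # target, if present, is at mid or to its left
    found = lo < len(sorted_items) and sorted_items[lo] == target
    return found, comparisons


data = list(range(1_000))           # already sorted
print(linear_contains(data, 999))   # (True, 1000)
print(sorted_contains(data, 999))   # (True, 10)
print(999 in set(data))             # a set uses hashing instead: True
```

The data is identical in every case. Only its organization differs: plain list, sorted list, or hash-based `set`. That organization alone changes the work from about n comparisons to about log2(n), or to an average constant-time hash lookup for the set.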

Transporting realtime event stream with Apache Kafka

Welcome to this three-part tutorial on real-time data processing with Apache Kafka, Apache Storm, Apache HBase and Hive. This set of tutorials will w...

kafka, hadoop

Processing realtime event stream with Apache Storm

In this tutorial, we will explore Apache Storm and use it with Apache Kafka to develop a multi-stage event processing pipeline. In an event proce...

storm, hadoop

Ingesting Real-Time Streams into Hive and HBase

Real-time data ingestion in HBase and Hive using Storm.

Apache Spark Tutorial with Hortonworks Data Platform

Apache Spark is a fast, in-memory data processing engine with an elegant development API that allows data workers to efficiently execute algorithms which require ...

spark, hadoop

Securing HDFS, Hive and HBase with Knox and Ranger

Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides central security policy administration across the core...

Hortonworks Sandbox with HDP 2.3 is now available on Microsoft Azure Gallery

We are excited to announce the general availability of Hortonworks Sandbox with HDP 2.3 on Microsoft Azure Gallery. Hortonworks Sandbox is already a very popular ...

hadoop, azure, sandbox

Processing Data Pipeline on Hadoop clusters with Apache Falcon

Apache Falcon is a framework to simplify data pipeline processing and management on Hadoop clusters. It makes it much simpler to onboard new workflows/pipelines,...

falcon, hadoop

Mirroring Datasets between Hadoop clusters with Apache Falcon

Apache Falcon is a framework to simplify data pipeline processing and management on Hadoop clusters. It provides data management services such as retention, repl...

falcon, hadoop