64-bit Steamroller on our way
Over the weekend I installed and configured a new build [v. 1421] of the Windows XP x64 Edition on my Compaq Presario 3000 Laptop with a 64-bit processor (just in...
In this post we will explore data types in Python. The important thing about data structures is that you organize data to make certain operations efficient.
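As a rough illustration of that point, here is a minimal sketch showing the same items organized as a list, a set, and a dict. The sample words and variable names are made up for this example and do not come from the original post.

```python
from collections import Counter

# Hypothetical sample data, made up for this illustration.
words = ["spark", "hive", "spark", "storm", "hive", "spark"]

# list: keeps order and duplicates; membership checks scan the items one by one.
as_list = list(words)

# set: keeps only unique items; membership checks use hashing.
unique_words = set(words)

# dict: maps each key to a value, so per-key lookups are direct.
word_counts = dict(Counter(words))

# dict.fromkeys preserves insertion order (Python 3.7+),
# which yields the unique items in first-seen order.
first_seen = list(dict.fromkeys(words))

print("storm" in as_list, "storm" in unique_words)
print(word_counts["spark"])
print(first_seen)
```

Which structure fits depends on the operation you need most: ordered traversal favors the list, fast membership checks favor the set, and per-key lookups favor the dict.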
We are excited to announce the general availability of Hortonworks Sandbox with HDP 2.3 on Microsoft Azure Gallery. Hortonworks Sandbox is already a very popular ...
For folks attending the workshop at Hadoop Summit, San Jose 2015, we provided a Microsoft Azure Pass. If you already have an Azure account, skip this step. If you are...
In this post, we’ll walk through the process of deploying an Apache Hadoop 2 cluster on the EC2 cloud service offered by Amazon Web Services (AWS), using Hortonwo...
Apache Falcon is a framework to simplify data pipeline processing and management on Hadoop clusters. It makes it much simpler to onboard new workflows/pipelines,...
Apache Falcon is a framework to simplify data pipeline processing and management on Hadoop clusters. It provides data management services such as retention, repl...
Introduction: Welcome to the three-part tutorial on real-time data processing with Apache Kafka, Apache Storm, Apache HBase and Hive. This set of tutorials will w...
Introduction: In this tutorial, we will explore Apache Storm and use it with Apache Kafka to develop a multi-stage event processing pipeline. In an event proce...
Apache Spark is a fast, in-memory data processing engine with an elegant development API that allows data workers to efficiently execute algorithms which require ...
Introduction: Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides central security policy administration across the core...
In this tutorial, we are going to explore how to use IPython Notebook with Apache Spark to analyze text.
This Apache Spark 1.3.1 with HDP 2.3 guide walks you through many of the newer features of Apache Spark 1.3.1 on YARN.
In this tutorial we are going to configure IPython notebook with Apache Spark on YARN in a few steps.
git clone https://github.com/apache/incubator-zeppelin.git && mv incubator-zeppelin/ /opt/ && cd /opt/incubator-zeppelin/
In this tutorial, we will explore how you can access and analyze data on Hive from Spark.
In this section we are going to walk through the process of using Scala and Apache Spark to interactively analyze data on an Apache Hadoop cluster.
Scala is a relatively new language based on the JVM. The main difference between other “Object Oriented Languages” and Scala is that everything in Scala is an objec...
After SSH’ing into your Sandbox, use su to log in as root. Now let’s get the bits using the command
In this section we will configure Spark 1.3.1 on Hortonworks Sandbox with HDP 2.2.
Apache HBase was initially developed in 2006 by Powerset, a natural language search engine startup. Then in 2008 they contributed the code base to the Apache Soft...
In this post, we will explore how to quickly and easily spin up our own VM with Vagrant and Apache Ambari. Vagrant is very popular with developers as it lets one ...
The most severe bottlenecks in high-performance systems in most cases result from I/O operations. To buffer I/O or other slow accesses, engineers devised cac...
Senior Technical Product Manager @ Hortonworks Current Hortonworks Previous Microsoft Corporation, Myzus, AAL Infotech Send a message on Link...