Apache HBase was initially developed by Powerset, a natural language search engine startup in 2006. Then in 2008 they contributed the code base to the Apache Software Foundation so that the broad community could develop and extend this key piece of software. Since then, HBase has seen major changes and massive adoption and is now firmly established as the defacto standard NoSQL database for Hadoop.

Lets explore why organizations choose HBase over other options and how they are gaining value from it.

###Why HBase?

HBase provides near real-time, random read and write access to tables (or to be more accurate “maps”) storing billions of rows and millions of columns. It provides low latency access to a LOT of data, but how?

HBase is a columnar database, which means it pivots storage of data to a column as opposed to the traditional row based storage found in an RDBMS. This approach not only saves storage space but also reduces the amount of disk that needs to be scanned during a query, getting us close to near real time access.

With the advent of the web and ARM processor, the rate at which data is being created around us today is astounding and the billion row database is not that rare anymore. Apache Hadoop has emerged as the platform of choice for many to store and process this data because of its efficient and low cost linear scale. HBase builds upon this platform to provide a columnar data store for Hadoop.

Making sense of this data in a single platform makes sense. With Hadoop 2 and YARN, HBase can now share this single platform with other processing engines, so that you may run a pig script or analyze a stream of data at the same time (on the same data) as you are feeding online applications with real time access to HBase data. A singular system not only optimizes use of data but also presents new opportunities.

###How is HBase delivering value today?

There’s a long list of organizations like Adobe, Meetup, Pacific Northwest National Laboratory, Stumbleupon, Twitter, Yahoo that have publicly and voluntarily acknowledged their use of HBase for variety of scenarios.

One of the most common use of Hbase is to provide a recommendations to website visitors. While the recommendations may be calculated using Pig or some other processing engine, serving this data up to thousands of visitors per second needs a store like HBase. For instance, Groupon uses HBase to personalize deals to users to be more accurate and relevant. Pinterest uses HBase to deliver the right set of “Pins” to their users depending what who they follow.

Facebook’s has used HBase to modernize their entire messaging system, which combines messages, chat and email into a real-time conversation for Facebook users.

Facebook shared the following benefits of choosing HBase over numerous other NoSQL options at their disposal:

Many other organizations have this exact same set of bases to cover when they require an enterprise grade massively scalable data platform.

###HBase: The future is written

As with any broad community open source project, the pace of innovation within the HBase community is astounding. Here are few of the many improvements to HBase in just last six months:

####High Availability

####Performance

####Security

####Developer Experience

####Manageability

####Compatibility

The future of HBase based on all the work the community is doing is very exciting. Here are a few projects which hint the breadth and ambition of the work going on:

The ecosystem of tools and software to complement HBase has been growing with it. This makes consumption of HBase easier, drives further HBase adoption, and encourages investment back into it.

In summary, HBase has come a long way and can be used reliably for a variety of enterprise scenarios. It has an exciting roadmap in front of it ensuring it’s popularity for a long time.

Saptak Sen

If you enjoyed this post, you should check out my book: Starting with Spark.

Share this post