Hadoop Tutorial – Apache HBase


HBase is one of the most popular databases in the Hadoop and NoSQL ecosystem. HBase is a highly-scaleable database that works with fulfilling the partition tolerance and availability of the CAP-Theorem. In case you aren’t familiar with the CAP-Theorem: the theorem states that requirements for a database are consistency, availability and partition tolerance. However, you can only have two of them and the third one comes with a trade-off.

HBase uses a Key/Value storage. The schema of a table in HBase is not present (schema-less), which gives you much more flexibility than with a traditional relational database. HBase takes care of the failover and sharding of data for you.

HBase uses HDFS as storage and ZooKeeper for the coordination. There are several region servers that are controlled by a master server. This is displayed in the next image.

Apache HBase
Apache HBase
Advertisements

Published by

Mario Meir-Huber

I work as Big Data Architect for Microsoft. With this role, I support my customers in applying Big Data technologies - mainly Hadoop/Spark - for their use-cases. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data.

One thought on “Hadoop Tutorial – Apache HBase”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s