Hadoop Tutorial – Apache Hive and Apache HCatalog


Hive is one of the easiest tools to use in Hadoop. Its query language is very similar to SQL, so it is easy to learn for anyone with a strong SQL background. Apache Hive is a data-warehousing tool for Hadoop that focuses on large datasets and on imposing a structure on them.

Hive queries are written in HiveQL, which is very similar to SQL but not identical to it. Hive translates HiveQL queries into MapReduce jobs, which comes with minor performance trade-offs. HiveQL can be extended with custom code and MapReduce scripts, which is useful when additional performance is required.
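One way to plug custom code into a query is Hive's TRANSFORM clause, which streams rows to an external script as tab-separated lines on standard input and reads the results back from standard output. The following is only a minimal sketch: the script name add_total.py, the extra "total" column and the summing logic are made up for illustration. Assuming the script has been registered with ADD FILE, it could be called with something like SELECT TRANSFORM(column1, column2) USING 'python add_total.py' AS (column1, column2, total) FROM dataset;

```python
import sys


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    column1, column2 = line.rstrip("\n").split("\t")
    # Hypothetical custom logic: append the sum of both columns.
    total = int(column1) + int(column2)
    return f"{column1}\t{column2}\t{total}"


def main(stream=sys.stdin):
    # Hive sends one row per line; blank lines are skipped.
    for line in stream:
        if line.strip():
            print(transform_line(line))


if __name__ == "__main__":
    main()
```

The per-row logic lives in its own function so that it can be tried out locally before the script is used from Hive.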

The following listings show some Hive queries. The first listing shows how to query two columns from a dataset.

hive> SELECT column1, column2 FROM dataset;
2 5
4 9
5 7
5 9

Listing 2: simple Hive query

The next sample shows how to include a WHERE clause.

hive> SELECT DISTINCT column1 FROM dataset WHERE column2 = 91;

Listing 3: WHERE clause in Hive
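To make the translation into MapReduce a bit more tangible, here is a rough Python sketch of how a SELECT DISTINCT ... WHERE query could be split into a map and a reduce phase. It is a conceptual model only, not what Hive actually generates. The rows are the four result rows from the first listing, and the filter value 9 is an assumption chosen so that some rows match.

```python
from collections import defaultdict

# Sample rows (column1, column2), taken from the output of the first listing.
ROWS = [(2, 5), (4, 9), (5, 7), (5, 9)]


def map_phase(rows, wanted):
    """Apply the WHERE filter and emit (column1, None) for each matching row."""
    for column1, column2 in rows:
        if column2 == wanted:
            yield column1, None


def shuffle(pairs):
    """Group the emitted pairs by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Emit every key exactly once, which is what DISTINCT asks for."""
    return sorted(groups)


def select_distinct_where(rows, wanted):
    return reduce_phase(shuffle(map_phase(rows, wanted)))


print(select_distinct_where(ROWS, 9))  # prints [4, 5]
```

The filter runs in the map phase so that non-matching rows are dropped before any data is shuffled, while the grouping by key is what removes the duplicates.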

HCatalog is an abstract table manager for Hadoop. Its goal is to make it easier for users to work with data: users see the data as if it were stored in a relational database. HCatalog can be accessed through a REST API.
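As a rough illustration of the REST access, the sketch below builds the URL for listing the tables of a database through WebHCat, the REST interface that ships with HCatalog. It assumes a WebHCat server on its default port 50111. The host name, database and user name are placeholders, and the "tables" key in the response should be checked against the WebHCat reference for your version.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def table_list_url(host, database, user, port=50111):
    """Build the WebHCat URL that lists the tables of one database."""
    query = urlencode({"user.name": user})
    return (
        f"http://{host}:{port}/templeton/v1/ddl/database/"
        f"{quote(database)}/table?{query}"
    )


def list_tables(host, database, user):
    """Call WebHCat and return the table names (needs a running server)."""
    with urlopen(table_list_url(host, database, user)) as response:
        # Assumption: the JSON response carries the names under "tables".
        return json.load(response)["tables"]


# Placeholder values; no request is sent here.
print(table_list_url("localhost", "default", "hive"))
```

Only list_tables talks to the network, so the URL construction can be checked without a cluster.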

Published by

Mario Meir-Huber

I work as Big Data Architect for Microsoft. With this role, I support my customers in applying Big Data technologies - mainly Hadoop/Spark - for their use-cases. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data.