Hadoop Tutorial – Graph Data in Hadoop with Giraph and Tez


Both Apache Graph and Tez are focused on Graph processing. Apache Giraph is a very popular tool for graph processing. A famous use-case for Giraph is the social graph at Facebook. Facebook uses Giraph to analyze how one might know a person in order to find out what other persons could be friends. Graph processing works on the travelling Sales-Person problem, trying to answer the question on what is the shortest way to get to the customers.

Apache Tez is focused on improving the performance when working with graphs. This makes the development ways easier and reduces the number of MapReduce jobs that are executed underneath it significantly. Apache Tez highly increases the performance against typical MapReduce queries and optimizes the resource management.

The following figure demonstrates graph processing with and without Tez.

MapReduce without Tez
MapReduce without Tez

MapReduce without Tez

With Tez
With Tez

With Tez

Advertisements

Published by

Mario Meir-Huber

I work as Big Data Architect for Microsoft. With this role, I support my customers in applying Big Data technologies - mainly Hadoop/Spark - for their use-cases. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s