Hadoop Tutorial – Apache YARN

Apache YARN (Yet Another Resource Negotiator) could easily be called “the answer to everything” in Hadoop. It is the resource management layer of a Hadoop cluster, and you will usually use it without noticing. YARN is the central point of contact for running workloads in the Hadoop ecosystem; among other things, it executes all MapReduce jobs. YARN takes care of:

  • Resource Management
  • Job Management
  • Job Tracking
  • Job Scheduling

YARN is built of three major components. The first one is the Resource Manager, which distributes the cluster's resources among the individual applications. Next, there is the Node Manager. One Node Manager runs on every worker node; it launches and monitors the containers in which an application's tasks run on that node. The third component is the Application Master. There is one Application Master per application. It requests resources from the Resource Manager and then works with the Node Managers to start and monitor the application's tasks. An Application Master typically coordinates one or more tasks.
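The division of labour between the three components can be sketched as a small toy model. This is not YARN's API: the class and method names (`allocate`, `launch`, `run`) are invented for illustration, memory is the only resource, and the "scheduler" simply picks the first node with enough free memory, which is far simpler than what a real cluster does.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for the per-node agent that hosts containers."""
    name: str
    free_memory_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id):
        # Start a container for the application on this node.
        container_id = f"{app_id}@{self.name}#{len(self.containers)}"
        self.containers.append(container_id)
        return container_id


class ResourceManager:
    """Toy stand-in for the cluster-wide resource arbiter."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, memory_mb):
        # First-fit: reserve memory on the first node that has enough.
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                return node
        return None  # no capacity available right now


class ApplicationMaster:
    """Toy stand-in for the per-application coordinator."""

    def __init__(self, app_id, resource_manager):
        self.app_id = app_id
        self.resource_manager = resource_manager

    def run(self, task_count, memory_mb):
        # Ask the resource manager for one container per task,
        # then have the granted node launch it.
        launched = []
        for _ in range(task_count):
            node = self.resource_manager.allocate(memory_mb)
            if node is None:
                break  # cluster is full; remaining tasks would have to wait
            launched.append(node.launch(self.app_id))
        return launched


if __name__ == "__main__":
    nodes = [NodeManager("node1", 2048), NodeManager("node2", 1024)]
    master = ApplicationMaster("app_1", ResourceManager(nodes))
    print(master.run(task_count=4, memory_mb=1024))
```

With two nodes offering 2048 MB and 1024 MB, a request for four 1024 MB tasks gets three containers and the fourth task is left waiting, which is the kind of decision the Resource Manager makes on behalf of every application.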

Figure: YARN components

The following image displays a common workflow in YARN.

Figure: YARN architecture

Many other projects in the Hadoop ecosystem, such as Hive and Pig, run their work on YARN. YARN itself can be accessed from Java applications or through a REST interface.
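As a rough illustration of the REST route, the sketch below lists applications through the Resource Manager's `/ws/v1/cluster/apps` endpoint. The host name `rm.example.com` is a placeholder, port 8088 is the usual default for the Resource Manager web interface but may differ on your cluster, and the helper names are made up for this example. It also assumes an unsecured cluster; a Kerberos-protected one needs additional authentication.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def apps_url(rm_address, states=None):
    """Build the URL of the cluster applications endpoint."""
    url = f"http://{rm_address}/ws/v1/cluster/apps"
    if states:
        # Optional filter, e.g. "RUNNING"
        url += "?" + urlencode({"states": states})
    return url


def parse_apps(payload):
    """Extract (id, name, state) tuples from the decoded JSON response."""
    # "apps" is null when the cluster has no matching applications.
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a.get("id"), a.get("name"), a.get("state")) for a in apps]


def fetch_apps(rm_address, states=None, timeout=10):
    """Query the Resource Manager and return the parsed application list."""
    with urlopen(apps_url(rm_address, states), timeout=timeout) as response:
        return parse_apps(json.load(response))
```

Calling `fetch_apps("rm.example.com:8088", states="RUNNING")` against a reachable Resource Manager would return one tuple per running application.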


Published by

Mario Meir-Huber

I work as Principal for Data & Analytics at A1 Telekom Austria Group. I also teach this topic at various universities and frequently speak at conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German and Austrian universities. In my home country (Austria) I am part of several organisations on Big Data & Data Science.
