Hadoop Tutorial – Apache ZooKeeper for distributed coordination


One of the key infrastructure services for Hadoop is Apache ZooKeeper. ZooKeeper is in charge of coordinating nodes in the Hadoop cluster. Key challenges for ZooKeeper in that domain are to provide high availability for Hadoop and to take care of the distributed coordination.

Under these challenges, Hadoop takes care of managing the cluster configuration for Hadoop. A key challenge in the Hadoop Cluster is naming, which has to be applied to all nodes within a cluster. Apache ZooKeeper takes care of that by providing unique names to individual nodes based on naming conventions.

The hierarchy in Zookeeper
The hierarchy in Zookeeper

As shown in Figure 7, naming is hierarchical. This means that naming also occurs via a path. The Root instance starts with a “/”, all successors have their unique name, and their successors also apply this naming schema. This enables the cluster to have nodes with child-nodes, which in return has positive effects on maintainability.

ZooKeeper takes care of synchronization within the distributed environment and provides some group services to the Hadoop Cluster. As of synchronization, there is one server in the ZooKeeper Service that acts as the “Leader” of all servers running under the ZooKeeper Service. The following illustration shows this.

Synchronisation in the ZooKeeper Service
Synchronisation in the ZooKeeper Service

To ensure a high uptime and availability, individual servers in the ZooKeeper service are mirrored. Each of the servers in the service knows any other server. In case that one server has a failure and isn’t available any more, clients connect to other servers. The ZooKeeper service itself is built for failover and is also highly scalable.

Advertisements

Published by

Mario Meir-Huber

I work as Big Data Architect for Microsoft. With this role, I support my customers in applying Big Data technologies - mainly Hadoop/Spark - for their use-cases. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s