Hadoop Tutorial – Hadoop Commons


Apache Commons is one of the easiest things to explain in the Hadoop context – even though it might get complicated when working with it. Apache Commons is a collection of libraries and tools that are often necessary when working with Hadoop. These libraries and tools are then used by various projects in the Hadoop ecosystem. Samples include:

  • A CLI minicluster, that enables a single-node Hadoop installation for testing purposes
  • Native libraries for Hadoop
  • Authentification and superusers
  • A Hadoop secure mode

You might not use all of these tools and libraries that are in Hadoop Commons as some of them are only used when you work on Hadoop projects.

Advertisements

Published by

Mario Meir-Huber

I work as Big Data Architect for Microsoft. With this role, I support my customers in applying Big Data technologies - mainly Hadoop/Spark - for their use-cases. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s