Apache Ambari was developed by the Hadoop distributor Hortonworks and also comes with their distribution. The aim of Ambari is to make the management of Hadoop clusters easier. Ambari is useful, if you run large server farms based on Hadoop. Ambari automates much of the manual work you would need to do with Hadoop when managing your cluster from the console.
Ambari comes with three key aspects around cluster management: first, it is about provisioning instances. This is helpful when you want to add new instances to your Hadoop cluster. Ambari takes care of automating all aspects of adding new instances. Next, there is monitoring. Ambari monitors your server farm and gives you an overview on what is going on. The last aspect is the management of your server farm itself.
Provisioning has always been a very tricky part of Hadoop. When someone wanted to add new nodes to a cluster, this was basically not an easy thing to do and included a lot of manual work. Most organizations abstracted this problem by creating scripts and using automation software, but this simply couldn’t fill the scope that is often necessary in Hadoop clusters. Ambari provides an easy-to-use assistant that enables users to install new services or activate/deactivate them. Ambari takes care of the entire cluster provisioning and configuration with an easy UI.
Ambari also includes comprehensive monitoring capabilities for the cluster. This allows user to view the status of the cluster in a dashboard and to get to know immediately what the cluster is up to (or not). Ambari uses Apache Ganglia to collect the metrics. Ambari also integrates the possibility to send System messages via Apache Nagios. This includes alerts and other things that are necessary for the administrator of the cluster.
Other key aspects of Ambari are:
- Extensibility. Ambari is built on a plug-in architecture, which basically allows you to extend Ambari with your own functionality used within your company or organization. This is useful if you want to integrate Hadoop into your business processes.
- Fault Tolerance. Ambari takes care of errors and reacts to them. For example, if an instance has an error, Ambari restarts this instance. This takes away much of the headache you got in previous, pre-Ambari, versions of Hadoop.
- Secure. Ambari uses a role-based authentication. This gives you more control over sensitive information in your cluster(s) and enables you to apply different roles.
- Feedback. Ambari provides Feedback to the user(s) about long-running processes. This is especially useful for stream processing and near-real-time processes that basically have no end of their lifespan.
Apache Ambari can be accessed easily via two different ways: first, Ambari provides a mature UI that enables you to access the cluster management via a Browser. Furthermore, Ambari can also be accessed via ReSTful Web Services, which gives you additional possibilities in working with the service.
The following illustration outlines the Ambari Server and the Agents Communication.
As of the architecture, Ambari leverages several projects. As key elements, Ambari uses message queues for communication. The configuration within Apache Ambari is done by Puppet. The next figure shows the overall architecture of Ambari.