How to: Start and Stop Cloudera on Azure with the Azure CLI

The Azure CLI is my favorite tool to manage Hadoop Clusters on Azure. Why? Because I can use the tools I am used to from Linux now from my Windows PC. In Windows 10, I am using the Ubuntu Bash for that, which gives me all the major tools for managing remote Hadoop Clusters.

One thing I am doing frequently, is starting and stopping Hadoop Clusters based on Cloudera. If you are coming from Powershell, this might be rather painfull for you, since you can only start each vm in the cluster sequentially, meaning that a cluster consisting of 10 or more nodes is rather slow to start and might take hours! In the Azure CLI I can easily do this by specifiying “–nowait” and all runs in parallel. The only disadvantage is that I won’t get any notifications on when the cluster is ready. But I am doing this with a simple hack: ssh’ing into the cluster (since I have to do this anyway). SSH will succeed once the Masternodes are ready and so I can perform some tasks on the nodes (such as restarting Cloudera Manager since CM is usually a bit “dizzy” after sending it to sleep and waking it up again :))

Let’s start with the easiest step: stopping the cluster. The Azure CLI always starts with “az” as command (meaning Azure of course). The command for stopping one or more vm’s with the Azure CLI is “vm stop”. The only two things I need to provide now are the id’s I want to stop and “–nowait” since I want to quit the script right after.

So, the script would look like the following:

az vm stop --ids YOUR_IDS --no-wait

However, this has still one major disadvantage: you would need to provide all ID’s Hardcoded. This doesn’t matter at all if your cluster never changes, but in my case I add and delete vm’s to or from the cluster, so this script doesn’t play well for my case. However, the CLI is very flexible (and so is bash) and I can query all my vm’s in a resource group. This will give me the IDs which are currently in the cluster (let’s assume I delete dropped vm’s and add new vm’s to the RG). The Query for retrieving all VMs in a Resource Group is easy:

az vm list --resource-group YOUR_RESOURCE_GROUP --query "[].id" -o tsv

This will give me all IDs in the RG. The real fun starts when doing this in one statement:

az vm stop --ids $(az vm list --resource-group clouderarg --query "[].id" -o tsv) --no-wait

Which is really nice and easy 🙂

It is similar with starting VMs in a Resource Group:

az vm start --ids $(az vm list --resource-group mmhclouderarg --query "[].id" -o tsv) --no-wait

Published by

Mario Meir-Huber

I work as Big Data Architect for Microsoft. With this role, I support my customers in applying Big Data technologies - mainly Hadoop/Spark - for their use-cases. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s