This is a follow-up post in the series about resource automation in the cloud. In this part, we will look at monitoring. Monitoring is not the easiest thing to do in distributed systems: you have to watch a large number of instances, and the real challenge is deciding what to monitor. If you run an application (such as a SaaS platform), you are probably less interested in the performance of a single instance than in the performance of the application as a whole. Likewise, the I/O performance of one instance matters less than the overall experience your application delivers. Finding the right metrics takes significant effort and experience.
Let us look at how monitoring basically works. There are two key approaches to monitoring instances:
- Agent-less Monitoring
- Agent-based Monitoring
Agent-less monitoring typically works in one of two ways:
- Remotely analyse the system through a remote API (e.g. log data on the file system)
- Analyse network packets; SNMP (Simple Network Management Protocol) is often used for this
What is good about agent-less monitoring?
- No client agent to deploy
  - No application to install or run on the client, so it typically doesn't consume resources on the system
- Lower cost
- Systems can be closed or locked down so that no new applications may be installed
What is bad about agent-less monitoring?
- No in-depth metrics for granular analysis
- Can be affected by networking issues
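To make the agent-less approach more concrete, here is a minimal Python sketch of the "analyse log data remotely" idea. It assumes the log lines have already been fetched from the monitored system by whatever remote API is available; the log format, the function names `summarize_log` and `needs_attention`, and the error limit are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "<date> <time> <LEVEL> <message>"
LOG_LINE = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) (?P<message>.*)$")


def summarize_log(lines):
    """Count log entries per severity level.

    Lines that do not match the assumed format are counted as UNPARSED.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        counts[match.group("level") if match else "UNPARSED"] += 1
    return counts


def needs_attention(counts, max_errors=1):
    """Return True when there are more ERROR entries than the assumed limit."""
    return counts["ERROR"] > max_errors


# Made-up sample data standing in for log lines fetched from a remote system.
sample = [
    "2024-01-01 10:00:00 INFO service started",
    "2024-01-01 10:00:05 ERROR database timeout",
    "2024-01-01 10:00:09 ERROR database timeout",
    "garbage line",
]
summary = summarize_log(sample)
print(dict(summary), needs_attention(summary))
```

Nothing runs on the monitored system here, which is the appeal of the approach. The flip side is visible too: the monitor only sees what the log exposes, and only if the network path to the system is working.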
On the other hand, we can use agent-based monitoring.
What is bad about agent-based monitoring?
- Need to deploy agents to systems
  - Each running system needs to have an agent installed in order to work. This can be automated
  - Some companies require internal certification before deployment on production systems
- Up-front cost
  - Requires software or custom development
What is good about agent-based monitoring?
- Deeper and more granular data collection, e.g. about the performance of a specific application and the CPU utilization
- Tighter service integration
  - Control applications and services on remote nodes
- Higher network security
  - Encrypted proprietary protocols
- Lower risk of downtime
  - Easier to react, e.g. if "Apache" has high load
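As a rough illustration of what an agent does on the instance, the following Python sketch samples a few local metrics, compares them against thresholds, and hands any breach to a callback where a reaction could be triggered. It is a sketch under stated assumptions, not a production agent: the metric names, the threshold values, and the functions `sample_host`, `evaluate` and `run_agent` are hypothetical, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import shutil
import time


def sample_host():
    """Read a few local metrics with the standard library (Unix-like hosts only)."""
    load_1min, _, _ = os.getloadavg()
    disk = shutil.disk_usage("/")
    return {
        "timestamp": time.time(),
        "load_per_cpu": load_1min / (os.cpu_count() or 1),
        "disk_used_ratio": disk.used / disk.total,
    }


# Hypothetical thresholds; real values depend on the workload.
THRESHOLDS = {"load_per_cpu": 1.5, "disk_used_ratio": 0.9}


def evaluate(metrics, thresholds=THRESHOLDS):
    """Return the sorted names of all metrics that exceed their threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit
    )


def run_agent(sampler=sample_host, on_alert=print, cycles=3, interval_seconds=0.0):
    """Sample, evaluate and report for a fixed number of cycles.

    `on_alert` is where a real agent could react, for example by
    notifying an operator or restarting a service.
    """
    alerts = []
    for _ in range(cycles):
        breaches = evaluate(sampler())
        if breaches:
            on_alert(breaches)
            alerts.append(breaches)
        time.sleep(interval_seconds)
    return alerts
```

Calling `run_agent(cycles=1)` on a Linux or macOS machine takes one sample of the local host. The sampler is passed in as a parameter so that the decision logic can be exercised with fake metrics, without depending on the machine it runs on.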