Most IT departments produce large amounts of log data. This is especially true when they monitor server systems, but device monitoring generates logs as well. Apache Flume comes into play when this log data needs to be collected for analysis.
Flume is all about data collection and aggregation. It has a flexible architecture based on streaming data flows and uses an extensible data model. The key elements of Flume are:
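- Event. An event is a unit of data, consisting of a byte payload and optional headers, that is transported from one place to another.
- Flow. A flow is the movement of events from their point of origin to their destination, possibly passing through several places along the way.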
- Client. A client is where a transport starts: it operates at the point of origin and delivers events to a Flume agent. Several clients are available; a frequently used one is the Log4j appender.
- Agent. An agent is an independent process that hosts Flume components such as sources, channels, and sinks.
- Source. A source is an interface implementation that consumes events delivered to it through a specific mechanism. One example is the Avro source.
- Channel. When a source receives an event, it passes the event on to one or more channels. A channel is a store that holds the event until a sink consumes it; the JDBC channel is one example.
- Sink. A sink takes an event from a channel and transports it to the next agent in the flow or to the event's final destination.
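To make the roles in this list concrete, here is a minimal in-memory sketch in Java, the language Flume itself is written in. It is a hypothetical model, not Flume's API: `Event`, `MemoryChannel`, `FanOutSource`, and `CollectingSink` are illustrative names, whereas a real agent is wired together in a configuration file and uses classes from the `org.apache.flume` packages. The sketch assumes a source that copies every incoming event into each attached channel (similar in spirit to Flume's replicating channel selector) and sinks that each drain their own channel.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of Flume's concepts; NOT the org.apache.flume API.
public class FlumeFlowSketch {

    // Event: a byte payload plus optional string headers.
    static final class Event {
        final Map<String, String> headers;
        final byte[] body;

        Event(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // Channel: a bounded buffer that holds events until a sink takes them.
    static final class MemoryChannel {
        private final Deque<Event> queue = new ArrayDeque<>();
        private final int capacity;

        MemoryChannel(int capacity) { this.capacity = capacity; }

        boolean put(Event e) {
            if (queue.size() >= capacity) return false; // channel is full
            queue.addLast(e);
            return true;
        }

        Event take() { return queue.pollFirst(); } // returns null when empty
    }

    // Source: receives events and hands each one to every attached channel.
    static final class FanOutSource {
        private final List<MemoryChannel> channels;

        FanOutSource(List<MemoryChannel> channels) { this.channels = channels; }

        void receive(Event e) {
            for (MemoryChannel c : channels) {
                if (!c.put(e)) throw new IllegalStateException("channel full");
            }
        }
    }

    // Sink: drains one channel and "delivers" each payload to the next hop,
    // modeled here as a simple list of strings.
    static final class CollectingSink {
        private final MemoryChannel channel;
        final List<String> delivered = new ArrayList<>();

        CollectingSink(MemoryChannel channel) { this.channel = channel; }

        int process() {
            int count = 0;
            Event e;
            while ((e = channel.take()) != null) {
                delivered.add(new String(e.body, StandardCharsets.UTF_8));
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        MemoryChannel archiveChannel = new MemoryChannel(100);
        MemoryChannel alertChannel = new MemoryChannel(100);
        FanOutSource source = new FanOutSource(List.of(archiveChannel, alertChannel));
        CollectingSink archiveSink = new CollectingSink(archiveChannel);
        CollectingSink alertSink = new CollectingSink(alertChannel);

        // The "client" side: two log lines become two events in one flow.
        for (String line : List.of("GET /index.html 200", "GET /missing 404")) {
            Map<String, String> headers = new HashMap<>();
            headers.put("host", "web01"); // hypothetical header
            source.receive(new Event(headers, line.getBytes(StandardCharsets.UTF_8)));
        }

        System.out.println(archiveSink.process() + " " + alertSink.process());
        System.out.println(archiveSink.delivered);
    }
}
```

Running `main` prints `2 2` on the first line, because each of the two events is replicated into both channels and each sink drains its own copy. Real Flume channels, such as the file or JDBC channel, additionally wrap these hand-offs in transactions so events are not lost if a sink fails; the sketch leaves that out for brevity.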
The following figure illustrates the typical workflow for Apache Flume with its components.