Apache Storm is in charge for analyzing streaming data in Hadoop. Storm is extremely powerful when analyzing streaming data and is capable of working near real-time. Storm was initially developed by Twitter to power their streaming API. At present, Storm is capable of processing 1 million tuples per node and second. The nice thing about Storm is that it scales linearly.
The Storm architecture is similar to other Hadoop projects. However, Storm comes with different challenges. First, there is Nimbus. Nimbus is the controller for Storm, which is similar to the JobTracker in Hadoop. Apache Storm also utilizes ZooKeeper. The Supervisor is on each instance and takes care of the tuples once they come in. The following figure shows this.
Major concepts in Apache Storm are 4 elements: streams, spouts, bolts and topologies.
Streams are an unbound sequence of Tuples, a Spout is a source of streams, Bolts process input streams and create new output streams and a topology is a network of Bolts and Spouts.