Hadoop Tutorial – Real-Time Data with Apache S4


S4 is another near-real-time project for Hadoop. S4 is built with a decentralized architecture in mind, focusing on a scaleable and event-oriented architecture. S4 is a long-running process that analyzes streaming data.

S4 is built with Java and with flexibility in mind. This is done via dependency injection, which makes the platform very easy to extend and change. S4 heavily relies on Loose-coupling and dynamic association via the Publish/Subscribe pattern. This makes it easy for S4 to integrate sub-systems into larger systems and updating services on sub-systems can be done independently.

S4 is built to be highly fault-tolerant. Mechanisms built into S4 allow fail-over and recovery.

Advertisements

Published by

Mario Meir-Huber

I work as Big Data Architect for Microsoft. With this role, I support my customers in applying Big Data technologies - mainly Hadoop/Spark - for their use-cases. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s