Big Data 101: Scalability


Scalability is another factor of Big Data Applications described by (Rys, 2011). Whenever we talk about Big Data, it mainly involves high-scaling systems. Each Big Data Application should be built in a way that eases scaling. (Rys, 2011) describes several needs for scaling: user load scalability, data load scalability, computational scalability and scale agility.

Data Scalability
Data Scalability

The figure illustrates the different needs for scalability in Big Data environments as described by (Rys, 2011). Many applications such as Facebook (Fowler, 2012) have a lot of users. Applications should support the large user base and should stay prone to errors in case the application sees unexpected high user numbers. Various techniques can be applied to support different needs such as fast data access. A factor that often – but not only – comes with a high number of users is the data load. (Rys, 2011) describes that some or many users can produce this data. However, things such as sensors and other devices that do not directly relate to users can also produce large datasets. Computational scalability is the ability to scale to large datasets. Data is often analyzed and this needs compute power on the analysis side. Distributed algorithms such as Map/Reduce require a lot of nodes in order to perform queries and analyze in a performing manner. Scale agility describes the possibility to change the environment of a system. This basically means that new instances such as compute can be added or removed on-demand. This requires a high level of automation and virtualization and is very similar to what can be done in cloud computing environments. Several Platforms such as Amazon EC2, Windows Azure, OpenStack, Eucalyptus and others enable this level of self-service that is a great support to scaling agility for Big Data environments.

Advertisements

Published by

Mario Meir-Huber

I work as Big Data Architect for Microsoft. With this role, I support my customers in applying Big Data technologies - mainly Hadoop/Spark - for their use-cases. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s