Big Data 101: Data agility

Agility is an important factor for Big Data applications. Rys (2011) describes three agility factors: model agility, operational agility and programming agility.

Data agility

Model agility describes how easy it is to change the data model. Traditionally, changing a schema in SQL systems is rather hard. Other systems, such as non-relational databases, allow the data model to change easily. In key/value stores such as DynamoDB (Amazon Web Services, 2013), changing the model is very easy. Databases in fast-changing systems such as social media applications, online shops and others require model agility. Updates to such systems occur frequently, often weekly or even daily (Paul, 2012).
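As a rough illustration of the difference, the following Python sketch contrasts a schema change in a relational table (using the standard-library sqlite3 module) with adding a new attribute in a schemaless key/value store. The key/value store is modeled by a plain dictionary, which is only a stand-in and not DynamoDB's API. The table name, keys and the twitter_handle attribute are assumptions made up for this example.

```python
import sqlite3

# Relational side: the schema has to be altered before the new attribute can be stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")

# Adding a field means an explicit schema migration that applies to the whole table.
conn.execute("ALTER TABLE users ADD COLUMN twitter_handle TEXT")
conn.execute("UPDATE users SET twitter_handle = '@alice' WHERE id = 1")
sql_row = conn.execute(
    "SELECT id, name, twitter_handle FROM users WHERE id = 1"
).fetchone()

# Key/value side: a dict as a hypothetical stand-in for a key/value store.
# Each item carries its own attributes, so a new attribute needs no migration.
kv_store = {}
kv_store["user:1"] = {"name": "Alice"}
kv_store["user:2"] = {"name": "Bob", "twitter_handle": "@bob"}

print(sql_row)
print(kv_store["user:2"])
```

In the relational case every existing row gets the new column, whether it needs it or not. In the key/value case only the items that use the new attribute carry it, which is what makes frequent model changes cheap.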

In distributed environments, it is often necessary to change operational aspects of a system. This is where operational agility comes in. New servers are added frequently, often with a different operating system or different hardware. Database systems should stay tolerant of such operational changes, as this is a crucial factor for growth.

Database systems should support software developers. This is where programming agility comes into play. Programming agility means that the database and all associated SDKs should ease the life of a developer working with the database. Furthermore, they should support fast development.

Big Data: Elements for Data Quality

Whenever we talk about Big Data, one core topic is often left out: data quality. All the data we have doesn't really help us if its quality is poor. There are several key criteria that data should meet in terms of quality.

Relevance – Data should contain a relevant subset of the reality to support the tasks within a company.

Correctness – Data should be very close to reality and correct.

Completeness – There should be no gaps in data sets, and data should be as complete as possible.

Timeliness – Data should be up-to-date.

Accuracy – Data should be accurate enough to serve the needs of the enterprise.

Consistency – Data should be consistent.

Understandability – Data should be easy to interpret. If that is not possible, data should be explained by metadata.

Availability – Data should be available at any time.
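Some of these criteria can be turned into simple automated checks. The sketch below is one possible way to measure completeness, timeliness and a basic form of consistency (duplicate IDs) over a list of records. The record layout, the required fields and the 30-day freshness window are assumptions chosen for this example, not part of any standard.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("id", "email", "updated_at")  # hypothetical required attributes


def completeness(records, required=REQUIRED_FIELDS):
    """Fraction of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required)
    )
    return complete / len(records)


def timeliness(records, now, max_age=timedelta(days=30)):
    """Fraction of records that were updated within max_age of now."""
    if not records:
        return 0.0
    fresh = sum(
        1 for r in records
        if r.get("updated_at") is not None and now - r["updated_at"] <= max_age
    )
    return fresh / len(records)


def duplicate_ids(records):
    """IDs that appear more than once, used here as a simple consistency signal."""
    seen, dupes = set(), set()
    for r in records:
        rid = r.get("id")
        if rid in seen:
            dupes.add(rid)
        seen.add(rid)
    return dupes


# Made-up sample data for illustration only.
now = datetime(2013, 6, 1)
records = [
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2013, 5, 20)},
    {"id": 2, "email": "", "updated_at": datetime(2013, 5, 25)},
    {"id": 2, "email": "b@example.com", "updated_at": datetime(2012, 1, 1)},
    {"id": 3, "email": "c@example.com", "updated_at": datetime(2013, 5, 31)},
]

print(completeness(records))     # 0.75: one record has an empty email
print(timeliness(records, now))  # 0.75: one record is older than 30 days
print(duplicate_ids(records))    # {2}: the id 2 appears twice
```

Criteria such as relevance and understandability depend on the business context and are much harder to check automatically, which is why they are left out of this sketch.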