Big Data: Elements for Data Quality


Whenever we talk about Big Data, one core topic is often left out: Data Quality. Having all the data doesn't really help us if the data quality is poor. There are several key attributes that data should have in terms of quality.

Relevance – Data should capture the subset of reality that is relevant to the tasks within a company.

Correctness – Data should be correct, reflecting reality as closely as possible.

Completeness – There should be no gaps in data sets, and data should be as complete as possible.

Timeliness – Data should be up-to-date.

Accuracy – Data should be accurate enough to serve the needs of the enterprise.

Consistency – Data should be consistent.

Understandability – Data should be easy to interpret. If that is not possible, the data should be explained by metadata.

Availability – Data should be available at any time.
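
Some of these attributes can be measured directly. The following is a minimal sketch, not a definitive implementation, of how completeness, timeliness and consistency might be scored for a small batch of records. The field names (`id`, `country`, `updated_at`), the 30-day freshness window and the function names are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical required fields; adjust to the data set at hand.
REQUIRED_FIELDS = ("id", "country", "updated_at")


def completeness(records, required=REQUIRED_FIELDS):
    """Share of required fields that are present and non-empty."""
    total = len(records) * len(required)
    if total == 0:
        return 1.0
    filled = sum(
        1 for r in records for f in required if r.get(f) not in (None, "")
    )
    return filled / total


def timeliness(records, now, max_age=timedelta(days=30)):
    """Share of records whose updated_at lies within max_age of now."""
    if not records:
        return 1.0
    fresh = sum(
        1
        for r in records
        if r.get("updated_at") is not None and now - r["updated_at"] <= max_age
    )
    return fresh / len(records)


def inconsistent_ids(records):
    """IDs that occur more than once with conflicting country values."""
    seen = {}
    conflicts = set()
    for r in records:
        rid, country = r.get("id"), r.get("country")
        if rid in seen and seen[rid] != country:
            conflicts.add(rid)
        seen.setdefault(rid, country)
    return conflicts
```

Each function returns a simple score or a set of offending IDs, so the results could be tracked over time or compared against a threshold. Relevance and understandability are harder to automate and usually need a human judgment.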

 


Published by

Mario Meir-Huber

I work as a Big Data Architect for Microsoft. In this role, I support my customers in applying Big Data technologies - mainly Hadoop/Spark - to their use cases. I also teach this topic at various universities and frequently speak at conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian universities. In my home country (Austria) I am part of several organisations on Big Data.
