Big Data 101: Transformable and Filterable Data


Transformable data can be changed into a different format or layout. This can mean a plain format change, for example from binary to JSON or XML, as well as an entirely new representation. When someone looks at a specific dataset (which, for instance, could be filtered), not all of the data might be interesting. Let’s assume that a manager wants to filter for all customers younger than 18 in a specific district. The manager is probably not interested in the customers’ names but rather in the total number of customers. Instead of returning a huge list of names, addresses and the like, a single number is returned. Or the online marketing department wants to target all customers that match specific criteria such as age: the address might not be relevant, but names and e-mail addresses are. Transformability is also a necessary characteristic if data has to be exported to another database, e.g. for analytics.
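
A minimal Python sketch of the three transformations described above. It assumes a hypothetical in-memory `Customer` record with made-up sample data; the field names and the helpers `count_minors`, `marketing_view` and `to_json` are illustrative only, not part of any particular product. It shows an aggregation that returns a number instead of records, a projection that keeps only name and e-mail, and a format change to JSON.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical customer record; the fields are assumptions for illustration.
@dataclass
class Customer:
    name: str
    email: str
    age: int
    district: str
    address: str


# Made-up sample data.
CUSTOMERS = [
    Customer("Anna", "anna@example.com", 16, "District 1", "Street 1"),
    Customer("Ben", "ben@example.com", 34, "District 1", "Street 2"),
    Customer("Cara", "cara@example.com", 17, "District 2", "Street 3"),
    Customer("Dan", "dan@example.com", 15, "District 1", "Street 4"),
]


def count_minors(customers, district):
    """Aggregation: return a single number instead of the full records."""
    return sum(1 for c in customers if c.age < 18 and c.district == district)


def marketing_view(customers):
    """Projection: keep name and e-mail, drop address and the other fields."""
    return [{"name": c.name, "email": c.email} for c in customers]


def to_json(customers):
    """Format change: serialize all records to JSON, e.g. for export."""
    return json.dumps([asdict(c) for c in customers])


if __name__ == "__main__":
    print(count_minors(CUSTOMERS, "District 1"))
    print(marketing_view(CUSTOMERS))
    print(to_json(CUSTOMERS))
```

Here `count_minors(CUSTOMERS, "District 1")` returns 2, because two of the sample customers in that district are under 18. The manager receives that number, and the names and addresses never leave the data store.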

Filterable is a key characteristic of datasets. Analytics software uses filtering frequently, and filtering is indispensable because most analyses don’t run on all data but rather on a selected subset. In databases, filtering is typically expressed with a “SELECT … WHERE” clause. Much of what filtering is good for was already discussed under “Transformability”, but it is worth going into more detail. When we analyze data, we often need to work on specific datasets. Imagine a Google search for “Big Data”: the data in Google’s index is filtered for exactly these words, and a consolidated list is returned. If the online marketing department mentioned under “Transformability” wants a list of customers in a specific area, this list is also filtered, based on the zip code or other geographical data. Hence it is an important characteristic of data to support filtering.
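
The zip-code example can be sketched with Python’s built-in `sqlite3` module. The table `customers`, its columns, the sample rows and the helper names `build_demo_db` and `customers_in_zip` are hypothetical. The point is that the `WHERE` clause lets the database return only the matching rows instead of the whole table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, email TEXT, zip_code TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("Anna", "anna@example.com", "1010"),
            ("Ben", "ben@example.com", "1020"),
            ("Cara", "cara@example.com", "1010"),
        ],
    )
    return conn


def customers_in_zip(conn, zip_code):
    """Filter with SELECT ... WHERE: only rows in the given zip code are returned."""
    rows = conn.execute(
        "SELECT name, email FROM customers WHERE zip_code = ? ORDER BY name",
        (zip_code,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(customers_in_zip(conn, "1010"))
    conn.close()
```

Passing the zip code through the `?` placeholder instead of formatting it into the SQL string lets SQLite bind the value safely, which also protects against SQL injection.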


Published by

Mario Meir-Huber

I work as a Big Data Architect for Microsoft. In this role, I support my customers in applying Big Data technologies (mainly Hadoop/Spark) to their use cases. I also teach this topic at various universities and frequently speak at conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German and Austrian universities. In my home country (Austria) I am part of several organisations on Big Data.
