Big Data challenges: different storage systems


A main factor in Big Data is the variety of data. Data may not only change over time (e.g. a web shop that sells books today may also want to sell cars tomorrow) but will also come in different formats. Databases must be able to accommodate this. Companies often do not store all their data in one single database but spread it across several, and different APIs consume different formats such as JSON or XML. Facebook, for instance, uses MySQL, Cassandra and HBase to store its data (Harris, 2011) (Muthukkaruppan, 2010), with each of the three storage systems serving a different need.

Helland (2011) describes the challenges for datastores with four key principles:

  • unlocked data
  • inconsistent schema
  • extract, transform and load
  • too much to be accurate

Unlocked data refers to the fact that data in a database is usually locked, whereas Big Data applications do not rely on locked data, so locking can cause problems. On the other hand, unlocked data leads to semantic changes in a database. With inconsistent schema, Helland (2011) describes the challenge of handling data from different sources and in different formats. The schema needs to be somewhat flexible to allow for extensibility: as stated earlier, businesses change over time, and so does the data schema. Extract, transform and load is very specific to Big Data systems, since data comes from many different sources and has to be brought into a specific target system. Too much to be accurate outlines the “velocity” problem of Big Data applications. When a value is calculated, the result might not be exact, since the data the calculation was built upon might already have changed. Helland (2011) states that you might not be accurate at all and can only guess results.
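To make the inconsistent schema and extract, transform and load points a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the two payloads, the field names (`id`, `name`, `price`), the source labels and the in-memory `store` list are all hypothetical and are not taken from Helland (2011) or from any specific product. The idea is that records arrive as JSON and as XML, are mapped onto a small common core, and any attribute the core does not know about is kept in an `extra` field, so a new product type does not force a schema change.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical common core; everything else ends up in "extra".
KNOWN_FIELDS = {"id", "name", "price"}


def extract_json(payload: str) -> dict:
    """Extract: parse a JSON document into a flat dict."""
    return json.loads(payload)


def extract_xml(payload: str) -> dict:
    """Extract: parse a flat XML element into a dict of tag -> text."""
    root = ET.fromstring(payload)
    return {child.tag: child.text for child in root}


def transform(raw: dict, source: str) -> dict:
    """Transform: normalise types and keep unknown attributes in 'extra'."""
    price = raw.get("price")
    return {
        "id": str(raw.get("id")),
        "name": raw.get("name"),
        "price": float(price) if price is not None else None,
        "source": source,
        "extra": {k: v for k, v in raw.items() if k not in KNOWN_FIELDS},
    }


def load(store: list, record: dict) -> None:
    """Load: append to an in-memory list standing in for the target system."""
    store.append(record)


# Two made-up inputs: a book from a JSON API and a car from an XML feed.
json_payload = '{"id": 1, "name": "Big Data Book", "price": 29.9, "isbn": "000-0"}'
xml_payload = (
    "<product><id>2</id><name>Compact Car</name>"
    "<price>15000</price><horsepower>90</horsepower></product>"
)

store = []
load(store, transform(extract_json(json_payload), "json_api"))
load(store, transform(extract_xml(xml_payload), "xml_feed"))

for record in store:
    print(record)
```

Keeping unknown attributes in a separate `extra` dictionary is just one possible design choice; a document store or a wide-column store would handle the same flexibility natively.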

Published by

Mario Meir-Huber

I work as a Big Data Architect for Microsoft. In this role, I support my customers in applying Big Data technologies, mainly Hadoop and Spark, to their use cases. I also teach this topic at various universities and frequently speak at conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German and Austrian universities. In my home country (Austria) I am part of several organisations on Big Data.
