Big Data challenges: moving data for analysis


Another challenge with Big Data is described by Alexander, Hoisie, and Szalay (2011): data cannot be moved easily for analysis. Big Data sets often amount to several terabytes or more, and moving that much data over a network connection is difficult or even impossible. When real-time data is analyzed, moving it to another cluster is rarely an option, because by the time the transfer completes the data may be outdated or no longer available. Real-time analysis is needed in fraud protection, for example; if the data first has to be moved to another cluster, it might already be too late. With traditional databases this was less of a problem, since the data often amounted to only a few gigabytes in a single database. Big Data, in contrast, comes in various formats, at high volume, and at high velocity. Handling all of these factors while also moving the data to another cluster may not be possible.
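To get a feel for the scale, the transfer time can be estimated with simple arithmetic. The following is a minimal sketch; the function name `transfer_seconds`, the 10 TB data size, the 1 Gbit/s link, and the `efficiency` parameter are illustrative assumptions, not figures from Alexander, Hoisie, and Szalay (2011).

```python
def transfer_seconds(size_bytes: int, bandwidth_bits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Estimate how long it takes to push size_bytes over a network link.

    efficiency is a hypothetical factor in (0, 1] for protocol overhead
    and shared links; 1.0 means the full nominal bandwidth is available.
    """
    if bandwidth_bits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    return size_bytes * 8 / (bandwidth_bits_per_s * efficiency)


# Assumed example values: 10 TB (decimal) over a dedicated 1 Gbit/s link.
ten_terabytes = 10 * 10**12
one_gigabit_per_s = 10**9

hours = transfer_seconds(ten_terabytes, one_gigabit_per_s) / 3600
print(f"{hours:.1f} hours")
```

Under these assumptions the transfer alone takes roughly 22 hours, even before any analysis starts. For data that loses its value within seconds or minutes, that delay rules out moving it first.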

Alexander, Hoisie, and Szalay (2011) describe several factors that make moving data to another cluster challenging: high-flux data, structured and unstructured data, real-time decisions, and data organization.

High-flux data is data that arrives in real time. If it must be analyzed, the analysis also has to happen in real time, because the data might be gone or modified at a later point. In Big Data applications, data arrives in both structured and unstructured form. Decisions about data must often be made in real time. If there is a stream of financial transactions, an algorithm must decide in real time whether a transaction needs more detailed analysis. If not all data is stored, an algorithm must also decide whether a given item is stored or not. Data organization is another challenge when it comes to moving data.
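The real-time decision on a transaction stream can be sketched as a function that looks at each record once, as it arrives, and decides on the spot what happens to it. This is only an illustration: the `Transaction` type, the `triage` function, and both thresholds are hypothetical, and a real fraud system would use far richer rules than a simple amount check.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Transaction:
    account: str
    amount: float


def triage(stream: Iterable[Transaction],
           review_above: float = 10_000.0,
           keep_above: float = 100.0) -> Iterator[Tuple[Transaction, str]]:
    """Decide per record, in arrival order, without buffering the stream.

    Both thresholds are made-up values for illustration:
    - "review":  the transaction needs more detailed analysis
    - "store":   keep the record, no further analysis for now
    - "discard": do not store the record at all
    """
    for txn in stream:
        if txn.amount >= review_above:
            yield txn, "review"
        elif txn.amount >= keep_above:
            yield txn, "store"
        else:
            yield txn, "discard"


# Usage with a small in-memory list standing in for a live stream.
incoming = [
    Transaction("A-1", 42.50),
    Transaction("B-7", 1_250.00),
    Transaction("C-3", 18_000.00),
]
for txn, decision in triage(incoming):
    print(txn.account, decision)
```

Because the function is a generator, each decision is made as soon as a record arrives and nothing has to be moved to another cluster or held back until the whole stream is available.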

Published by

Mario Meir-Huber

I work as a Big Data Architect for Microsoft. In this role, I support my customers in applying Big Data technologies, mainly Hadoop/Spark, to their use cases. I also teach this topic at various universities and frequently speak at conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German and Austrian universities. In my home country (Austria) I am part of several organisations on Big Data.

4 thoughts on “Big Data challenges: moving data for analysis”

  1. What about architectures that allow incoming data to be stored redundantly in all places where it needs to be, be it for analysis or for the sake of DR? Would this be a solution? Are there applications of such an idea? I mean, it would still slow down the real-time availability process, but in essence it would serve the purpose, no?

  2. According to a recent IDG survey, this is a very real challenge that most IT leaders will admit their organizations fail to excel at. I think a lot of the problem is attributed to maturity and the lack of embracing solid strategies.

    Peter Fretty, IDG blogger working on behalf of SAS
