Representation is an often-mentioned characteristic of Big Data. It goes well with “Variety” in the definition stated above. Every piece of data is represented in a specific form, whatever that form may be. Well-known forms are XML, JSON, CSV, and binary. Depending on the representation, relations between data can be expressed in different ways. XML and JSON, for instance, allow us to define child objects or relations for data, whereas this is rather hard with CSV or binary. As an example of such a relation, consider a dataset of the type “Person”. Each person consists of some attributes that identify the person (e.g. the last name, age, sex) and an address that is an independent entity. To retrieve this data as CSV or binary, you either have to run two queries or create a new entity for the query in which the data is merged. XML and JSON allow us to nest entities inside other entities.
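To make the difference between nested and flat representations concrete, the following Python sketch merges a nested “Person” record into a single flat row and writes it as CSV. The field names, the helper `flatten`, and the `_` naming scheme are assumptions chosen for illustration; they are not prescribed by any of the formats.

```python
import csv
import io

# Hypothetical nested record: the address is an independent entity inside the person.
person = {
    "lastname": "Meir-Huber",
    "age": 29,
    "address": {"zipcode": "1150", "city": "Vienna"},
}


def flatten(record, prefix=""):
    """Merge nested entities into one flat row, joining key names with '_'."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # A nested entity becomes a group of prefixed columns.
            row.update(flatten(value, prefix=f"{name}_"))
        else:
            row[name] = value
    return row


flat_row = flatten(person)

# Write the merged row as CSV; the nesting is no longer visible in the output.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat_row), lineterminator="\n")
writer.writeheader()
writer.writerow(flat_row)
csv_text = buffer.getvalue()
```

The relation between person and address survives only as a naming convention in the column headers, which is why a merged entity is needed for the flat formats.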
The entity described in the figure would look like the following if represented in XML:
Listing 1: XML representation of the entity “person”
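As a hedged illustration of how such a nested XML document could be produced, the following Python sketch builds a “Person” element with the standard library module `xml.etree.ElementTree`. The element names (`Person`, `Common`, `Address` and their children) are assumptions modeled on the attributes mentioned in this chapter, not a fixed schema.

```python
import xml.etree.ElementTree as ET

# Root entity; element names are illustrative assumptions.
person = ET.Element("Person")

# Attributes that identify the person.
common = ET.SubElement(person, "Common")
ET.SubElement(common, "firstname").text = "Mario"
ET.SubElement(common, "lastname").text = "Meir-Huber"
ET.SubElement(common, "Age").text = "29"

# The address is an independent entity nested inside the person.
address = ET.SubElement(person, "Address")
ET.SubElement(address, "zipcode").text = "1150"
ET.SubElement(address, "city").text = "Vienna"

# Serialize to a string without an XML declaration.
xml_text = ET.tostring(person, encoding="unicode")
```

Note that XML element text is always a string, so the age has to be converted back to a number by whatever tool consumes the document.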
Similarly, the JSON representation of our model “Person” would look like this:
{
  "Person": {
    "Common": { "firstname": "Mario", "lastname": "Meir-Huber", "Age": 29 },
    "Address": { "zipcode": "1150", "city": "Vienna" }
  }
}
Listing 2: JSON representation
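A tool that consumes such a document can reach the nested entity directly. The following minimal Python sketch parses a JSON text with the standard `json` module; the key name `Address` for the second nested object is an assumption, based on the address entity described earlier in this chapter.

```python
import json

# JSON text for the "Person" model; the "Address" key is an assumed name.
json_text = """
{
  "Person": {
    "Common": {"firstname": "Mario", "lastname": "Meir-Huber", "Age": 29},
    "Address": {"zipcode": "1150", "city": "Vienna"}
  }
}
"""

data = json.loads(json_text)

# Nested entities are reached by following the keys; no join is needed.
city = data["Person"]["Address"]["city"]
age = data["Person"]["Common"]["Age"]
```

Unlike XML element text, JSON distinguishes numbers from strings, so `age` arrives as an integer while `zipcode` stays a string.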
If we now look at how this data from a database could be represented as binary data, we need to join two different datasets. SQL supports this kind of join. A possible representation could look like the following:
Listing 3: SQL-based binary representation
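As a sketch of the join described above, the following Python example uses the standard `sqlite3` module with an in-memory database. The table names `person` and `address`, their columns, and the foreign key `address_id` are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database with two separate datasets (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (id INTEGER PRIMARY KEY, zipcode TEXT, city TEXT);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        firstname TEXT,
        lastname TEXT,
        age INTEGER,
        address_id INTEGER REFERENCES address(id)
    );
    INSERT INTO address VALUES (1, '1150', 'Vienna');
    INSERT INTO person VALUES (1, 'Mario', 'Meir-Huber', 29, 1);
""")

# Join both datasets into one merged row per person.
rows = conn.execute("""
    SELECT p.firstname, p.lastname, p.age, a.zipcode, a.city
    FROM person AS p
    JOIN address AS a ON a.id = p.address_id
""").fetchall()
conn.close()
```

Each resulting row is a flat tuple: the join merges person and address into a new entity for the query, and the nesting seen in the XML and JSON forms is gone.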
The representation of data isn’t limited to what has been described in this chapter so far. Several other formats are available, and others may arise in the future. In any case, data must have a clear and documented representation in a form that can be processed by the tools that build upon that data.