The Datalake as driver for digital transformation & data centricity


Every company (or at least most of them) talks about digital transformation today and treats data as a main asset for it. The question is where to store this data. In a traditional database? In a data warehouse (DWH)?

I think we should take a step back to answer this question. First of all, a Datalake is not a single piece of software. It consists of a wide variety of platforms. Hadoop is a central one, but not the only one: the Datalake also includes tools such as Spark and Kafka, and many more. It also includes relational databases such as PostgreSQL. If we look at how truly digital companies such as Facebook, Google or Amazon solve these problems, the technology stack is also clear: they heavily use and contribute to Hadoop and similar technologies. So the answer is clear: you don’t need overly expensive DWHs any more.

However, many C-level executives might now say: “But we’ve invested millions in our DWH over the last years (or even decades)”. Here the question gets more complex. How should we treat our DWH? Should it be replaced, or should the DWH become the single source of truth and the Datalake be ignored? In my opinion, neither option is valid:

First, replacing a DWH and moving all data to a Datalake would be a massive project that ties up too many resources in a company. Finding people with adequate skills isn’t easy, so this can’t be the solution. In addition, hundreds of business KPIs have been built on the DWH, and many units within large enterprises base their decisions on them. Moving them to a Datalake will most likely break (important) business processes. Previous investments would also be written off. So a big-bang replacement is clearly a no-go.

Second, keeping everything in the DWH is not feasible. Modern tools such as Python, TensorFlow and many more aren’t well supported by proprietary software (or at least get support only with a delay). From a skills perspective, most young professionals coming from university have learned technologies such as Spark, Hadoop and the like, so the skills shortage is easier to solve by moving towards a Datalake. I speak at a large number of international conferences; whenever I ask the audience whether they want to work with proprietary DWH databases, no hands go up. When I ask whether they want to work with Datalake technologies, everyone raises a hand. The fact is that employees choose the company they want to work for, not vice versa. We have a skills shortage in this area, and anyone who ignores or does not accept that is simply wrong. Also, a DWH is far more expensive than a Datalake. So this option is not valid either.

So what is my recommendation or strategy? For large, established enterprises, it is a combination of both, but with a clear path towards replacing the DWH in the long run. I am not a supporter of complex, long-running projects that are hard to control and track. Replacing the DWH should be a vision, not a project. This can be achieved by agile project management combined with a long-term strategy: new projects are done solely with Datalake technologies. All future investments and platform implementations must use the Datalake as the single source of truth. Whenever existing KPIs and processes are renewed, it must be ensured that they are implemented on the Datalake and that the data is shifted from the DWH to the Datalake. To make this succeed, strong metadata management and data governance must be in place; otherwise the Datalake will be a very messy place and thus become a data swamp.


Big Data in Logistics


Over the last weeks, I outlined the benefits of Big Data by industry. In the next posts, I want to outline use cases where Big Data is relevant in any company, focusing on the business functions.

This post’s focus: Logistics.

Big Data is a key driver for logistics. By logistics, I mean both companies that provide logistics solutions and companies that rely on them. First, Big Data can significantly improve a company’s supply chain. For years, or even decades, companies have relied on “just in time” delivery. However, “just in time” wasn’t always “just in time”. In many cases, the time an item spent in stock was simply reduced, but the item still needed to be stored somewhere: either in a temporary warehouse on-site or in the delivery trucks themselves. The first approach is capital intensive, since these warehouses need to be built (and extended in case of growth). The second approach keeps the delivery vehicles waiting, which creates operational expenses: each minute a driver has to wait costs money. With analytics, just-in-time delivery can be further improved and optimized to lower costs and increase productivity.

Another key driver for Big Data in logistics is route optimization. Algorithms can improve routes and make them faster. This lowers costs and also reduces the environmental impact. But this is not the end of the possibilities: routes can also be optimized in real time, which includes traffic prediction and jam avoidance. Real-time algorithms can calculate not only the fastest route but also the most environmentally friendly route and the cheapest route. This again lowers costs and saves time for the company.
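
To make the idea of optimizing one route for different goals concrete, here is a minimal sketch in Python. It assumes a tiny, made-up road network (the `ROADS` dictionary, its place names and all numbers are hypothetical) and uses Dijkstra’s shortest-path algorithm as one possible technique; real routing systems are far more involved.

```python
import heapq

# Hypothetical road network. Every edge carries three made-up weights:
# travel time in minutes, monetary cost and CO2 emissions in kg.
ROADS = {
    "depot": {
        "A": {"minutes": 10, "cost": 5.0, "co2": 4.0},
        "B": {"minutes": 15, "cost": 2.0, "co2": 2.0},
    },
    "A": {"customer": {"minutes": 10, "cost": 5.0, "co2": 4.0}},
    "B": {"customer": {"minutes": 20, "cost": 2.0, "co2": 2.5}},
    "customer": {},
}


def best_route(graph, start, goal, criterion):
    """Dijkstra's algorithm: return (total, path) minimizing `criterion`."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return total, path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, attrs in graph[node].items():
            if neighbour not in seen:
                heapq.heappush(
                    queue,
                    (total + attrs[criterion], neighbour, path + [neighbour]),
                )
    return None  # goal is not reachable from start


fastest = best_route(ROADS, "depot", "customer", "minutes")
cheapest = best_route(ROADS, "depot", "customer", "cost")
greenest = best_route(ROADS, "depot", "customer", "co2")
```

In this toy network the fastest route goes via A, while the cheapest and the lowest-emission route both go via B. A real-time variant could update the `minutes` values from live traffic data and simply run the search again.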

Header image by Nick Saltmarsh / CC BY

Big Data for Customer Services


Over the last weeks, I outlined the benefits of Big Data by industry. In the next posts, I want to outline use cases where Big Data is relevant in any company, focusing on the business functions.

This post’s focus: Customer Services.

Big Data offers several benefits for customer services. A key benefit can be seen in the IT help desk, where applications can be greatly improved by Big Data. Analysing past incidents and calls, how often they occurred and what impact they had, helps to handle future calls better. First, a knowledge base can be built that gives employees or customers a starting point. For challenging cases, trainings can be developed to reduce the number of tickets opened. This reduces costs on the one side and improves customer acceptance on the other.

Big Data can have a large impact here. When customers feel well treated, they are very likely to come back and buy more from the company. Big Data can serve as an enabler for this.

Big Data for Sales


Over the last weeks, I outlined the benefits of Big Data by industry. In the next posts, I want to outline use cases where Big Data is relevant in any company, focusing on the business functions.

This post’s focus: Sales.

Last week I outlined Marketing possibilities (and downsides) with Big Data. Sales is very similar to Marketing, and the two often come together. However, I think Sales deserves to be discussed separately. In this post, I won’t discuss the Sales opportunities of Big Data in webshops and the like. Today, I want to focus on Big Data opportunities that respect privacy but still have an impact.

Last year, I attended a conference where a company outlined their Big Data case. It was about analysing the bills issued in their chain stores. The data from the bills included no personal details such as credit card numbers, bonus card numbers and the like. It was only about what was in the basket. With the help of that, they could figure out which products get more attention at a specific store and how this differs from other stores. This data was joined with open data from public sources and other data about demographics. They could also find out that specific products are bought together with other products, which means that if customer X buys product C, the customer is very likely to buy product D as well. An example: if you buy a skirt, you are also likely to buy a top.
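
The “customers who buy C also buy D” finding is what is usually called an association rule. The following Python sketch shows, under stated assumptions, how the two basic measures of such a rule (support and confidence) could be computed. The baskets and the helper `rule_stats` are made up for illustration; this is not the method the company at the conference described.

```python
# Hypothetical, made-up baskets: only the items on each bill,
# without any customer identifiers.
baskets = [
    {"skirt", "top"},
    {"skirt", "top", "belt"},
    {"skirt", "scarf"},
    {"top", "belt"},
    {"skirt", "top", "scarf"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(baskets)
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    with_both = sum(
        1 for b in baskets if antecedent in b and consequent in b
    )
    # Support: share of all baskets that contain both items.
    support = with_both / total if total else 0.0
    # Confidence: share of antecedent baskets that also contain the consequent.
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(baskets, "skirt", "top")
print(f"skirt -> top: support={support:.2f}, confidence={confidence:.2f}")
```

With these sample baskets, three of the four baskets that contain a skirt also contain a top, so the confidence of the rule is 0.75. Note that no personal data is needed for this kind of analysis.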

The latter example focused on analysing data for fashion stores. However, most stores can benefit from Big Data. I recently had the chance to talk to the CIO of a large supermarket chain. They also have some Big Data algorithms that improve their chain stores. The company’s policy is to respect their customers’ privacy, so they don’t work with personal data. They could detect when a neighbourhood changes, e.g. because a university was built. They saw that different products were in demand and changed the assortment of goods accordingly.

There are many opportunities where Big Data can improve Sales, and as shown in these two examples, they don’t necessarily need to violate someone’s privacy.

Big Data for Marketing


Over the last weeks, I outlined the benefits of Big Data by industry. In the next posts, I want to outline use cases where Big Data is relevant in any company, focusing on the business functions.

This post’s focus: Marketing.

Marketing is one of the more controversially discussed use cases for Big Data. On the one hand, it gives companies the opportunity to adjust offers to their customers and make them more “individual”. I will describe these opportunities first before I discuss the downsides.

With customer loyalty programs, companies can better “target” their customers. When the company understands the behaviour of a customer, it can send special offers and promotions to that customer. We all know this from large online shops, where you get regular offers by e-mail. But this also applies to the retail stores around you: with their loyalty programs, retailers also collect data about their customers and can improve their portfolio. Furthermore, they can make their advertising more individual and increase revenue. Marketing can gain valuable insights in all industries. Retail is the most common one, but other industries can benefit as well. Companies that work in B2B can create value from Big Data by adjusting their sales processes based on data, and react to new trends before competitors find out.

On the other hand, this is somewhat frightening. I am basically in favour of Big Data. However, there must be some kind of assurance that personal privacy is respected. At present, it is hard to opt out of such programs.

Big Data is everywhere! In all major industries


Over the last weeks, I outlined several industries that can benefit from Big Data. However, this was just a short overview of what is possible. Let me use this post to sum up the industries that benefit from Big Data. You can get an overview by this tag.

In the first post, I started with manufacturing. This traditional industry sees major benefits from Big Data, especially with Industry 4.0. You can read the full post here. Big Data is already used heavily by another industry: the finance sector. Major banks, insurers and financial service providers use Big Data. I outlined the possibilities in this post.

Big Data is also a Big Deal for the public sector. It is not just that the Obama administration announced it would make more data available; Big Data also brings major benefits to smart cities and the like. You can read the full post here. Healthcare is often counted as part of the public sector, and it sees great benefits from using Big Data as well. I’ve summed up the benefits here.

The oil and gas industry can also benefit from Big Data by applying it to sensor data while drilling. A sector where you might not expect benefits from IT or Big Data is agriculture. But Big Data can give major benefits to this industry as well, as described here.

Next week I will start to look at the functions within a company, to see where Big Data fits in independent of the industry.

Big Data in Agriculture


Big Data is a disruptive technology. It is changing major industries from the inside. In the next posts, we will learn how Big Data changes different industries.

Today’s focus: Big Data for Agriculture.

Well wait – farming and IT? Really? Short answer: YES!

I believe that we are on the brink of something revolutionary in agriculture. This sector has largely been ignored by industrialization and the ongoing digitalization. Agriculture is (at least in Europe) done by many farmers who each cultivate a rather small area of land. Big Data is not about to change this in favor of a few farmers with large areas of land; the changes are more about performance, quality and quantity.

I recently had a very interesting discussion with someone from a ministry in Europe working on IT and agriculture. They expect a lot from Big Data. First, they want to improve how terrain is used by integrating geo-data from satellites. Analysing the terrain and its former uses gives additional insight into what to grow in a specific place. The ministry also wants to integrate weather data in combination with which grains grow in a specific place. This would give additional information on where water is missing. The long-term idea behind that is to integrate drones that take care of watering grains and plants that have had too little water so far. This is also useful for “premium” goods such as wine. Better quality means higher prices and profits for farmers.

At present, companies such as John Deere are working on integrating data into their products and services. We can expect some very interesting things to happen here 😉