In an earlier post here, I outlined that becoming a data scientist requires a lot of knowledge.
To recap, a data scientist needs knowledge in several IT domains:
- General understanding of distributed systems and how they work. This includes Linux administration skills as well as hardware-related skills such as networking.
- Knowledge of Hadoop or similar technologies. This builds on the previous point, but it is a different skill set that requires more software-focused knowledge.
- Strong statistical and mathematical knowledge. This is necessary to work on the actual tasks and to figure out how they can be translated into real algorithms.
- Presentation skills. All of the above is worth nothing if someone can’t present the data or the findings in it. Management might not see the point if the person can’t present the data in an appropriate way.
In addition, some other skills are necessary:
- Knowledge of the legal situation. The legal basics differ from country to country. Though the European Union sets some legal boundaries for its member states, there are still differences between them.
- Knowledge of the societal impact. It is also necessary to understand how society might react to data analysis. Especially in marketing, it is absolutely necessary to handle this correctly.
Since more and more IT companies are looking for the ideal data scientist, they should first ask who is actually capable of covering all of these skills. The answer might be: no single person can cover them all. It is likely that one person is great at distributed systems and Hadoop but struggles with transforming questions into algorithms and presenting the results.
Data science is more of a team effort than something a single person can handle alone. It is therefore necessary to build a team that is able to address all of these challenges.