I recently came across this quote from the book Data Analysis as Art by Elizabeth Matsui and Roger D. Peng:
“Data analysis is hard, and part of the problem is that few people can explain how to do it. It’s not that there aren’t any people doing data analysis on a regular basis. It’s that the people who are really good at it have yet to enlighten us about the thought process that goes on in their heads.”
I find this quote fascinating because it rings so true, and it might hold true for almost everyone. After all, human language began to evolve roughly 100,000 years ago, and yet we still stumble over the phenomenon of bad communication on a daily basis.
Ways in which we communicate findings
How often do we read journalism that offers an observation which is explored, studied, analysed, and explained within a context where it can be interpreted and therefore understood? Or that at least provides a different perspective: a valuable insight for those who have not done the research or analysis themselves.
Take scientific papers, for example. They are full of insights. They contain so much information and detail about the process, techniques, results, data, and so on that they offer a great place to start learning. But it is often hard to understand the underlying motivation behind the scientist's research or analysis, to grasp why we should care to read it, or, even more so, to understand what the next steps should be after learning about the findings. Of course, this is not the case for those involved in that particular field, as they are fluent in scientific jargon and most likely already have some idea about the topic. But where's the fun in that? Surely the work is interesting and worth sharing, but if this kind of information isn't easily accessible (understandable) to all, then it becomes too much like a private conversation.
Great communication comes when you understand the whole context. Well, not the whole context, because that is some Big Data, but at least a sample that represents the whole. It is then that communication thrives, because it enables eloquence in sharing that knowledge.
I believe that, generally, we are unable to describe the entire process, the thought process in our heads, because there are many dependencies that, together, push us back to rethink the initial question. The solution: epicycles of analysis.
Data Analytics as communication
If we return to Data Analysis as Art and look at their epicycles of analysis, we get an idea of the process involved when carrying out data analysis and proving a given hypothesis.
So how does this system work?
Develop expectations, collect data, and match the expectations with data. Easy.
Actually, it's not quite as simple as that. A data analyst has to state the question in order to evaluate whether the data collected is sufficient and will thus meet the expectations. To do so, the analyst delves into exploratory analysis and searches for initial findings that will lead them towards meeting that expectation. With the findings, the data analyst then proceeds to build models and interpret their outcome: first from a statistical point of view, and then by double-checking how this outcome relates to the initial question.
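The loop described above can be sketched in code. This is only an illustrative toy, not anything from the book: the function names, the use of a sample mean as the exploratory summary, and the tolerance threshold are all my own hypothetical choices for showing the shape of the epicycle (collect, summarise, compare with the expectation, refine or stop).

```python
from statistics import mean

def epicycle(collect, expectation, tolerance, max_rounds=5):
    """Iterate the epicycle: collect data, summarise it, compare the
    summary with the expectation, and repeat until they agree."""
    for round_number in range(1, max_rounds + 1):
        data = collect(round_number)       # collect (possibly more) data this round
        observed = mean(data)              # exploratory summary of what we found
        if abs(observed - expectation) <= tolerance:
            # Expectation met: interpret the result and communicate it.
            return round_number, observed
    # Expectation never met: time to revisit the initial question itself.
    return None, observed

# Toy usage: each round collects a larger sample whose mean drifts toward 10.
samples = {1: [4, 6], 2: [8, 9, 11], 3: [9, 10, 10, 11]}
rounds_needed, result = epicycle(lambda r: samples[r], expectation=10, tolerance=0.5)
```

The point of the sketch is only that matching expectations with data is iterative: when the comparison fails, the analyst goes around the cycle again rather than declaring the analysis done.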
You can see how this process already has many layers of analysis which, on top of everything, are interconnected. All of this must happen before we can reach an eloquent form of communication.
It is highly rewarding to carry out this process, especially if the analyst is given the opportunity to take the directions that spark their interest. As a result, the quality of the insight only increases.
If this process seems slightly abstract, think about what other sectors call 'design thinking'. The basics are the same: evaluate whether the original question, or the original product, still serves or solves the need it was created for. Both approaches define the need to dig deeper in order to understand the source of curiosity and give it meaning. And this takes time. It can't be done quickly, and it can't be automated unless the concept is fully developed.
I attended a talk the other day about adapting a business towards a Development and Operations (DevOps) or Agile culture. A member of the audience asked why there isn't a process to evaluate how fast a human can code. The question struck me because his conclusion seemed to be that we need people to code faster in order to do a better job, and thus become agile. This is quite misleading, and it completely disregards the process a data analyst undertakes to do their job efficiently.
A data scientist has a lot going on, and coding faster does not solve the problem. It isn’t about speed; it’s about how well the data analyst is able to understand what their client wants, evaluate if they can do it with the given data, and evaluate if they can provide the client with actionable insights.
As you can see from the diagram above, the process of data analysis is a highly meticulous one, and coding skills are in fact secondary. What it all comes down to is communication: if a client fully understands and communicates well what they want, a data scientist can deliver.
As a mentor, I push my students to communicate the essence of the project they're working on and to revisit whether or not these goals match the expectations. Dedicating time to thinking about how to deliver, or rather, how to communicate, pays off, which is why the students revisit this process in each of the five presentations they give whilst on our five-month Data Analytics and Machine Learning program. The aim is to fully prepare them for their future career as data analysts, where they'll have to present their findings to a board of executives or external clients.
There is an interesting learning curve to be observed in each student, and we are here to help them overcome their initial "failures". In fact, we believe that "failing" is a crucial part of the learning process.