Why is data science failing to solve the right problems?
Don’t panic, all is not lost. There are solutions for both data scientists and executives.
Adam Votava, Towards Data Science, Medium
Data science’s raison d’être is to solve problems. Yet leading voices (e.g. VentureBeat, HBR and Gartner) suggest that it is failing in this pursuit. And I, as a data scientist myself, tend to agree. But why?
Let me start with two observations.
The ‘Buzz Lightyear’ effect
We are a new industry. Shiny. Flashing lights. Extendable wings. Everybody wants us! But we also come with the familiar teething problems of all nascent schools of thought. Yes, we are moving along the data value chain with alacrity. And the deployment of data-powered solutions is getting easier. But many of our hypotheses are just that: hypotheses. Untested and unproven.
So, even though opportunities abound, and optimism is high — we must safeguard against overselling and hyperbole. Data is not a toy.
Data scientists are solving the wrong problems
I believe that data scientists are more often than not attempting to solve the wrong problems.
Unlike my observations on our industry buzz and maturity, this point is a consequence of methodology and/or attitude. While it’s important for our profession to continually push data’s evolutionary envelope, we also need to optimise and utilise existing solutions to solve today’s problems. This starts with accurately identifying the business issue at hand and systematically working from there.
The business issue is our touchstone. Not the data.
Importantly, we can achieve this today. We don’t have to wait for the wings of invention. Simply, the answer lies in deepening the understanding (and arguably empathy too) between the worlds of business and data.
To solve business issues, data scientists need to think like business owners. We need to be excited by business challenges and enthusiastic to widen our strategic and operational repertoire. Equally, I invite our business colleagues to be more curious about this new world of data and embrace the opportunities for communication and collaboration with data scientists — as peers rather than processors.
In this article, I offer my approach to getting to the right business issue whilst elevating the role and business impact of data. This approach is based on 100+ data projects I’ve worked on over the last 10 years.
Note: There are, of course, data scientists whose projects are delivering business value today. Businesses should actively seek these people out (and no, they don’t always have PhDs from the best schools or FAANG on their resume). Likewise, aspiring data scientists should also watch and learn from these pioneers and look to rapidly assimilate their mistakes and successes.
Why are data scientists not solving the right problems?
As stated, I believe we invariably fail to identify the right problem because we do not participate in the wider business debate. Instead, we resemble an island with a single lifting bridge to access the business mainland. We seldom venture out, and then only by invitation!
Typically, information requests drop onto our island desks with scant context but clear deadlines. So, we ‘pull up the bridge’ in a desire to expedite a process, rather than lowering our defences to question our colleagues and better understand the need or true business issue.
This island status quo makes no sense at all, especially given our reliance on and advocacy of networks, shared infrastructure and digitalisation. We need to emulate Stockholm in our interactions, not Alcatraz. We must build two-way bridges and relationship causeways.
Until we learn to ask more questions we will always risk investing our time, effort and talents into solving the wrong problems. We need to push to be part of the discussion that defines business issue(s) and be prepared to go toe to toe with our business counterparts when we are convinced we have something important to say. Crucially, we also need to ask the humbling question(s) so that we learn rapidly, fail quickly and accelerate our iterative solutions.
What is the right problem?
The right problem is an analytical problem that is an accurate proxy for the business issue and whose solution will impact that business issue.
1. The business issue
The framework I use to get to the ‘right problem’ is very simple. It starts with a business issue, for example:
“We need to increase average sales per customer.”
“We need to stop customer attrition.”
“We need to decrease scrap rate.”
“We need to increase ROI of digital marketing.”
Business people know these very well. Data scientists? Less so.
2. The analytical problem
The real job of a data scientist is to ask probing questions that transform the business issue into a clearly defined analytical problem. Modern technology and science then help with the solution. Academic papers and open-source solutions are accessible to everyone. Increasingly cheap cloud storage and computing, and pay-as-you-go pricing models for SaaS, make technology widely available. There are no excuses here.
Once you have the analytical problem defined, ask: what data is needed to solve it? Then ascertain if the data is available, in what quality, how accessible it is (alternatively, can it be acquired or collected), how fast can you get it, and for how much.
Business issue defines the problem; analytics and data offer potential solutions.
The simple schema below outlines this 3-step framework. But don’t be fooled. It’s a surprisingly challenging process to follow. And that grey arrow in the background is the killer! While it’s tempting to ponder ‘which business problem could be solved with an exciting, cutting-edge method’ or ‘what could be done with the data sitting in a dusty database’, neither is a systematic approach to solving business issues. You must be disciplined and start at the beginning, with number 1.
It sounds straightforward, so why is it so hard?
Because we are people.
Until we get this framework deep under the skin — follow it many times and learn from our mistakes — we are susceptible to temptations and shortcuts. We must fight the urge to sidestep difficult conversations and stop pretending we understand everything to avoid bruised intellectual egos.
Let me share a couple of anecdotal examples of what can go wrong if this framework isn’t followed left to right.
Jumping to analytical solutions too quickly
Data scientists are in love with analytics and technology. But, as we all know, love can be blind. Meaning, we often only see what we want to see or hear what we want to hear. The same is true for a lovestruck data scientist. For example, if they’re currently besotted with reinforcement learning or GPT-3, they will tend to see it as the first solution to every problem they are facing.
Failing to validate understanding
Another problem is jumping to assumptions, rather than taking the time to confirm shared understanding. As intrepid data scientists, we might presume: “oh, that sounds like a churn model, got it!” Only to be surprised after building a fantastic model that it’s not applicable. Don’t get me wrong, the model is good. In fact, it’s excellent. But it doesn’t address the actual business issue.
Not playing to our strengths…
So many times, I’ve heard business leaders tell me that what they need is, for example, a ‘recommendation engine that considers sales, customer reviews and margin to rank products’. Yet the analytical problem (a recommendation engine) and the data (sales, reviews and margins) should emanate from the business issue, not be prescribed in advance.
I urge you as a data scientist and functional expert to question every instance where an analytical problem has been solved for you. If a business leader fails to outline each step of the framework to your satisfaction, there is a high probability that the suggested analytical problem is not a solution to the business issue, but rather a waste of time and resources.
The same is true for data. For example, a business leader may read about dark data in their favourite executive journal and ask their data scientists to: “decrease dark data by 10%”. Or instruct them to: “do something with the user reviews data”. By doing this, they are betting on the data scientists’ ability to prioritise the right business issues, all while sitting on their island with the lifting bridge stubbornly raised. Ask questions; you’re the expert. Simple.
Analytics for ‘likes’
Peer pressure is the siren call to do all the ‘cool analytics’. Start-ups are particularly prone to claiming to be ‘AI-powered’, while larger corporates profess to be ‘embracing data opportunities’. I generally question all such proclamations. If the analytics is not actually solving business issues by enabling better decision making, improving operational efficiency or creating new revenue streams, it’s PR.
‘Data’ is not a strategy.
Instead, data enables the delivery of strategy, innovation, efficiencies, …
Bridges, not islands
I can’t stress the analogy of bridge building enough. Only by integrating data, data teams and data insights into the wider organisation can better decisions be made. Remember that not everyone in an organisation is necessarily convinced of the value of data. They may not be data literate and they may not have even a high-level understanding of analytical methods and what they offer. A key part of every data scientist’s role is to evangelise the role and power of data.
As stated, your bridges must carry two-way traffic. It is equally vital that data scientists better understand business issues. Because, if they don’t, they will fail to explain their analytical choices and assumptions when communicating solutions and ultimate value.
It is my firm opinion that this challenge sits squarely with us — the data scientists. We need to win the trust of our colleagues and customers by understanding the business issue first. Not their business issue. Our business issue.
Then we craft professional solutions, clearly articulate how they work and crucially explain why and when they don’t.
How to formulate the right problem?
Let me now share my approach to understanding the business issue to make sure I solve the right (analytical) problem.
Restaurant example
Imagine a restaurant owner says to you that she needs to get her inventory right to avoid running out of ingredients or ending up with too much waste.
Regrettably, restaurant life is not a Kaggle competition and the restaurant owner doesn’t have the data or an evaluation metric ready for you. So, let’s start by turning this business issue into an analytical problem.
Maximise understanding of the business issue
Initially, I ask questions to unpick the issue so that I can start stitching a potential solution in my mind. The restaurateur’s challenge seems quite intuitive but let’s not fall into the assumption trap! Instead, I’d ask her, for example:
“How would you describe your restaurant?”
“How does inventory work in the restaurant business?”
“How do you decide what to order today?”
“What is the hardest thing about getting the order right?”
“What is a bigger problem: running out of ingredients or ordering too much?”
“How does the ordering process work? How often can you order? And how long in advance?”
Let’s assume that I’ve learnt the following: It’s a salad bar and all ingredients must be fresh. Currently, she is placing an order, once a week, each Friday for the following week. The ingredient deliveries are made daily, based on that weekly order. Most of the salads have the same base ingredients and substitute choices are available for customers if specific ingredients run low/out. Currently she’s ordering a bit more if she ended the prior week low on ingredients with a shrinking menu. Conversely, if she had significant waste, she orders a little less. There really isn’t any ‘science’ in her ordering and each week differs from the prior.
Start framing the analytical problem
From this information, we can start putting together a possible solution, perhaps a demand forecasting model predicting the number of customers per day. The model would need to predict well up to nine days ahead, because an order placed on Friday has to cover demand through to the end of the following week.
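To make the nine-day horizon concrete, here is a minimal Python sketch. It assumes the Friday order covers the following Monday to Sunday (the article only says "the following week"), and it adds a naive same-weekday average as a baseline that any real forecasting model should beat. The function names and the shape of `history` are hypothetical, chosen purely for illustration.

```python
from datetime import date, timedelta
from statistics import mean

FRIDAY = 4  # date.weekday() numbering: Monday is 0


def target_dates(order_day: date) -> list[date]:
    """Dates a Friday order must cover: the following Monday to Sunday (assumed)."""
    if order_day.weekday() != FRIDAY:
        raise ValueError("this sketch assumes orders are placed on a Friday")
    next_monday = order_day + timedelta(days=3)
    return [next_monday + timedelta(days=offset) for offset in range(7)]


def weekday_average_forecast(history: dict[date, int], order_day: date) -> dict[date, float]:
    """Naive baseline: forecast each target day as the mean of past same-weekday counts.

    `history` maps a past date to the number of customers served that day
    and must contain at least one observation for every weekday.
    """
    forecast = {}
    for day in target_dates(order_day):
        same_weekday = [
            customers
            for past_day, customers in history.items()
            if past_day <= order_day and past_day.weekday() == day.weekday()
        ]
        forecast[day] = mean(same_weekday)
    return forecast


order_day = date(2024, 1, 5)  # a Friday
horizons = [(day - order_day).days for day in target_dates(order_day)]
print(horizons)  # [3, 4, 5, 6, 7, 8, 9]
```

The longest horizon, nine days, is the Sunday of the following week. If the restaurant is closed on some days, or the owner could order more often, the horizon changes, which is exactly the kind of detail the earlier questions are meant to surface.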
The data universe to represent the problem
The next step is to formulate hypotheses on what data to use. Again, I’d ask something like:
“Based on your experience, what influences how many customers come?”
“On which days do you have most customers: what’s the weather, are there any local events, anything else that drives foot traffic to your restaurant?”
“Are there times in the working day when the restaurant is empty?”
And I may learn that it depends on the weather (fewer customers when it’s cold or raining); the day of the week; reviews on local social media foodie pages; events at a nearby university; and so on.
Again, treat whatever you learn as hypotheses for data. People might tell you that it seems to depend on weather, but nobody will say “10mm of morning precipitation is the key!” And even if they do, test this hypothesis.
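One way to test such a hypothesis, sketched here in Python, is an exact one-sided permutation test on daily customer counts. The numbers below are invented purely for illustration, and the function name is hypothetical; with real data you would pull the counts from the till system and the weather from a historical record.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(dry, rainy):
    """Exact one-sided permutation test for 'rainy days have fewer customers'.

    Returns the observed difference in means (dry minus rainy) and the share
    of all possible relabellings of the days whose difference is at least
    as large as the observed one.
    """
    observed = mean(dry) - mean(rainy)
    pooled = list(dry) + list(rainy)
    n_rainy = len(rainy)
    n_dry = len(pooled) - n_rainy
    total = sum(pooled)

    as_extreme = 0
    n_splits = 0
    for rainy_idx in combinations(range(len(pooled)), n_rainy):
        rainy_sum = sum(pooled[i] for i in rainy_idx)
        diff = (total - rainy_sum) / n_dry - rainy_sum / n_rainy
        n_splits += 1
        if diff >= observed - 1e-9:  # small tolerance for float rounding
            as_extreme += 1
    return observed, as_extreme / n_splits


# Hypothetical daily customer counts, for illustration only.
dry_days = [120, 135, 128, 140, 131, 125]
rainy_days = [95, 102, 88, 110]

observed, p_value = exact_permutation_p_value(dry_days, rainy_days)
print(f"dry minus rainy: {observed:.1f} customers, p = {p_value:.4f}")
```

Enumerating every split is only feasible for small samples; with months of data, a random sample of relabellings would be the usual substitute. Either way, a small p-value supports the owner’s intuition but does not tell you how much rain matters, so the effect size deserves as much attention as the test.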
Do some research and think about the problem
It’s helpful to then do some desk research into how others approach the same or a similar problem. You might find useful papers, blog posts or articles. By now, you should have a solid understanding of the problem in your head. So, it’s important that you outline and test possible solutions. It is also helpful to engage a thought partner to challenge your thinking, assumptions and ideas.
I would also do a first analysis of what data is available, in what format and quality. But nothing too time consuming before the following step.
Explicitly validate your understanding
It is now crucial to test that the potential solution is based on a shared understanding of the business issue. Before you do any more, have another conversation with the restaurant owner and explain to her how you’ve framed the problem, what your assumptions are, what simplifications you are making and what key influencing factors you will consider.
It is also highly valuable to confirm the content and format of the output, how she can make use of the information, what the technical limitations are, and so on.
Solving the problem
It’s finally time for data science work. Go ahead and get the data, do some data engineering, train and test the ML model, deploy it, create CI/CD pipelines, put monitoring and triggering in place.
Then measure the business impact and improve the solution again and again and again…
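As one hedged illustration of measuring business impact in the restaurant example, the Python sketch below scores a forecast by what its errors would cost rather than by a symmetric error metric. The per-portion costs are hypothetical placeholders; in practice they would come from the owner’s answer to the question of whether running out or over-ordering is the bigger problem.

```python
def ordering_cost(forecast, actual, waste_cost=1.0, stockout_cost=3.0):
    """Cost of ordering to `forecast` when demand turns out to be `actual`.

    waste_cost:    assumed cost per portion ordered but not sold
    stockout_cost: assumed cost per portion demanded but not available
    Both defaults are hypothetical and only illustrate the asymmetry.
    """
    total = 0.0
    for planned, demand in zip(forecast, actual):
        if planned >= demand:
            total += (planned - demand) * waste_cost
        else:
            total += (demand - planned) * stockout_cost
    return total


# Three illustrative days: one under-order, one over-order, one exact.
cost = ordering_cost(forecast=[100, 120, 90], actual=[110, 115, 90])
print(cost)  # 35.0
```

Comparing this cost for the model’s forecasts against the owner’s current ordering habit, over the same weeks, gives a number she can relate to directly.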
And, though this step is clearly fundamental, it has also been widely documented by many esteemed data scientists. Standing on their shoulders, I would add: Data science is an iterative craft. Don’t overthink it. Start somewhere. You’ll get better at execution with each repetition.
So, in conclusion, here are my takeaways:
Don’t play into the ‘Buzz Lightyear’ effect. Data is not a toy. It is a strategic tool supporting critical decision making, optimisation and potential revenue streams.
Start with #1. Business issues define the analytical problems; analytics and data offer potential solutions. Be systematic in understanding the business issue first.
Build bridges not islands. The more you and your data team are integrated into the wider organisation, the greater your likelihood of success. Be generous with your insights and curious about the business issues.
Be peers not processors. Ask questions. Never accept analytical problems that you haven’t been part of framing.
Throughout this article I’ve used the term data scientist as a placeholder for any data worker solving business or real-world problems with data. There are of course many data professions. McKinsey uses the term ‘analytics translator’ for a person who translates business problems into data problems. Large corporates might have a dedicated role for that. I believe every data scientist should hone this skill.
Co-authored by my wonderful business partner Chelsea Wilkinson.