Business-Driven Data Science? – Part 1: Data Science is Looking at Business Through a Keyhole

Here is a simple question you can put to your data science team:

Name metrics you use in your data analyses

You can be the company’s Chief Data Officer or a data science manager asking a team member, or a data scientist on the team asking yourself. Here is what you’ll likely hear (or say): Confusion Matrix or metrics derivable therefrom (Accuracy, Precision, Negative Predictive Value, Recall, F-measure), Gain and Lift, Kolmogorov-Smirnov measure, Area Under the ROC curve (AUC – ROC), Gini Coefficient, Concordant – Discordant Ratio, Root Mean Squared Error, Variance Explained, and so on.

Notice that you didn’t hear a single business metric, only technical metrics. Your business client is unlikely to recognize any of them.

Before we go on, let’s synchronize our vocabulary. First, I use the term business client for the person for whom the data science is being done: the person in the business who would evaluate the impact of the data science work and who would receive the results of the analysis in order to derive actions at the business level.

Second, by business metric, I mean any metric that would be understood by the business client. Often these will be domain-specific metrics, but they will sometimes be much more general and applicable to many domains. Examples are monetized metrics (cost, loss, profit).
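To make the distinction concrete, here is a minimal sketch that computes one technical metric (accuracy) and one monetized metric (profit) from the same confusion-matrix counts. The scenario (a retention campaign that contacts every customer the model flags), the function names, the counts, and the dollar values are all hypothetical, chosen only for illustration.

```python
def accuracy(tp, fp, fn, tn):
    """Technical metric: share of correct predictions."""
    return (tp + tn) / (tp + fp + fn + tn)


def campaign_profit(tp, fp, fn, tn,
                    value_per_saved_customer=100.0,  # assumed value, in dollars
                    cost_per_contact=10.0):          # assumed cost, in dollars
    """Monetized metric: profit of contacting every flagged customer."""
    contacted = tp + fp
    return tp * value_per_saved_customer - contacted * cost_per_contact


# Hypothetical confusion-matrix counts for two models on 1,000 customers.
model_a = dict(tp=50, fp=50, fn=50, tn=850)
model_b = dict(tp=90, fp=210, fn=10, tn=690)

for name, counts in (("A", model_a), ("B", model_b)):
    print(name, accuracy(**counts), campaign_profit(**counts))
```

Under these assumed numbers, model A has the higher accuracy (0.90 against 0.78) while model B yields the higher profit ($6,000 against $4,000). The second figure is the one a business client can read directly.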

What am I getting at? A much-discussed gripe over the past decade has been the so-called problem of (lack of) business-IT alignment. In essence, the problem is that almost all IT people make their decisions with the technical world of IT in mind rather than the effect those decisions have on the business. As an example, when a manager has to prioritize actions (“Of the hundreds of outstanding Requests For Change, which ones should we schedule for the next change window?”), she often decides based on convenience rather than on the business impact of the decision. This is extremely common, even in large companies. My point is that most data science teams have a similar problem: their heads are in the technical world and quite disconnected from the business impact of their work.

Let’s explore this problem a bit more to see if we can convince ourselves (or not) that it exists. Let’s continue with the initial test and ask our data science team a few more questions to see how much we know about the business:

  • Who are your business clients?
  • Can you prioritize your business clients? How is this done? Is business impact evaluated?
  • Can you describe what each client stands to gain from your work? What return can the business expect from your work?
  • Is the above answer vague (“increase sales”) or precise (a minimum $ or % increase in some monetized metric, for example)?
  • What metrics do you use in your work? Removing technical metrics from the list, what’s left?
  • What is a minimum or maximum acceptable value for a particular (technical) metric and how was that value chosen? Who chose it?
  • What is the main success criterion for your current project from a business point of view?
  • Do you have a concrete test of success for your current project?
  • If I put the above questions to your business clients, what would they say? Is there significant overlap between your answers and theirs?

Although I have no concrete data to back me up, I surmise that for many (most?) data science teams, knowledge of the business is very poor and obtained second-hand. We don’t really know how the business is affected by our models. It is as if the data science team were looking at the business through a keyhole: you may catch a glimpse of how your work connects to the business, but the technical work remains dissociated from the business world. It is difficult to evaluate how a model that is judged successful according to a technical metric will affect the business. Yet failing to factor the business adequately into our models can yield sub-optimal answers: tuning a model using a technical metric (AUC, say) may not be optimal for the business.
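A small sketch can illustrate that last point. It tunes a single decision threshold twice, once to maximize a technical metric and once to minimize a monetized cost, and the two criteria pick different thresholds. Everything here is an assumption made for illustration: the fraud-screening scenario, the scores and labels, the two cost figures, and the helper names. Accuracy stands in for the technical metric because AUC does not depend on a threshold.

```python
# Hypothetical model scores and true labels (1 = fraud, 0 = legitimate).
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

COST_FALSE_NEGATIVE = 500.0  # assumed loss from one missed fraud, in dollars
COST_FALSE_POSITIVE = 10.0   # assumed cost of one needless manual review


def confusion_counts(threshold):
    """Count outcomes when every score >= threshold is flagged as fraud."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_at(threshold):
    """Technical metric: share of correct decisions at this threshold."""
    tp, fp, fn, tn = confusion_counts(threshold)
    return (tp + tn) / len(labels)


def cost_at(threshold):
    """Business metric: total monetized cost of the errors at this threshold."""
    tp, fp, fn, tn = confusion_counts(threshold)
    return fn * COST_FALSE_NEGATIVE + fp * COST_FALSE_POSITIVE


# Try every observed score as a candidate threshold.
candidates = sorted(set(scores))
best_by_accuracy = max(candidates, key=accuracy_at)
best_by_cost = min(candidates, key=cost_at)

print("accuracy picks", best_by_accuracy, "at a cost of", cost_at(best_by_accuracy))
print("cost picks", best_by_cost, "at a cost of", cost_at(best_by_cost))
```

With these made-up numbers, the accuracy-optimal threshold is 0.85 and costs $500 because it misses one fraud, whereas the cost-optimal threshold is 0.40 and costs $20 in extra reviews. The gap comes entirely from the assumed asymmetry between the two kinds of error, which is exactly the information the business client holds and the technical metric ignores.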

Of course, the communication gap also goes in the other direction: top management often cannot communicate clearly and concretely about what they expect from the data science team. Many companies are jumping on the data bandwagon without a strategy and without clear objectives in terms of business metrics. It only compounds the problem if top management has fuzzy ideas and the data science team is trapped in the technical world.

The conclusion is that an immature company may suffer from a Business-Analytics alignment gap. This is a hard problem to solve and I don’t profess to have full answers. Future blog entries will explore the matter further, discuss specific business metrics, provide example models, some solutions and directions for investigation. The objective is to prepare data science teams to do Business-Driven Data Science.

I welcome comments about this matter.
