December has been a contemplative month, and January is going to be hectic.

I don't believe in setting goals or making resolutions for the New Year. Usually I just let the New Year pass by. But there are ideas circulating in my head that I know I want to spend more time figuring out in 2014 — these ideas are half-baked but I don't want to forget them.

One idea I'm grappling with is the relationship between data and narrative and how to communicate uncertainty.

Consider the following ways of talking about a burger:

  • That hamburger has 550 calories.
  • That hamburger has between 400 and 700 calories.
  • That hamburger is bad for you.
  • I ate that hamburger every day for a month and gained a lot of weight and felt crappy.
  • That hamburger has 550 calories and to burn it off you'll have to walk for 153 minutes.
  • That hamburger has 10g of saturated fat. Do you want to give yourself a heart attack?
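The fifth bullet's arithmetic can be sketched in a few lines. This is a hypothetical illustration, not a real nutrition formula: the 3.6 kcal/min walking burn rate is an assumed round number that happens to reproduce the 153-minute figure, and real burn rates vary by body weight and pace — which is exactly the kind of hidden uncertainty this post is about.

```python
# Assumption for illustration only: walking burns ~3.6 kcal per minute.
# Actual burn rates depend on weight, pace, terrain, and more.
WALK_KCAL_PER_MIN = 3.6

def walking_minutes(calories):
    """Minutes of walking needed to burn off `calories` kilocalories."""
    return calories / WALK_KCAL_PER_MIN

# A single point estimate hides the uncertainty...
print(round(walking_minutes(550)))   # -> 153

# ...while propagating the 400-700 calorie range keeps it visible.
low, high = round(walking_minutes(400)), round(walking_minutes(700))
print(f"somewhere between {low} and {high} minutes")  # -> between 111 and 194
```

Note how the "153 minutes" figure inherits all the uncertainty of the calorie estimate it was derived from, while sounding even more precise.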

I'm happy that data journalism looks poised to become even more popular in 2014 — our society is plagued by innumeracy.

But I'm also somewhat terrified.

The Certainty of Numbers

With narrative storytelling there are built-in affordances for uncertainty and ambiguity — it's one story that's representative of the whole. Telling the same story with data may be more accurate, or better at capturing the whole, but that doesn't take away the ambiguity and uncertainty — it just hides them.

Don't get me wrong, I'm not against data. Data does help us better understand the world, solve problems and predict what will happen.

But the data-mindset is also a very reductionist mindset that glosses over complexity by shoving it into a black box or down into the footnotes. Including numbers or using data to make a decision gives people a sense of certainty and false confidence. Like in a fantasy TV show where "magic!" can be invoked as a catch-all explanation or solution to any problem, "numbers!" can be an argument-proof explanation for any decision. "The data say so," can be a quick way to shut down disagreement.

Because narratives are ambiguous about how they describe the world, they allow for uncertainty. As anyone who has attempted even the simplest data analysis outside the confines of a statistics textbook knows, data about the world is ambiguous too. But once cleaned up and presented, it no longer shows any trace of that messy original.

This is less of a problem for those of us who actually get our hands dirty cleaning a data set or tuning a model than for those who have no understanding of that process and just make decisions off the conclusions.

There are about a million ways the data in a seemingly simple statement like "user engagement is higher with design feature X" could be wrong, but without actually doing the data analysis it's impossible to have an intuitive understanding of what those ways might be.

While not using data to make decisions might be bad, what's definitely much worse is using the wrong data to make decisions. Because then in addition to being wrong, you have the sense of false certainty from having used data.

Is there a way to change that?

Margins of Error

[Image: confidence intervals]

If so, it feels related to how the margin of error of a data point is communicated.

The human mind actually seems very good at dealing with uncertainty instinctively. When we ponder a decision we imagine all the possible future narratives that might result, ranging from wild success to humiliating defeat. We can look at something and have a good sense of whether we'll be able to jump over it or fall in.

But too often in journalism that margin of error is simply not communicated at all.

[Image: CBO report]

Really changes the story, doesn't it?

Measurement Error and Measurement Bias

Lately, I've been spending a lot of time wearing different activity trackers: the Fitbit, Jawbone Up, Nike Fuelband, and so on. They all purport to tell me how many calories I've burned throughout the day. What none of them bothers to communicate is that the number they show me might be wrong. Every day it's some number with four significant figures and no hint that it's an estimate, or of what the measurement biases might be.

At the end of the day, my activity tracker might tell me that I walked 10,874 steps and burned 2,397 calories. But are you really so sure that it wasn't 10,875 steps? Or 10,876? And what's a step anyways? And does that calorie count include the set of pushups I did since my wrist was stationary? So that number must be low.
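A tracker could communicate that margin of error with almost no extra work. Here's a minimal sketch of one way to do it — the 5% and 10% relative errors below are invented for illustration, not the devices' actual accuracy, and the rounding rule is just one plausible choice:

```python
# Hypothetical sketch: show a reading as a coarse range whose rounding
# matches its error, instead of a falsely precise four-digit point estimate.
def with_margin(value, relative_error):
    """Render `value` as a range, assuming a given relative error."""
    lo = value * (1 - relative_error)
    hi = value * (1 + relative_error)
    # Round no finer than the width of the error band warrants.
    step = 100 if hi - lo >= 200 else 10
    lo = int(round(lo / step) * step)
    hi = int(round(hi / step) * step)
    return f"roughly {lo:,}-{hi:,}"

print(with_margin(10874, 0.05))  # steps    -> roughly 10,300-11,400
print(with_margin(2397, 0.10))   # calories -> roughly 2,200-2,600
```

"Roughly 10,300-11,400 steps" is a less impressive display than "10,874 steps," but it's a truer one — the precision of the output finally matches the precision of the measurement.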

The alternative narrative description of my day might be waking up in the morning and doing a few sets of pushups, then walking to the subway. Some pacing around at work and then walking to dinner and back home. It's described much less "precisely". But that sort of anecdotal description preserves the narrative uncertainty that keeps it a truer description. I think there might be something hardwired into the way humans think that lets us parse anecdotes into an understanding of the world that allows for uncertainty.

Narratives. Data. Uncertainty.

These are all related in some ways but I can't quite grasp it.

Increasingly I'm recognizing that a lot of the work that I've done in the past was deeply unsatisfying because it was oblivious to the narrative context of the data I was working with.

I'd love to hear more thinking about this from others.