“When I looked at the 2017 numbers, I went, ‘Oh, my God,’” said David Hemenway, the director of the Harvard Injury Control Research Center. “You just can’t use those numbers.” (Original post here)

This post is a response to the April challenge from Storytelling With Data (SWD).

Every month, SWD issues a data visualization challenge to push us to develop our skills and try out new things.

This month, the theme is "Emulate," that is, find a visualization that you like/appreciate and recreate it using your own tools.

I chose a graph produced by FiveThirtyEight which asks the question, "Why are the CDC's non-fatal injury estimates for guns so bad?" (They put it more nicely than that, but I think it's a fair description.)

Original graph as published in The Trace, March 11, 2019

I thought it was interesting because it focuses on the confidence intervals rather than the estimates themselves. Across a series of articles, FiveThirtyEight has pointed out that the 95% CI for 2017 (31,329 to 236,461) makes the estimate itself essentially useless. Even the Centers for Disease Control and Prevention describes an estimate with a CI this wide as “unstable and potentially unreliable.”

Fortunately for my exercise, the original data can be extracted from the CDC's Non-Fatal Injuries Database on its website, so that was my first step. I imported the data using Python and plotted it with Matplotlib.

The biggest challenge for me was the custom formatting on the x- and y-axes. This was good; it pushed me to use the plt.FuncFormatter feature, and work with date formatting as well as adding conditional formats. (If you look closely, you will see that the formatting is different as you go along the x-axis. Whee!)
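For anyone curious what that kind of conditional formatting can look like, here is a minimal sketch. It is not the code from this exercise: the function names, the 2001 to 2017 tick range, and the "full year first, two-digit year after" rule are all assumptions chosen for illustration, and it treats years as plain numbers rather than dates.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def year_label(value, pos):
    """Hypothetical rule: full year on the first tick, two-digit year after."""
    year = int(round(value))
    if pos == 0:
        return str(year)
    return "'" + str(year)[-2:]


def count_label(value, pos):
    """Add thousands separators, e.g. 236461 -> '236,461'."""
    return f"{int(value):,}"


fig, ax = plt.subplots()
ax.set_xlim(2001, 2017)   # placeholder year range, an assumption
ax.set_ylim(0, 250_000)
ax.set_xticks([2001, 2005, 2009, 2013, 2017])

# FuncFormatter calls each function with (tick value, tick position).
ax.xaxis.set_major_formatter(FuncFormatter(year_label))
ax.yaxis.set_major_formatter(FuncFormatter(count_label))

fig.canvas.draw()  # force a render so the formatters are applied
```

The `pos` argument is what makes the "different as you go along the axis" trick possible: the formatter can branch on where the tick sits, not just on its value.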

Here is the outcome of my exercise:

Thought process includes: "Hrm. You must be able to put text in an arbitrary place and then rotate it..."
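It turns out you can. A minimal sketch of placing and rotating text with Matplotlib's `Axes.text` follows. The coordinates, angle, and label strings here are placeholders for illustration, not the values used in the final chart.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

# Text positioned in data coordinates and rotated 30 degrees
# around its anchor point (placeholder position and angle).
rotated = ax.text(
    7.5, 6.0, "Uncertainty",
    rotation=30,
    rotation_mode="anchor",
    ha="left",
    va="bottom",
)

# Text positioned as a fraction of the axes (0 to 1) instead of in
# data units, which keeps it in place if the axis limits change.
pinned = ax.text(
    0.02, 0.95, "Estimate",
    transform=ax.transAxes,
    ha="left",
    va="top",
)

fig.canvas.draw()
```

Choosing between data coordinates and `ax.transAxes` is mostly a question of whether the label should follow the data or stay fixed on the canvas.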

One of the things that I thought was particularly interesting is that the weight of the line is heavier for the Uncertainty than it is for the Estimate. This draws your attention to the key point. (See what I did there?)
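A rough sketch of that line-weight contrast is below. The numbers are flat placeholder values, not CDC data, and the specific widths and colors are assumptions rather than the ones in the original graph.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder values for illustration only, NOT real CDC figures.
x = [0, 1, 2]
estimate = [2, 2, 2]
lower = [1, 1, 1]
upper = [3, 3, 3]

fig, ax = plt.subplots()

# Thin line for the point estimate...
(estimate_line,) = ax.plot(x, estimate, linewidth=1.0, color="gray", label="Estimate")

# ...and heavier lines for the bounds of the interval, so the eye
# lands on the uncertainty first.
(lower_line,) = ax.plot(x, lower, linewidth=3.0, color="black", label="Uncertainty")
(upper_line,) = ax.plot(x, upper, linewidth=3.0, color="black")

ax.legend()
fig.canvas.draw()
```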

This was a really interesting process, especially for the experience of looking more carefully. I probably wouldn't have (consciously) noticed the line weight without trying to replicate it, for example.

Code is here, if you are interested in playing with it.

I'm on Twitter and LinkedIn. You should totally talk to me. :)

(I'm also in the market for data jobs. Just putting it out there.)