Data Vizzing: the good, the bad and the ugly

A friend forwarded me an article he thought I would enjoy. Not because of the content, but because of the infographic.

This is an article in one of my favorite newspapers, Het Parool, a Dutch newspaper from Amsterdam. The article talks about how more and more people in the city have expensive cars and bikes. The infographic tries to show the household wealth distribution within the city.

Speaking Dutch and being able to read the caption didn’t help me read the graph... In Dutch we say “Een plaatje zegt meer dan 1000 woorden” (a picture says more than 1000 words), but in this case I think I would rather have had the words.

I love that I see more and more mainstream media using infographics and charts to support their storytelling. Not only does it enhance the article, it also increases the data literacy of its readers, because they get more familiar with seeing charts on a day-to-day basis. Also, in this digital age it is better to share an image with a link to the article than just the text: a catchy headline with an infographic and a link is more inviting to click on than just the headline.

Clearly I am all for using data visualization in more areas than just the workplace. However, it sometimes feels that in everyone’s haste to jump on board they miss a few crucial steps.

A bad dataviz can confuse your audience or even misinform them. Sometimes this is done on purpose, to make the data fit the story they are trying to tell. But more often it happens because of a lack of knowledge of proper data visualization. “Can you add some colors to make it look nicer?” “This graph looks cool, we should use it!” “Can you remove those bits because they don’t fit well on the page?”

What is wrong?

So let’s look at the example and take it apart. What is wrong with it, and more importantly, why is it wrong?

Storytelling - Let’s start with what it is trying to tell us. Going by the article, the infographic should tell us how the group of wealthy people keeps growing in the city. But this graph is not showing a trend or any form of historical data. It is a snapshot of now, which means it cannot show growth. What it does show is the current situation: every square is a group of households with a specific capital. The bigger the square, the more households in the group. The amount of capital is written in the squares.

Size - It is hard to see which is the biggest. I know that in this so-called hierarchy chart (a.k.a. treemap) the top left corner typically holds the biggest group and the bottom right corner the smallest, but I do not expect the average newspaper reader to be aware of the specifics of this viz type. And even when you’ve figured out which is the biggest: which is 3rd? Or the middle one?

Colors - The colors make no sense. There are greys, greens and reds. At first they looked random to me, but now that I have been looking at it for a while I realize it is a grey-green-red diverging palette, where grey is negative, green is positive but low, and red is positive and high. A legend would have saved me a lot of effort here. Also, why does it switch from green to red where it does? Is it the median? The average national capital? Just random?

Part of the whole - The main reason for using a treemap is to show the contribution to a whole. What is one square’s impact on the total? How does one group compare to all the other groups? This viz type works best when there are a few dominant categories and a lot of smaller ones. The small ones will be bunched up in the bottom right corner, while the big two or three take up most of the space and focus. In this example, however, all the squares seem to be about the same size, so the viz isn’t telling us anything.

Groups - The household capital groups are a little strange. It starts with “€ -5000 of meer”: minus 5000 or more. Does that mean more capital or more minuses? After a little search I see that the next category is € -5000 - € 0 (minus 5000 until 0), so I think the first category was -5000 or less (which would have made a better name). The next categories are € 0 - 1000, 1000 - 5000, 5000 - 10.000 and 10.000 - 20.000. So the steps are 1000, 4000, 5000 and 10.000 wide. This seems like an odd way to group the data, and it also distorts the part-to-the-whole comparison because the groups aren’t equal. The data shown is still correct, but it doesn’t highlight the gap at all. The labels are inconsistent as well: some groups use a dash (“1000 – 5000”) while others use the word “tot” (Dutch for “to”), as in “500.000 tot 1.000.000”.
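
To make the uneven steps concrete, here is a small Python sketch that computes the width of each group from the boundaries quoted above. The variable names are my own; only the boundary values come from the infographic.

```python
# Group boundaries (in euros) as quoted from the infographic,
# from "-5000 to 0" up to "10.000 to 20.000".
edges = [-5000, 0, 1000, 5000, 10000, 20000]

# Width of each group = upper boundary minus lower boundary.
widths = [upper - lower for lower, upper in zip(edges, edges[1:])]

for lower, upper, width in zip(edges, edges[1:], widths):
    print(f"{lower:>6} to {upper:>6}: width {width}")
```

A group that covers ten times as much ground can hold more households simply because it is wider, so the square sizes of different groups are not directly comparable.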

And the positives?

Was it all wrong? No, not everything was bad. They did tick some important boxes: there is a title that tells us what the graph contains, there is a description of how to read the graph, and at the bottom they mention the data source.

But these are must-haves and don't make it a good viz. I am struggling to find a better thing to say about it. I don’t hate the colors, but they do not contribute and only distract. The only real positive I can find is that it grabbed my attention and made me open up the article. And it's not a pie-chart.

Now that we have dissected the graph, let’s build it up again.

To tell the story that would actually support the article, we would need historical data too. How was capital distributed before? But since we don’t have this information, let’s see what we can do to improve the current graph.

Size - Our brains are a lot worse at comparing areas than at comparing lengths. So in this case a bar chart would be better to show the biggest and smallest groups.
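
A quick bit of arithmetic shows why area is so hard to judge. In this sketch (the function name is just for illustration, and I am assuming the shapes stay square), a square that represents twice as many households is only about 1.4 times as wide, while a bar would be exactly twice as long.

```python
import math

def side_ratio(area_ratio: float) -> float:
    """How much longer the side of a square gets when its area grows by area_ratio."""
    return math.sqrt(area_ratio)

# Compare how a square's side and a bar's length grow for the same increase.
for ratio in (2, 4, 10):
    print(f"{ratio}x the households: square side x{side_ratio(ratio):.2f}, bar length x{ratio}")
```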

Colors - When you use colors, always show a legend! In this case I don’t think colors will help tell the story, since the capital of each group is already shown in the labels. If we used a bar chart, we could order it by size of capital and eliminate the need for colors completely. If we do want to use color, I would only highlight the groups with negative capital, or make a distinction between above and below the national average.

Part to the whole and Groups - For part-of-the-whole to work, we would have to tackle the group sizes and make them equal. I would probably go for groups of 10.000 and then show a histogram of the distribution. The length of the histogram bars would make it easy to see the largest groups, and the position of each bar on the axis would clearly show the distribution. Colors won’t be needed. We could also show historical data by adding another set of bars to the histogram, in a light grey color, as a reference. Or use an animation where we can see multiple years of data and how the distribution changes. Again, however, these adjustments would require a different dataset than the one used in this article. If we keep the same groups as above, the next best viz type would be a bar chart, ordered by capital group.
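
As a rough sketch of the equal-width grouping, the snippet below bins household capital into steps of 10.000 and prints a text histogram. The capital values are invented for illustration (I do not have the real data), and the names bin_counts and BIN_WIDTH are my own.

```python
from collections import Counter

BIN_WIDTH = 10_000  # every group covers the same range of capital

def bin_counts(values, width=BIN_WIDTH):
    """Count values per equal-width bin, keyed by the bin's lower edge."""
    # Floor division also works for negative capital: -3000 // 10000 == -1.
    return Counter((v // width) * width for v in values)

# Made-up household capital values, NOT the data from the article.
capital = [-12_000, -3_000, 500, 2_500, 7_000, 9_900, 15_000, 18_000, 42_000]

counts = bin_counts(capital)

# Walk over every bin between the lowest and highest one, so that empty
# bins still show up as gaps in the distribution.
for lower in range(min(counts), max(counts) + BIN_WIDTH, BIN_WIDTH):
    bar = "#" * counts[lower]
    print(f"{lower:>7} to {lower + BIN_WIDTH:>7} | {bar}")
```

Because every bin has the same width, the bar lengths can be compared directly, and the empty bins make a gap between the bulk of the households and the wealthy tail visible.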

Result - I was not able to find the sample dataset for the households in Amsterdam that was used for the article, so I have tried to eyeball it and reconstruct the dataset in order to create this new viz. I renamed the groups so that they are consistent and shorter. I found the national average number and added it as a color. I also added the percent of total as a label.
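
The percent-of-total labels are simple to compute. The sketch below uses placeholder counts and group names (hypothetical, not my eyeballed dataset or the article's data) and keeps the groups in order of capital rather than sorting them by size.

```python
# Placeholder household counts per capital group, listed from lowest to
# highest capital. Names and numbers are hypothetical, for illustration only.
groups = [
    ("< -5K", 40),
    ("-5K to 0", 60),
    ("0 to 1K", 50),
    ("1K to 5K", 30),
    ("5K to 10K", 20),
]

def percent_of_total(rows):
    """Return (label, count, share in percent) for each group, order preserved."""
    total = sum(count for _, count in rows)
    return [(label, count, 100 * count / total) for label, count in rows]

for label, count, share in percent_of_total(groups):
    bar = "#" * (count // 5)  # bar length is proportional to the count
    print(f"{label:>10} | {bar:<12} {share:.0f}%")
```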

However, without the historical data, this graph doesn't tell us much of a story. I decided to make up some numbers for the dataset I already created to get some context. The second graph (although built on fake data!) illustrates how important this context is to tell the story and also answer the question that was raised in the article.

What do you think? Did it improve the original? What would you have done differently? Let me know!

The takeaway is: data vizzing can be good, bad and ugly. But an ugly data viz is still better than a bad data viz. Never choose looks over quality.

Source: https://www.parool.nl/amsterdam/goed-verdienende-woningbezitters-met-een-vanmoof-en-een-tesla-voor-de-deur-worden-steeds-dominanter-in-amsterdam~be12209f/
