Most dataviz best practices fight one of these three enemies: what are they?

Martynas Jočys
Jul 12, 2023


When you started learning data visualization, you probably wanted to know all the best practices and apply them in your daily work. The problem is that there are so many of them, and quite a few are still hotly debated: you can find perhaps a thousand articles listing the "most important", "essential" or "golden" rules of data visualization, and each article's set will be different.

All best practices and rules started with some principle. For example, Edward Tufte insisted: "above all else show the data", while Giorgia Lupi advocated for data humanism. When we debate whether a practice is really the best one, those underlying principles should be questioned first.

While strict rules are great for beginners, the aim of this article and website is to explore the visual communication of data more deeply. I believe that without understanding the underlying principles, one can easily fall into a dogmatic trap like "X is always bad", which may not fit the intention of the visual itself.

Here, I will try to derive principles for data visualization from the reasons why data visualization exists at all, and then identify the enemies that do the most damage according to those principles. This way, instead of working through a checklist of "just always do it like this" best practices, you can embrace your own creativity and capture the true shape of the stories that lie within the data (pun intended).

First reason. First enemy.

The first reason is speed. Even when the data is small and presented in an Excel spreadsheet, it takes some time to make sense of it. It takes even more time to understand general tendencies, find the largest values and spot the outliers.

Data visualization lets us do this much faster than a table or even text (provided the reader knows how to read charts). Try it yourself: find the top plant for each feature in the image below:

I don't need to read every single cell, compare the values in my mind and come up with a conclusion. If I'm focused, I read the chart swiftly.

…if I’m focused…

Here comes the first enemy: distraction. If I'm not focused, I read and understand more slowly. I might be unfocused for personal reasons: lack of sleep, checking the chart on a mobile device while commuting to work, or being constantly distracted by a colleague who commuted to the office only to spend the whole day in meetings on Teams…

But the visual itself might slow me down. Observe the image below: how much longer does it take to find the top plant because of an overly figurative font, sub-optimal grouping, a hard-to-find legend with hard-to-distinguish colours and, finally, a cat jumping right onto your screen with a message so clear and easy to read that we can't help but read it? Some would call such distractions "chartjunk", which, according to Wikipedia, means "all visual elements in charts and graphs that are not necessary to comprehend the information (…), or that distract the viewer from this information".

There is a fine line between what distracts and what guides. The cat in the image is definitely a distraction, but introducing more distinct colours so that different plants can be compared more easily would not be one. Also, each distraction affects individuals differently, so we cannot say that an image or design feature is always a distraction; we can only guess its effect on most readers. Finally, a distraction in one visual is a guide in another: in a visualization about cats, of course there could be an image of a cat!

One might consider dumbing down the data to make it faster to read, but that has little to do with the visuals. The speed we care about is how fast the reader comprehends the information we wanted them to see in the first place. So if the data is complex and the story is complex, it will naturally take more time to comprehend.

Here are some examples of best practices aimed at speed of comprehension:

  • Do not rotate labels: horizontal text is easier to read.
  • Use fewer colours: this helps the reader focus on what matters most.
  • Adjust visuals to the audience: less data-literate readers prefer simple charts and simple interfaces, while experts might benefit from more complex solutions.
  • Use icons instead of text to guide readers where appropriate.

Think about how the reader will read the visual and whether your decision increases or decreases the speed of comprehension. Always aim to increase the speed of comprehension.

Second reason. Second enemy.

The second reason is pattern recognition. The human brain recognizes visual patterns incredibly well, but it struggles with two related tasks: it cannot quickly extract a pattern from a table of numbers, and it cannot extract one from descriptive statistics.

Statistician John Tukey put it nicely: "The greatest value of a picture is when it forces us to notice what we never expected to see."

But what if, like ChatGPT, we sometimes see hallucinations?

Here is the second enemy: confusion. If I'm confused, I might read the information (albeit swiftly) the wrong way and get the wrong impression of a trend, a distribution or the proportions between data points. I might be confused for natural reasons like poor lighting, drug abuse or not paying attention, but the visual itself might also confuse me.

Just inverting an axis creates quite a different first impression of the data. Of course, we could decode the message by checking the numbers, since they are provided after all, but how much mental effort does this "correct" decoding require?

There is a nice principle of "self-sufficiency", probably first introduced by Kaiser Fung on his blog. It asks: if the numbers and annotations are hidden, can we still read the trends, distributions and proportions correctly? If not, the graphical shapes are confusing us: we might see one number while the shape tells a different story, or no story at all.

Here are some examples of best practices utilizing our natural pattern recognition ability:

  • Start a bar chart from zero: this way readers will get the correct proportions.
  • Don't use a packed circles chart for comparing values: it is very difficult to compare the sizes of circles.
  • Adjust visuals to the audience: less data-literate readers might be confused by more complex or lesser-known chart types.
  • Don't use pie charts: in most cases, some other chart type is easier to decode.
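Why the zero baseline matters can be shown with a little arithmetic. A bar's visible length is its value minus the value at which the axis starts, and readers compare lengths, not labels. The sketch below uses a hypothetical helper, `apparent_ratio`, and made-up scores of 50 and 55; it assumes simple vertical bars drawn from the axis start up to each value.

```python
def apparent_ratio(smaller: float, larger: float, baseline: float = 0.0) -> float:
    """Ratio of the two drawn bar lengths when the value axis starts at `baseline`.

    Each bar is drawn from the baseline up to its value, so its visible length
    is value - baseline. Both values must lie above the baseline.
    """
    if smaller <= baseline or larger <= baseline:
        raise ValueError("both values must be above the axis baseline")
    return (larger - baseline) / (smaller - baseline)


# Hypothetical example: two departments scoring 50 and 55.
zero_axis = apparent_ratio(50, 55)                 # bars of 50 and 55: the real proportion
truncated_axis = apparent_ratio(50, 55, baseline=45)  # bars of 5 and 10
print(f"zero baseline: {zero_axis:.2f}x, truncated axis: {truncated_axis:.2f}x")
# prints "zero baseline: 1.10x, truncated axis: 2.00x"
```

With the axis starting at 45, a 10% difference is drawn as one bar twice as long as the other, and that doubled proportion is exactly what the reader's pattern recognition picks up.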

Apply the self-sufficiency test and see whether your data is correctly conveyed through shapes and colours alone. Always aim to aid natural pattern recognition.

Most best practices are aimed at increasing the speed of comprehension and aiding pattern recognition, but there is one more enemy covering a wider scope of issues within data visualization.

Third reason. Third enemy.

Fun is the third reason why people visualize data. It is simply more fun to look at a colourful map, a complex visual or an aesthetically pleasing chart than to read text. A pie chart in a long analytical text naturally attracts our attention because it just looks better than a body of text. It is great fun to click a filter or a parameter and watch the shapes on the display change, especially if they are animated! Operating a dashboard evokes feelings of "being in control", "modern analytics" and being "data-driven", and it feels rewarding even without discovering any insight; actually discovering one lifts an analyst to the heavens! Of course, this level of "fun" depends heavily on one's data literacy, but even without that literacy, charts might simply look like smart, nice images, even if they cannot be decoded. This might even be the reason why we are in the field of data visualization in the first place.

If you are like me and genuinely interested in languages, you'll find both images below equally interesting. If you're not, the meaningless AI-generated chart will probably be more fun than the meaningless AI-generated text.

The enemy here is frustration. No, we are not fighting boredom with charts. The enemy is someone opening a visual and getting frustrated: by not being able to read it (confusion!), by unnecessary visual elements (distraction!), by not being able to navigate an interactive visual and get a predictable outcome (poor UX!), by not understanding its point, its key message or what they should do about it (lack of purpose!), or simply by waiting forever for it to load (tech issues!).

Of course, a manager might be frustrated by a truthful chart showing that his department is not performing well, or an anti-vaxxer might be frustrated by a chart challenging his beliefs, but these cases have less to do with the visual and more to do with the information itself.

And while the first two reasons are probably essential in most types of data products, fun is essential only in some. I work in Business Intelligence (aka dashboards), so "fun" is never asked for. But while fun is the main point only in some cases, frustration is always an enemy. No one looks at a dashboard because some financial text is more boring; users usually just don't have a choice. So making the overall experience pleasant, or at least not frustrating, really helps drive adoption!

Frustrated users will not only stop using the dashboard but might also stop trusting the information in it: "How can I trust the data when they can't even make a decent interface?"

Here are some examples of best practices aimed at having a pleasant experience:

  • Know the purpose of a visual — don’t show the information just because it’s available.
  • Consider the impact of the visual — what action should the reader take after seeing it?
  • Test the visual with end users — you might not know how creatively they will use your product.
  • Invest time in building a proper data model — this way the visual will load faster.

When making a decision about a visual, think about whether it will add to a positive experience or add frustration. Always try to make the experience more positive.

Feel free!

No data visualization best practice needs to be followed all the time.

Sometimes we need to spark interest, draw attention to a problem, or even let the reader spend time exploring a data-dense infographic. In such cases, an initial distraction might even be introduced deliberately for storytelling or to ignite curiosity.

Quite often, accuracy is a mere illusion because the data itself has large errors. In such cases, depicting uncertainty should be a bigger concern than showing ambiguous numbers as precisely as possible, and introducing some confusion might even serve as a way of depicting that uncertainty.

And maybe sometimes we just want to annoy an anti-vaxxer with a chart.

Anyway, here are the three reasons why we use data visualization instead of just staring at tables or sequences of numbers:

  1. A greater speed at which we comprehend the data.
  2. Our natural ability to recognize patterns works only with visuals.
  3. It is more fun to look at visuals.

The enemies of good data visuals are:

  1. Distraction reduces the speed of understanding data.
  2. Confusion misleads our pattern recognition ability.
  3. Frustration undermines all the benefits visuals might have.

May these 3 reasons and enemies guide your decisions and help you create better data products!

Originally published at https://chartplanet.net on July 12, 2023.

You can talk with me about economics, finance, philosophy, music, travel, meditation and charts. Visit me on chartplanet.net