Data Visualization (for beginners): Color
- 1 Introduction
- 2 Best data visualization concepts. Creating a color palette
- 3 Using the color palettes
- 4 What if we break the rules?
- 5 Conclusion
Color theory is such a vast field of study that it was bound to find its way into data visualization.
Many companies also struggle to apply their brand colors to their data presentations. The problem is that style guides were rarely written with this use in mind.
The underlying problem is easy to understand. Most users did not feel connected to presentations that strayed from the brand’s color palette, and presentations that did follow the guidelines failed to provide the basic differentiation and contrast needed to read them correctly.
Fortunately, brands are beginning to consider data visualization in their corporate color palettes. Many develop extensive color ranges based, more or less closely, on their primary or secondary corporate colors, sometimes even segmented by use or application (e.g., the Repsol color palette).
Accessibility and brand presence in data visualization are receiving more and more attention.
However, many other companies keep old, very print-oriented style guides with a sparse color range.
In the latter case, we may need to draw a graph with a dozen variables while having only two or three brand colors worth using.
Obviously, building an effective range from such a small starting point is complicated, even more so when the information must stand out from a certain distance and each value must be clearly differentiated.
Hiding behind small legends or labels to decipher the information is not an efficient or scalable solution either.
Filling certain areas with shapes, lines, dots, etc. is a common choice when data is displayed on monochrome media. Although having as many tools as possible gives us more room to solve problems, it should not be our first option.
But the pitfalls do not end there. Other relevant aspects influence the selection of a color palette for data visualization. You have probably noticed that we have not yet mentioned the symbolism of colors.
Beyond the personal meaning that each of us gives to colors, there is an evident consensus about the symbolism of some of them. We are not going to go into detail on this matter now, but it is clear that we must reserve a space for the so-called “semantic colors.” Red, amber, and green have their place in all design systems, and also a role in data representation, because their meaning is widely agreed upon. Red represents caution, danger, or negative information. Green stands for positive values, growth, or validation.
Aspects such as an area’s shape, position, or size affect how colors relate to each other. Thus, colors that contrast well in a pie chart may not work in a chart whose values are represented by closely spaced lines.
Certain colors have a symbolism that has been socially and culturally bestowed on them and that must be taken into account when creating a color palette.
Viewing colors on a light background is also not the same as viewing them on a dark one, where we will have to use a brighter, more contrasting palette.
And, of course, we cannot ignore people with visual difficulties. Fortunately, tools exist to test color schemes for compatibility with various forms of color blindness.
Some references that may be useful: Colorblind web page filter; Chroma.js color palette helper
It is clear that we are facing a complex situation with many variables and an almost infinite number of combinations.
We will try to shed some light and, looking at how large companies have dealt with this problem, give some good practice tips to select the most appropriate color palette for our data visualization.
Best data visualization concepts. Creating a color palette
We have already briefly mentioned that not all companies have a specific palette for data visualization. We have talked about how important it is to have an effective range of colors that provides a sufficient contrast ratio and maintains a close relationship with the brand image.
While we are at it, why not also ask for semantic, neutral, sequential, and categorical colors? That may sound like asking too much, but since we are rolling up our sleeves, let’s do it right. Let’s be organized.
Getting serious about creating multiple color palettes to visualize data, and anticipating their scalability, is the best starting point.
Generally, we are going to find four uses of color in graphics depending on the audience and the story we want to tell:
- Categorical colors help users assign non-numeric meaning to objects in a visualization. They are designed to be visually distinct from each other.
- Sequential colors have numerical meanings. They form a gradation from light to dark, with the darkest tones representing the highest values.
- Divergent colors also have a numerical meaning. They are helpful when dealing with negative values or ranges with two extremes and a baseline in the middle.
- Accent or highlight colors are combined with neutral colors, such as grays, to highlight especially relevant information.
So we should create at least one color palette for each of the uses listed above.
And this is not a trivial task that we can delegate to inexperienced users. When we consider allowing personalization in a digital product, we must decide how much weight this functionality has for the user. If it matters little, we should limit customization as much as possible to control the final result. If it matters a lot, we must accept that the product’s image will be diluted.
Selecting the right color palettes is a complex task that requires experience and specialization.
You have to anticipate what degree of color customization you are going to allow the user, if any, and control it.
Selection of color
Before we begin, we must take a few factors into account:
- Medium. Choosing colors for digital devices such as smartphones or tablets is not the same as choosing colors for television or print.
- Accessibility. Identifying potential contrast and perception issues for people with visual impairments helps us significantly reduce the impact of an ineffective color palette.
- Semantics. Each cultural environment has its own perceptions of the symbolism of colors. For example, in Western cultures, green means good luck, among other things. In many Eastern cultures, however, red assumes that role.
- Audience. Knowing in advance who will view the data and which colors are mainly used in their sector guides our selection. For example, a cheerful and bright palette suits a tourism-oriented company, while a funeral home calls for something quite different.
- Story. This ties in with the previously mentioned color application types (categorical, sequential, etc.). Also, if we have several color ranges, we can apply one or the other depending on the story we want to tell. For example, if the objective of the graph is to reflect the return on investment of the R&D department, we would be interested in giving the graph a palette with cheerful colors that tell a positive and optimistic story. Maybe that way we can get a bigger budget for next year.
Techniques for creating palettes
Let’s imagine that we have a limited wardrobe of colors. Let’s expand it a bit. There are various techniques for this.
- Starting from the most representative brand colors, progressively decrease the saturation and darken or lighten the tones. Cons: Produces a dull, grayish palette where colors can easily blend together.
- Use tertiary colors or colors used by the company in other media and create a wide range of random colors. Cons: The bias towards the company’s most representative colors is accentuated, and some inconsistency may appear.
- Create a more limited palette of colors and other simpler sub-palettes derived from it with more controlled contrast for basic graphics. Cons: We may still need specific color gradations for certain types of charts.
This last option is the most recommended due to its balance between the first two. One trick to doing this is to select a few original brand colors from among the primary and secondary colors. Duplicate them. Place the clones alternately mixed with the originals. Finally, modify the hue, saturation, and lightness to create enough contrast.
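The duplicate-and-modify trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the brand colors are hypothetical, the shift amounts are arbitrary starting values, and the helper names (`shift_hls`, `expand_palette`) are ours. It uses the HLS model from the standard `colorsys` module, which is not perceptually uniform, so the result still needs to be checked by eye.

```python
import colorsys


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of floats in [0, 1]."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert floats in [0, 1] back to a lowercase '#rrggbb' string."""
    return "#" + "".join(
        f"{min(255, max(0, round(c * 255))):02x}" for c in rgb
    )


def shift_hls(hex_color, hue_shift, light_factor, sat_factor):
    """Return a modified copy of a color: rotated hue, scaled lightness/saturation."""
    h, l, s = colorsys.rgb_to_hls(*hex_to_rgb(hex_color))
    h = (h + hue_shift) % 1.0
    l = min(1.0, max(0.0, l * light_factor))
    s = min(1.0, max(0.0, s * sat_factor))
    return rgb_to_hex(colorsys.hls_to_rgb(h, l, s))


def expand_palette(brand_colors, hue_shift=0.08, light_factor=1.35, sat_factor=0.85):
    """Interleave each original color with a modified clone of itself."""
    palette = []
    for color in brand_colors:
        palette.append(color.lower())  # the original brand color
        palette.append(shift_hls(color, hue_shift, light_factor, sat_factor))
    return palette


# Hypothetical brand colors, used only for illustration.
brand = ["#d52b1e", "#00539b", "#f2a900"]
print(expand_palette(brand))
```

The originals stay untouched at the even positions, so the brand remains recognizable, while the clones at the odd positions provide the extra steps.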
If we really have little material to work with and are allowed to get more creative, we can use a few more tricks to create our palette.
- Use the color palettes that nature gives us. If a gradient is to be used for a sequential or divergent color palette, it is best to place at the extremes colors that commonly appear together in nature, as in a sunset. Otherwise, the color transition will look artificial and unrealistic and will put viewers off.
- Use color schemes from photographs related to company values. If you have the help of an expert, you can select the colors directly. If not, it is better to use software that decomposes the image to extract the predominant colors and make our palette from there.
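As a rough idea of what such software does, the sketch below groups pixels into coarse color buckets and keeps the most frequent ones. It is a deliberately simple assumption on our part, not the method of any particular tool: real extractors usually rely on more refined clustering. The function name `dominant_colors` and the sample pixels are hypothetical, and loading the pixels from an actual photograph is left out.

```python
from collections import Counter


def dominant_colors(pixels, n_colors=5, step=32):
    """Return the most frequent coarse colors among (r, g, b) pixels (0-255 ints).

    Each channel is quantized into buckets of width `step`; the center of
    each of the most populated buckets is returned as a '#rrggbb' string.
    """
    buckets = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    palette = []
    for bucket, _count in buckets.most_common(n_colors):
        center = tuple(min(255, v * step + step // 2) for v in bucket)
        palette.append("#{:02x}{:02x}{:02x}".format(*center))
    return palette


# Hypothetical "photograph": mostly sky blue, some sand, a little shadow.
sample_pixels = (
    [(30, 120, 200)] * 60 + [(230, 200, 150)] * 30 + [(20, 20, 20)] * 10
)
print(dominant_colors(sample_pixels, n_colors=2))
```

The extracted colors are only raw material: they still have to be adjusted for contrast and accessibility like any other palette.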
Of course, selecting colors for data visualization is a very personal and subjective task. But there are many conditions, as we have seen, and variables to consider if we want to obtain an effective result.
Regardless of our company’s branding or our personal tastes, some palettes have proven practical and effective, and we should know about them, particularly the sequential ones because of the scientific basis behind them.
Introduction to the Viridis color maps.
These sequential palettes have been designed with contrast and perceptual uniformity in mind, among other factors, to prevent accessibility problems for people with color blindness or its variations.
In this way, anyone will be able to perceive the same variations of tone and contrast in the sequence of values of a data presentation. Even if it is printed in black and white.
These color maps were originally created for matplotlib, a Python plotting library, and were later made available through the viridis package in R, a programming language used especially for academic and research purposes.
R is a free programming language for statistical computing and graphics. It is widely used by researchers from various disciplines to estimate and display results and by professors of statistics and research methods.
These color scales have been created and refined by highly qualified research teams, and such is their value that even companies like Adobe have embraced them, developing their own color schemes for data presentation based on them.
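One way to check the black-and-white claim for any sequential palette is to compute the relative luminance of each step and confirm that it only moves in one direction. The sketch below uses the relative luminance formula from the WCAG guidelines. The five hex values are an approximate five-step sampling of viridis, included for illustration only, and the helper names are ours.

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color '#rrggbb', following the WCAG formula."""
    h = hex_color.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding. 0.04045 is the sRGB threshold; the WCAG
    # text has historically quoted 0.03928, which gives the same results
    # for 8-bit colors.
    r, g, b = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Approximate five-step sampling of viridis (illustrative values).
VIRIDIS_5 = ["#440154", "#3b528b", "#21908c", "#5dc863", "#fde725"]
# A naive rainbow, for comparison.
RAINBOW = ["#ff0000", "#ffff00", "#00ff00", "#0000ff"]

for name, palette in (("viridis", VIRIDIS_5), ("rainbow", RAINBOW)):
    luminances = [relative_luminance(c) for c in palette]
    print(name, is_strictly_increasing(luminances))
```

A palette that passes this check keeps its order when reduced to gray, while the rainbow does not: its yellow is brighter than both of its neighbors.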
Finally, one more variable to consider, which we have already mentioned, is the base color: the theme.
If we are going to present the data on two themes, one light and one dark, we will have to adapt our palette to each background.
Very vibrant or dark colors can be fatiguing on a light background, and the application’s appearance will look dull. Less saturated and well-contrasted colors are recommended. In a dark environment, however, we must add extra brightness to our palette without saturating the tones too much.
Does this mean that we should make two different palettes? If we want to be consistent, the answer is no. But we must adapt some of the colors according to their context.
Using the color palettes
Let’s assume we already have our color palettes for displaying data, and that we understand the variables of perceptual differentiation, accessibility, and symbolism.
We must also assume that if our palette is broad, we will find imperfections: the contrast ratio with the background, between adjacent colors, and with the text is not always going to be optimal.
This is why we have only solved 50% of the problem. Knowing how to use colors is as important as selecting them. Let’s look at some key points.
Having a hammer does not mean that everything is a nail. Just as important as knowing how to use colors is knowing when NOT to use them.
It is important to remember that color needs a purpose. If the information can be understood without it, it is better not to add it. In general, if your visualization only contains two data dimensions, such as the evolution of a value over time, you only need one color.
It is not necessary to complicate the quick understanding of a graph by adding colors just to achieve a more pleasing aesthetic effect.
Basically, color palettes are applied to two types of data representations:
- Qualitative: where a logical, numerical order cannot be applied.
- Quantitative: the numerical order has a relevance that must be represented.
With qualitative data, we are talking about data of different natures whose values have no correlation with one another and, therefore, must be distinguished.
In these cases, we apply categorical color palettes. Namely:
- Well-differentiated colors without marked symbolism.
- With a similar brightness and saturation.
- With a sufficiently different tone not to generate visual groupings or a sense of order.
It is recommended not to use palettes of more than 6 colors. It has been shown that people cannot reliably distinguish more than 5–8 colors at the same time.
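The criteria above can be approximated by spreading hues evenly around the color wheel while keeping lightness and saturation fixed. The sketch below does this with the standard `colorsys` module. It is a starting point under assumptions: the default lightness, saturation, and starting hue are arbitrary choices of ours, and HLS lightness is not perceived brightness, so the result should still be reviewed visually and tested for color blindness.

```python
import colorsys


def categorical_palette(n_colors, lightness=0.5, saturation=0.65,
                        start_hue=0.58, max_colors=6):
    """Return n_colors '#rrggbb' strings with evenly spaced hues.

    Lightness and saturation are shared by all colors so that no category
    looks more important than another. The size is capped at max_colors,
    following the recommendation not to exceed about six categories.
    """
    if not 1 <= n_colors <= max_colors:
        raise ValueError(f"n_colors must be between 1 and {max_colors}")
    palette = []
    for i in range(n_colors):
        hue = (start_hue + i / n_colors) % 1.0
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return palette


print(categorical_palette(5))
```

Evenly spaced hues may still land on strongly symbolic colors such as pure red or green, so shifting `start_hue` or replacing individual colors by hand remains part of the job.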
With quantitative data, we are talking about data of the same nature that differ in value: for example, the population density of a country.
Quantitative data can be represented in two different ways:
- Sequentially. A color gradient reflects when the values are low or high.
Use two-color gradations whenever you can. They provide greater contrast and differentiation than a single color.
The luminosity of the extremes should clearly show the order: which values are larger and which are smaller.
The difference between color steps should be proportional to the difference between the values they represent, so that the distance between two values can be judged from any point on the palette. Five steps is a good number to start with.
- Divergently. Two color gradients meet in the middle at a neutral tone. This is used for ranges of values on a standard scale with a central baseline. Imagine a thermometer or a satisfaction survey.
Put a white or gray color in the middle as your neutral mid-range; even a pale yellow can work.
As in the sequential palette, the brightness of the extremes must clearly show the order: which values are larger and which are smaller.
Also as in the sequential palette, the difference between color steps should be proportional to the difference between the values they represent, so that the distance between two values can be judged from any point on the palette. Between 3 and 5 steps is a good number to start with.
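Both kinds of palette can be sketched by interpolating between colors in equal steps. The example below is a minimal version that assumes plain linear interpolation in RGB, which only approximates perceptually even steps (a perceptual color space would come closer). The function names and the endpoint colors of the divergent example are illustrative choices of ours.

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to a tuple of ints in 0..255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert a tuple of ints in 0..255 to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_palette(start, end, steps=5):
    """Interpolate from start to end in `steps` equally spaced colors."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    return [
        rgb_to_hex(tuple(round(x + (y - x) * i / (steps - 1)) for x, y in zip(a, b)))
        for i in range(steps)
    ]


def diverging_palette(low, mid, high, steps_per_side=3):
    """Two gradients that meet at a neutral middle color."""
    left = sequential_palette(low, mid, steps_per_side + 1)
    right = sequential_palette(mid, high, steps_per_side + 1)
    return left + right[1:]  # drop the duplicated middle color


# Five-step sequential palette from light to dark (illustrative colors).
print(sequential_palette("#deebf7", "#08306b", steps=5))
# Divergent palette with a light gray baseline (illustrative colors).
print(diverging_palette("#2166ac", "#f7f7f7", "#b2182b", steps_per_side=3))
```

With `steps_per_side=3` the divergent palette has seven colors: three per side plus the neutral baseline in the middle.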
What if we break the rules?
You will see below that from time to time, we may break some rules to achieve a more effective presentation of our story.
We have seen so far that there are sequential, categorical, and divergent color palettes. But as we already mentioned, we can create accent palettes to highlight data.
Breaking certain rules is a common way to achieve a different visual impact.
On many occasions, the use of color is clearly arbitrary: the defined color palettes, if they exist, are not followed, or the colors are applied by non-experts according to personal taste.
In other situations, we find it impossible to highlight a specific value using a palette that we defined precisely so that no color stands out from the others.
It is also hard to convey the personality of our company, so that the brand is perceived, across a wide range of colors designed specifically for data visualization.
On extensive palettes or with non-predominant colors it is very difficult to highlight concepts or apply the company’s brand.
These situations present us with scenarios in which a wide range of colors may not be the most effective solution. Instead, having a range of grays can be a great ally. Some uses can be:
- Accentuate to focus. Gray combined with color makes it stand out and emphasizes its value.
- Create interpretation references. If a sequential gray gradient is created, black color can be used to highlight a value or progress, for example.
- See the general form. If the same shade of gray is used for the lines of a graph, we can easily intuit the general shape of a trend and detect outliers.
- Create references to highlight a position. Thanks to the contrast with a color, we can easily detect the position of an object or value.
- Gray can be used as an elegant resource to accentuate the brand color, making it stand out from the rest of the values.
- It is socially accepted that shades of gray also represent null, disabled, or unselected values.
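The first of these uses, accentuating to focus, comes down to a very small rule: one color for what matters and the same gray for everything else. A minimal sketch, where the function name, the region names, and the two hex values are hypothetical:

```python
def accent_colors(categories, highlighted, accent="#d52b1e", neutral="#b0b0b0"):
    """Map each category to the accent color if highlighted, else to neutral gray."""
    return {
        name: (accent if name in highlighted else neutral)
        for name in categories
    }


# Hypothetical sales regions: only "South" should draw the eye.
regions = ["North", "South", "East", "West"]
print(accent_colors(regions, highlighted={"South"}))
```

The resulting mapping can be handed to whatever charting tool is in use; the point is that the decision about what to emphasize is made explicitly, not left to the default palette.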
We have talked about symbolic colors before. They are those to which connotations of some kind are socially attributed: mainly red, green, yellow, black, white, gray.
But what if our company colors match one of these symbolic colors? We have a problem, but one that we can learn to overcome.
- If your main brand color is red, you can use it to highlight outliers, key figures, or special features. You can use it in some headlines or graphic details in other parts of the report. If, for example, you are talking about the volume of car sales in a specific region of your continent, you can consider painting the rest of the regions in gray scales according to their sales. You will reinforce the brand, and your graphics will tell the story effectively. If you need to represent negative values, which are culturally associated with red, you can use other colors with similar symbolism, such as black. For positive values, the use of red is not recommended, since most people associate that color with the opposite meaning. Classic green is a better and unequivocal option.
- If your brand’s color is mostly green, it is not appropriate to represent negative values with that color, so it should not be used for emphasis in those types of graphs. On the other hand, for positive values such as earnings or growth, nothing prevents you from associating them with your brand. The same goes for emphasizing values with less symbolic load, such as the number of workers in each department.
Some extra tricks
I think that at this point, we are all clear that the fundamental thing in data visualization is that it tells us a story in a clear and concise way. No distractions or superfluous embellishments.
We have learned to create a suitable color palette for its correct presentation, and we have discovered how to use it effectively.
And yet we can raise the bar a little higher with a few tricks. For example:
- Accessible axes. Rectangular charts, such as heat maps, should include x and y axes with a contrast of at least 3:1 to help define the chart boundaries. In this way, we ensure the accessibility of all contiguous colors vertically and horizontally.
- Active contours. On maps, it is convenient to delimit active or highlighted regions with a colored border that has a contrast of at least 3:1 against the background, especially if we use sequential palettes. We must use all the tools at our disposal to highlight the desired values, and contours are often overlooked.
- Contrast edges. When a sequential palette is used in certain graphs, it is possible that, depending on the background color, the tones at the ends of our palette are at the limits of readability. In those cases, we can add a border in an accessible tone to aid reading. This usually occurs in graphs with colored areas, such as bar graphs.
- Limits or borders. For visualizations with a contained data density, it is sometimes convenient to delimit the areas with an accessible color border (usually the background or another neutral color so that it does not have a semantic load). For example, for map borders.
- Dividing lines. It is really difficult to maintain, between adjacent colors, the 3:1 contrast that we have achieved against the background. Dividing lines make this task easier in both categorical and sequential palettes.
- Textures. Textures are a classic way of identifying categories without resorting to color. They are helpful when you do not have access to color printers or when you want to have a specific effect on the user. It is convenient to delimit the areas with contours and make them wide enough so that the texture can be clearly perceived and differentiated from the adjacent ones. The idea is to find categorical and sequential patterns that reflect the density of the value they represent. Although textures can be placed next to solid color areas or even combined with them, we do not recommend this use. It complicates the experience, strains the eyes, requires a perceptual effort that slows down the message and understanding of the story, and can sometimes generate undesirable optical effects.
- Floating information. Whenever the graph requires it, pop-up descriptions can be incorporated. They give access to more detailed information that would otherwise be inaccessible or hidden. They provide interactivity and can even customize the report according to our needs using filters, for example. They can vary from the simplest to really complex floating panels.
- Data tables. They can almost be considered a way of improving the accessibility of data presentation. Obviously, they do not enjoy the immediacy and visual impact of the rest of the graphics, but in return, they provide neutrality that is difficult to manipulate and the possibility of moving only using the keyboard. The ability to view a graph as a data table should always be an option in the additional view menu.
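The 3:1 threshold mentioned in several of these tricks can be checked automatically. The sketch below assumes that the threshold refers to the contrast ratio defined in the WCAG guidelines and implements that formula. The helper `low_contrast_pairs` is our own hypothetical name: it reports every palette color that falls short against the background and every pair of neighboring colors that falls short against each other.

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color '#rrggbb', following the WCAG formula."""
    h = hex_color.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    r, g, b = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def low_contrast_pairs(palette, background, minimum=3.0):
    """List the color pairs whose contrast ratio is below `minimum`."""
    problems = []
    for color in palette:  # each color against the background
        if contrast_ratio(color, background) < minimum:
            problems.append((color, background))
    for first, second in zip(palette, palette[1:]):  # neighboring colors
        if contrast_ratio(first, second) < minimum:
            problems.append((first, second))
    return problems


print(round(contrast_ratio("#000000", "#ffffff"), 2))
print(low_contrast_pairs(["#ffff00", "#000000"], background="#ffffff"))
```

Pairs reported by this check are the natural candidates for the borders, contours, and dividing lines described above.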
The truth is that we could go on talking at length. Everyone carries in their own backpack of experience recommended practices, mistakes to avoid, and personal tricks for achieving adequate data visualization. We’ve put together some science-based techniques here to help you start minimizing errors. But as with almost everything related to design, in the end, each particular case must be studied and resolved individually.
By way of final advice, we could conclude that if we achieve a contained, clear, quickly understandable, and accessible presentation of data, we will have fulfilled our objective.