The Unbearable Lightness of Visualisations

A guest post by Juha Pajula from VTT

In the current data-driven world, increasing amounts of data are generated by analyzing already existing data. This newly generated data is then analyzed further to produce higher-level results, which are presented to a targeted audience. If the result is more complicated than a single number or a small set of values, it can be challenging to interpret. Throughout the ages, we have solved this issue by illustrating numerical results as figures. One of our mightiest abilities as humans is interpreting changes in visual cues. Combinations of numbers are typically hard to interpret unless they are visualized with some graphical method. The method can be a simple line graph or a set of bar graphs that shows the change between two or more data points, or the variation over time or between measurement sessions. For example, it can be challenging to see at a glance whether 0.000000001 is bigger than 0.0000000005, but if the values are drawn on a graph the difference is clear. Clever visualization can make a significant difference when interpreting results. The more complex the data is, the more carefully the visualization methods should be selected.

With well-designed graphs, results can be made good-looking, but they are still only as good as the numbers produced by the analytics. For this reason, all results, whether numbers or graphs, should be explained to the end-user. It is not enough to merely say that we have a histogram of a variable, as that does not tell the end-user much. In addition to the challenge of understanding the results, some visualizations are so commonly combined with specific analytics that many people treat the name of the analytics as a synonym for the visualization type. The “correlation matrix” is one example. A correlation matrix is not a visualization type; it is typically a symmetric matrix of correlation values between the variables compared in a correlation analysis. The visualization typically used for it is a heatmap. Even though a heatmap can be, and is, used to visualize many results other than correlation matrices, people commonly misinterpret any such figure as a correlation analysis. Nor is a heatmap the only option for visualizing a correlation matrix; another common option is, for example, a chord diagram.
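To make the distinction concrete, here is a minimal Python sketch that keeps the analysis step (computing a correlation matrix) separate from the visualization step (turning any table of values into heatmap colors). The column names, the sample numbers, and the blue-white-red color bar are illustrative assumptions, not taken from any particular system.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(columns):
    """Analysis step: a symmetric matrix of pairwise correlations."""
    names = list(columns)
    matrix = [[pearson(columns[r], columns[c]) for c in names] for r in names]
    return names, matrix

def to_color(value, vmin=-1.0, vmax=1.0):
    """Map a value to an RGB tuple on an assumed blue-white-red color bar."""
    clamped = min(max(value, vmin), vmax)
    t = (clamped - vmin) / (vmax - vmin)  # position on the color bar, 0..1
    if t < 0.5:
        s = t / 0.5  # blue -> white
        return (round(255 * s), round(255 * s), 255)
    s = (t - 0.5) / 0.5  # white -> red
    return (255, round(255 * (1 - s)), round(255 * (1 - s)))

def heatmap_cells(matrix):
    """Visualization step: works for any table of values, not only correlations."""
    return [[to_color(v) for v in row] for row in matrix]

# Made-up example data, for illustration only.
COLUMNS = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
names, matrix = correlation_matrix(COLUMNS)
cells = heatmap_cells(matrix)
```

Note that `heatmap_cells` knows nothing about correlation: it would color a table of any other measure in exactly the same way, which is why the figure alone does not tell the reader which analysis produced it.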

Another example of a simple visualization being confused with its analytics is histogram analysis. Often the word “histogram” is used to mean the bar chart representation of histogram analysis results. As with the correlation matrix, the results of a histogram analysis can be visualized in various ways. The bar chart is just the most used method, and its name is therefore commonly mixed up with the histogram itself. The same information that a histogram bar chart contains can also be shown with a line graph. Put simply, from a visualization point of view, histogram results are counts in bins over ordered sections of the data. Such results have a location on the x-axis and a height on the y-axis, and can therefore be visualized equally well with bars, a line, or even dots at X-Y coordinates. See the example figure below. Quite often, switching traditional analytics results to another visualization type can make a difference to their interpretability. For example, would it be easier to compare two histograms as two lines in the same figure instead of as two separate bar charts?

Histogram analysis of random data with 100 points between 0 and 1. The histogram was calculated over 20 equally spaced bins. The graph shows three visualization types of the same data: bar chart (grey), line graph (blue), and scatter plot (red).
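A figure of this kind could be reproduced roughly as follows. This is a sketch under stated assumptions, not the code behind the original figure: the seed, the helper names `histogram` and `draw`, and the use of matplotlib for drawing are all choices made here for illustration. The histogram analysis itself needs only the standard library; the same bin centers and counts are then handed to three different drawing calls.

```python
import random

def histogram(values, bins, lo=0.0, hi=1.0):
    """Count values in equally spaced bins over [lo, hi].

    Returns (bin centers, counts). Assumes every value lies in [lo, hi].
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so a value equal to hi lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, counts

def draw(centers, counts, bin_width):
    """Draw the same histogram result three ways (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    ax.bar(centers, counts, width=bin_width, color="grey", label="bar chart")
    ax.plot(centers, counts, color="blue", label="line graph")
    ax.scatter(centers, counts, color="red", zorder=3, label="scatter plot")
    ax.set_xlabel("value")
    ax.set_ylabel("count")
    ax.legend()
    plt.show()

rng = random.Random(0)  # fixed seed so the example is repeatable
data = [rng.random() for _ in range(100)]
centers, counts = histogram(data, bins=20)
```

Calling `draw(centers, counts, 0.05)` where matplotlib is installed overlays the three representations. The analysis result, a list of x positions and a list of heights, is identical in all three.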

A current trend in visualizations in the data-filled online world is interactivity. See, for example, the publications of The Pudding. Thanks to the rising popularity of online media and content, the general public has started to expect graphs to be interactive, so that highlighting part of a graph shows details and meta-information about the data points. In more advanced systems, the user can also focus on separate parts of the data by zooming and scaling. In the latest systems, users can even select data points or sets of data to interact with the visualized results. This functionality is commonly called cross-filtering. For example, a user can select areas on a map to see the results for a single area or the combined result for multiple areas. These solutions enable the ordinary user to explore the data and find its local or global highlights. Cross-filtering methods can also help specialists to design more detailed analytics and explore the data.
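The core idea of cross-filtering can be sketched in a few lines of Python. The records, the field names (`area`, `group`, `value`), and the two helper functions are made up for illustration; a real interactive system would wire the selection to map clicks and redraw the linked charts.

```python
from collections import Counter

# Made-up records: each data point belongs to an area and a group.
RECORDS = [
    {"area": "north", "group": "A", "value": 12},
    {"area": "north", "group": "B", "value": 30},
    {"area": "south", "group": "A", "value": 7},
    {"area": "south", "group": "B", "value": 21},
    {"area": "east", "group": "A", "value": 5},
    {"area": "east", "group": "B", "value": 14},
]

def cross_filter(records, selected_areas):
    """Keep only records in the selected areas; an empty selection keeps all."""
    if not selected_areas:
        return list(records)
    return [r for r in records if r["area"] in selected_areas]

def totals_by_group(records):
    """The linked view: total value per group for the current selection."""
    totals = Counter()
    for r in records:
        totals[r["group"]] += r["value"]
    return dict(totals)

# Selecting one area on the "map" updates the linked totals.
single = totals_by_group(cross_filter(RECORDS, {"north"}))
# Selecting several areas combines their results.
combined = totals_by_group(cross_filter(RECORDS, {"north", "south"}))
```

Here `single` holds the totals for the north area only, while `combined` merges north and south. The aggregation code never changes; only the selection does.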

In the MIDAS UI, we are developing a visualization framework that provides the most common visualization options for an analytics system over a standardized API framework. In practice, the same API JSON structure can serve multiple visualization types, and the analytics developer can select the visualization that the frontend of the system uses to draw the results. For example, the same histogram analytics result can be drawn as either a bar chart or a line chart. The JSON structure stays the same, as both cases need the X and Y values for the graph. On the other hand, the UI won’t care which analytics were applied behind the results as long as the resulting data follows the agreed format. The results can be calculated from an aggregation or from a regression analysis over multiple variables. Similarly, a heatmap can visualize either correlation or Dice index measures between data streams, as long as the results are delivered in a similar JSON structure. From a UI point of view, a heatmap needs only a table of values to be drawn, with colors representing the values according to the color bar.
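As a rough illustration of this idea, the Python sketch below feeds one JSON result to two different renderers. The payload layout and every field name in it (`title`, `chart`, `x`, `y`) are hypothetical and chosen only for this example; they are not the actual MIDAS format. The renderers return simple drawing descriptions instead of drawing anything, to keep the example self-contained.

```python
import json

# Hypothetical payload; the field names are assumptions, not the MIDAS format.
PAYLOAD = """
{
  "title": "Histogram of variable X",
  "chart": "bar",
  "x": [0.125, 0.375, 0.625, 0.875],
  "y": [3, 7, 5, 2]
}
"""

def render_bar(x, y):
    """One rectangle per (x, y) pair."""
    return [{"shape": "rect", "x": xi, "height": yi} for xi, yi in zip(x, y)]

def render_line(x, y):
    """A single polyline through all (x, y) pairs."""
    return [{"shape": "polyline", "points": list(zip(x, y))}]

RENDERERS = {"bar": render_bar, "line": render_line}

def render(payload_text, chart=None):
    """Draw a result with the requested chart type.

    The renderer only needs x and y; it does not know which analytics
    produced them. `chart` overrides the type named in the payload.
    """
    payload = json.loads(payload_text)
    x, y = payload["x"], payload["y"]
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    chart_type = chart or payload["chart"]
    if chart_type not in RENDERERS:
        raise ValueError(f"unsupported chart type: {chart_type}")
    return RENDERERS[chart_type](x, y)

bars = render(PAYLOAD)                # uses the type named in the payload
line = render(PAYLOAD, chart="line")  # same data, different visualization
```

Adding a new visualization type under this design means adding one entry to `RENDERERS`; the payload and the analytics that produced it stay untouched.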

We can also help users to create fancy new visualizations; the biggest challenge is to make the readers of those visualizations understand them. This typically means that we need to explain the analytics methods behind the graph and describe what the results may mean. If users can select the data themselves, the data must also be documented well, and this documentation, the metadata, should be available within the visualizations. This leads to the true need: all data analytics and the methods used to visualize them should be as transparent as possible, and they should include their own metadata. Then even an inexperienced user can create meaningful, interpretable visualizations. Black-box analytics and non-explainable indicator values can produce results with high accuracy and have fancy graphs visualizing them, but the practical usability of these results can be close to zero. Proper meta-information about the whole pipeline, from the data and the analytics to a description of the visualization technology itself, is needed to make end-user-level visualizations interpretable.