G. Sawitzki StatLab Heidelberg Last revision: 2014-04-19 by gs

Bertin Matrices: Matrix Implementation

Placing and Rescaling

When you first generate a Bertin matrix for a data set, it is placed on the display at the current insertion point as a glyph of minimal size. You can resize it or reposition it as you like. (Automatic size adaptation is also possible.)

This is a common feature of Oberon: display elements are just glyphs. They can be static characters, or they can be dynamic elements.

Seen from the Oberon point of view, a Bertin matrix is just an unusual character, with interaction taken for granted. For a general discussion of this feature in a statistical context, see the video Integrated Documents.

So you find a command to load your data, mark the insertion point where you want the matrix to appear, and execute the load command. Afterwards you can adjust the display as you prefer.

Display Adjustment

Data sets rarely come in the format you want. By convention, J. Bertin uses variables in rows. In biometry, variables in columns are common. Independently of this, the graphical representation may be by row or by column. The Bertin implementation makes it easy to switch between these conventions.

You can either choose a transformation when you read in the data, or use a command afterwards: as a textual command anywhere in the display space, in a menu, or in a button or some other gadget.
In our implementation, you can also do the adjustment interactively. The segments in the upper right corner of the Bertin matrix display allow you to switch between these options, or to transpose the matrix.
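
The orientation switch can be illustrated outside Oberon. The following is a minimal Python sketch, not the Voyager implementation: the function names `transpose` and `to_bertin_orientation` and the toy data are hypothetical, and the matrix is assumed to be a rectangular list of lists.

```python
def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def to_bertin_orientation(data, variables_in="columns"):
    """Return a copy of data with variables in rows (the Bertin convention).

    variables_in names the orientation of the input (a hypothetical
    parameter, chosen for this sketch).
    """
    if variables_in == "rows":
        return [list(row) for row in data]
    if variables_in == "columns":
        return transpose(data)
    raise ValueError("variables_in must be 'rows' or 'columns'")


# Hypothetical toy data: 3 cases (rows) by 2 variables (columns).
cases_by_variables = [[10, 0.5], [20, 0.7], [30, 0.9]]
variables_by_cases = to_bertin_orientation(cases_by_variables, "columns")
print(variables_by_cases)  # [[10, 20, 30], [0.5, 0.7, 0.9]]
```

Transposing twice returns the original layout, so the raw data can keep its own convention while the display works on the row-oriented copy.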

These input facilities (command, embedded command, interactive) are available for most of the Bertin actions. We take them for granted and do not mention them further.
We focus on the abstract actions and their intent. Implementation is a different question.

For our purpose, the hotel data come in the format we want to use, so no adjustment is needed here.

Rescaling and Transformation

Data come in various scales or encodings. For visualisation, we have to find a common ground. By default, the minimum and maximum of each variable are used to define a scale.
Some of the common scale adjustments and transformations are provided by Bertin, but of course these cannot be complete, and some ad hoc transformations may be necessary.
For our example, the default min/max scaling will do.
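
As an illustration of min/max scaling by variable, here is a minimal Python sketch. It assumes variables in rows and maps each variable to the interval [0, 1]; the target interval, the function names, the toy data and the handling of constant variables are assumptions of this sketch, not taken from the implementation.

```python
def minmax_scale(values):
    """Map one variable linearly so that its minimum is 0 and its maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant variable: no spread to scale. Returning zeros is an
        # assumption of this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scale_by_variable(matrix):
    """Apply min/max scaling to each row (one row per variable)."""
    return [minmax_scale(row) for row in matrix]


# Hypothetical toy data: 2 variables (rows) by 3 cases (columns).
raw = [[10, 20, 30], [2, 4, 3]]
scaled = scale_by_variable(raw)
print(scaled)  # [[0.0, 0.5, 1.0], [0.0, 1.0, 0.5]]
```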

In general, in a first step, we transform the input data to common, or at least comparable, scales. Missing values and exceptions may need special treatment.
In the Hotel example, Bertin rescales by the maximum value of each variable. The dichotomous variable Faires is encoded as 0/1. Our implementation default is to rescale to (0, max) for positive variables, (min, 0) for negative variables, and (min, max) for general variables. Our preferred, or recommended, rescaling however is to use ranks. We use the term score for the rescaled variables.
Orientation of the data set is a critical convention in this step. Usually, rescaling should be by variable, not by case. Depending on the orientation, this can lead for example to ranks by row or by column. We allow global scaling as an additional option for those situations where all data are already on a common scale. Following Bertin, our implementation default is to expect variables in rows, but we provide the means to switch to the R convention with variables in columns.
The raw data may come in data frames, or lists, or views on a database, and the original convention should be preserved. The scores however are a matrix, or an array (which we consider a stacked list of matrices in our context). We prefer to keep these in Bertin conventions, that is, variables are in rows.
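
The default range rule and the rank alternative can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation itself: the function names are hypothetical, scores are mapped to [0, 1], "positive" and "negative" are read as "no values below zero" and "no values above zero", and ties receive average ranks.

```python
def default_range(values):
    """Pick the reference range: (0, max), (min, 0) or (min, max)."""
    lo, hi = min(values), max(values)
    if lo >= 0:
        return (0.0, float(hi))       # positive variable
    if hi <= 0:
        return (float(lo), 0.0)       # negative variable
    return (float(lo), float(hi))     # general variable


def rescale(values):
    """Map one variable linearly onto [0, 1] using its default range."""
    lo, hi = default_range(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate range (assumption)
    return [(v - lo) / (hi - lo) for v in values]


def rank_scores(values):
    """Return 1-based ranks for one variable; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


print(rescale([2, 4, 8]))             # [0.25, 0.5, 1.0]
print(rescale([-4, -2, -1]))          # [0.0, 0.5, 0.75]
print(rescale([-1, 0, 3]))            # [0.0, 0.25, 1.0]
print(rank_scores([30, 10, 20, 10]))  # [4.0, 1.5, 3.0, 1.5]
```

Both functions act on a single variable. With variables in rows they would be applied row by row; a 0/1 variable is left unchanged by `rescale`.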


Colour and Height Attributes

Now we assume that the data have been transformed to some score, allowing comparison between variables that have different scales.

In a second step, the scores are translated to visualisation attributes. The same data may give very different impressions when rendered with different attributes.
This is the point where it may be helpful to recall that different observers may have different perceptions.
In the era of print media, and before, the presentation had to be frozen. Now we can offer a presentation that is adapted to the user.

In the Voyager implementation of Bertin, we focus on two attribute classes: colour and geometry (e.g. use bar height or not).

Colour is handled in two steps. The scores are translated to a colour index, which is used together with a colour palette to determine the display colour for a data element. This allows rapid experiments with various colour palettes, as long as the lengths of the palettes are compatible. We strongly recommend always looking at the inverted colour palette together with the chosen one, to mitigate the effects of colour perception. Simple image displays limit the visualisation attributes to colour. rect() in R, for example, allows rectangle geometry, colour, and border width. Shading and line types should be considered as an alternative for print media.
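
The two-step colour handling can be sketched as follows, in Python rather than Oberon. This is an illustration under assumptions: scores are taken to lie in [0, 1], the index is obtained by equal-width binning, and the four-step grey palette and all function names are hypothetical.

```python
def colour_index(score, n_colours):
    """Map a score in [0, 1] to an integer index 0 .. n_colours - 1."""
    clipped = min(max(score, 0.0), 1.0)
    # A score of exactly 1.0 would give n_colours, so cap at the last index.
    return min(int(clipped * n_colours), n_colours - 1)


def colours_for(scores, palette):
    """Look up the display colour for each score in the given palette."""
    return [palette[colour_index(s, len(palette))] for s in scores]


# Hypothetical palette, light to dark; any palette of the same length
# can be swapped in without recomputing the indices.
palette = ["#ffffff", "#bdbdbd", "#636363", "#000000"]
inverted = palette[::-1]

scores = [0.0, 0.3, 0.6, 1.0]
print(colours_for(scores, palette))   # ['#ffffff', '#bdbdbd', '#636363', '#000000']
print(colours_for(scores, inverted))  # ['#000000', '#636363', '#bdbdbd', '#ffffff']
```

Because the index depends only on the score and the palette length, comparing a palette with its inverse costs one extra lookup per data element.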

Visualisation attributes may reflect different aspects. For example, in the classical Bertin display, the height of a rectangle reflects the value of a data element, while colour indicates whether the value is above or below the variable mean.
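
The attribute pair of the classical display can be computed per variable as in the Python sketch below. It is only an illustration: the function name is hypothetical, heights are taken as a fraction of the variable maximum (as in the Hotel rescaling), non-negative data are assumed, and a value exactly equal to the mean is treated as not above it.

```python
def bertin_attributes(values):
    """Return (height, above_mean) pairs for one variable.

    height is the value as a fraction of the variable maximum;
    above_mean is True when the value exceeds the variable mean.
    """
    top = max(values)
    mean = sum(values) / len(values)
    attributes = []
    for v in values:
        height = v / top if top > 0 else 0.0
        attributes.append((height, v > mean))
    return attributes


# Hypothetical values of one variable across four cases.
row = [1, 2, 4, 1]
for height, above in bertin_attributes(row):
    # The dark/light fill choice is an assumption of this sketch.
    fill = "dark" if above else "light"
    print(f"height={height:.2f} fill={fill}")
```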

True interactivity is not needed here, but timely feedback helps experimental exploration of these facilities.