G. Sawitzki StatLab Heidelberg Last revision: 2014-04-19 by gs

Bertin Matrices: Transformation and Seriation

As we said before:
A third step controls the actual placement of the graphical elements. With a matrix layout, it is specified by permutations of rows and of columns. This may be related to information used in the first two steps, but should be considered an independent step. A vector of row orders and a vector of column orders are the critical information from this step. Various seriation methods apply. This is where J. Bertin's ideas about "internal mobility" as a characteristic of modern graphics come into play. The typical situation is to select scores and display attributes, and then search for optimal or good seriations. This is the alternative to using interactive rearrangement.
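To make the role of the two order vectors concrete, here is a minimal Python sketch. It is an illustration only, not the package's own code; the function name `permute` and the sample matrix are hypothetical.

```python
def permute(matrix, row_order, col_order):
    """Return a rearranged copy of a matrix given as a list of rows.

    row_order[i] is the index of the original row placed at position i;
    col_order works the same way for columns. The data are not changed,
    only their placement.
    """
    return [[matrix[r][c] for c in col_order] for r in row_order]


# Hypothetical 2 x 3 score matrix.
scores = [
    [1, 2, 3],
    [4, 5, 6],
]

# Swap the two rows and rotate the columns.
reordered = permute(scores, row_order=[1, 0], col_order=[2, 0, 1])
```

Because the result depends only on the two index vectors, any seriation method can be plugged in as long as it returns such a pair.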

Formally, one approach is to select a distance metric, and possibly a scale for the variables. Since the variables differ in kind, we have a choice of how to compare them. For real-world data, missing and out-of-range values are common, and we have to decide how to handle them. Ranks and percentages are the prime candidates for unifying scales.
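As a sketch of how ranks and percentages can put differing variables on a common scale, the following Python functions transform one variable at a time. Two choices here are assumptions made for the illustration, not statements about the package: missing values (`None` or NaN) are simply passed through as `None`, and "percentage" is read as the position within the observed range. All function names are hypothetical.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def to_ranks(values):
    """Average ranks (1-based) of the non-missing values; ties share a rank."""
    present = sorted((v, i) for i, v in enumerate(values) if not is_missing(v))
    ranks = [None] * len(values)
    pos = 0
    while pos < len(present):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(present) and present[end + 1][0] == present[pos][0]:
            end += 1
        average = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[present[k][1]] = average
        pos = end + 1
    return ranks


def to_percent_of_range(values):
    """Map non-missing values linearly onto 0..100 of the observed range."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return [None] * len(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if is_missing(v) else (100.0 * (v - lo) / span if span else 0.0)
        for v in values
    ]
```

Other conventions for missing data, such as imputing a value or dropping the case, would fit the same interface.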



Of course scale matters. Some common transformations are provided on the fly. As required by the context, these are available by row or by column. It is a convenience to have these possibilities available at the press of a button, but there is no real interactivity here. These actions can easily be handled by commands, and of course this facility is available as an option.
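The by-row or by-column choice can be expressed as one small helper that takes the transformation as an argument. This is a sketch in Python with hypothetical names (`apply_along`, `share_of_max`); the text does not say which transformations are provided, so the one used here is just a stand-in.

```python
def apply_along(matrix, transform, by="column"):
    """Apply transform to each row or each column of a list-of-rows matrix.

    transform takes a list of values and returns a list of the same length.
    """
    if by == "row":
        return [list(transform(list(row))) for row in matrix]
    if by == "column":
        # Transform each column, then transpose back to rows.
        columns = [transform(list(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("by must be 'row' or 'column'")


def share_of_max(values):
    """Stand-in transformation: each value as a fraction of the maximum."""
    top = max(values)
    return [v / top for v in values]


data = [
    [1.0, 10.0],
    [2.0, 40.0],
]
by_column = apply_along(data, share_of_max, by="column")
by_row = apply_along(data, share_of_max, by="row")
```

The two results differ, which is why the direction has to be chosen according to the context.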

For real-world applications, missing data need special attention, and we offer some possibilities for handling them.

Actions: Seriation


Finding a sequence by sorting, for example by correlation to some target variable, is a manageable problem: once the sort keys are computed, the sort itself takes on the order of n log n steps.
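A sort-based seriation of this kind might look as follows in Python. This is a sketch under assumptions: Pearson correlation is used as the score, data are complete, and the names `pearson` and `order_by_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def order_by_correlation(rows, target):
    """Return row indices sorted by decreasing correlation with target."""
    scores = [pearson(row, target) for row in rows]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


# Hypothetical data: three rows compared against a target profile.
rows = [
    [3, 2, 1],
    [1, 2, 3],
    [1, 3, 2],
]
target = [1, 2, 3]
row_order = order_by_correlation(rows, target)
```

The returned index vector can be used directly as a row order; the same function applied to the transposed matrix gives a column order.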
Finding an optimal sequence that minimizes successive distances is a travelling salesman problem. You should either be happy with an approximate solution (we provide Reinelt's 3-swap algorithm), or restrict yourself to small problems, or be prepared for a long wait.
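To illustrate what an approximate solution can look like, here is a Python sketch that builds a sequence by nearest-neighbour construction and then improves it by 2-opt segment reversal. This is deliberately a simpler stand-in and not Reinelt's 3-swap algorithm mentioned above. The Euclidean distance and all function names are assumptions made for the example.

```python
import math


def euclidean_matrix(rows):
    """Pairwise Euclidean distances between rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def path_length(order, dist):
    """Sum of distances between successive items (open path, no return)."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def nearest_neighbour(dist, start=0):
    """Greedy construction: always move to the closest unvisited item."""
    order = [start]
    remaining = set(range(len(dist))) - {start}
    while remaining:
        last = order[-1]
        # Ties are broken by the smaller index to keep results reproducible.
        nxt = min(remaining, key=lambda j: (dist[last][j], j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def two_opt(order, dist):
    """Reverse segments as long as a reversal shortens the path."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_length(candidate, dist) < path_length(order, dist) - 1e-12:
                    order = candidate
                    improved = True
    return order


# Hypothetical one-dimensional data, deliberately out of order.
rows = [[0.0], [10.0], [1.0], [11.0], [2.0]]
dist = euclidean_matrix(rows)
start_order = nearest_neighbour(dist, start=2)
final_order = two_opt(start_order, dist)
```

This version recomputes the full path length for every candidate, which keeps the code short but is wasteful; it is meant for small problems only. The result is a local optimum, with no guarantee of reaching the best sequence.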
Placing this step later allows us to use information from score transformation and attributes, which may permit more efficient algorithms. In the end, we may be better off with a good solution that helps to solve the practical problem than with an optimal solution to a theoretical one. The two may differ considerably.