G. Sawitzki StatLab Heidelberg Last revision: 2014-04-19 by gs

Bertin Matrices: Introduction

What are Bertin Matrices?

Among the rich material on the graphical presentation of information in La Graphique et le Traitement Graphique de l'Information (1977; English edition: Graphics and Graphic Information Processing, 1981), J. Bertin discusses the presentation of data matrices, with a particular view to seriation. A Tribute to J. Bertin's Graphical Data Analysis, presented at the SoftStat Conference '97, gives an appraisal of this aspect of J. Bertin's work.
This is a web version of the SoftStat '97 presentation, recorded in 2014.

The methods discussed at SoftStat '97 have been implemented in the Voyager system. Voyager is now part of Oberon and its descendants, and comes bundled with the current Oberon system from ETHZ.
The methods have been partially re-implemented in R. The R implementation can be downloaded as the package bertin from http://bertin.r-forge.r-project.org/.

J. Bertin uses a small data set on hotel occupancy to illustrate his ideas, and tells a story: a hotel director finds a way to represent his occupancy data, rearranges it, and of course becomes successful.


In abstract terms, a Bertin matrix is a matrix of displays. Bertin matrices allow rearrangements that transform an initial matrix into a more homogeneous structure. The rearrangements are permutations of rows or columns, and groupings of rows or columns.
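As a minimal sketch of what a row or column permutation means here, the following Python fragment reorders a small matrix stored as a list of rows. The function name permute and the toy matrix are hypothetical and serve only as illustration; they are not taken from Voyager or from the bertin package.

```python
def permute(matrix, row_order=None, col_order=None):
    """Return a copy of matrix with rows and columns rearranged.

    row_order and col_order are lists of indices giving the new order.
    If an order is omitted, that dimension is left unchanged.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    if row_order is None:
        row_order = range(n_rows)
    if col_order is None:
        col_order = range(n_cols)
    return [[matrix[i][j] for j in col_order] for i in row_order]


# Hypothetical 2 x 3 matrix: rows are variables, columns are cases.
m = [[1, 2, 3],
     [4, 5, 6]]

# Swap the two rows and move the last column to the front.
rearranged = permute(m, row_order=[1, 0], col_order=[2, 0, 1])
print(rearranged)
```

The data values are untouched; only the arbitrary indexing changes.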

Hotel occupancy data
To fix ideas, think of a data matrix, variable by case, with real-valued variables. For each variable, draw a bar chart of variable value by case. Highlight all bars representing a value above some sample threshold for that variable (for example, the mean of that variable).
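The highlighting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample mean is used as the threshold when none is given, and the variable names and values are made up, not the hotel data.

```python
def highlight_mask(values, threshold=None):
    """Return one boolean per case: True if the value lies above the threshold.

    Assumption: if no threshold is given, the sample mean of the variable is used.
    """
    if threshold is None:
        threshold = sum(values) / len(values)
    return [v > threshold for v in values]


# Hypothetical toy data: one list of values per variable, one entry per case.
data = {
    "var_a": [2.0, 4.0, 6.0, 8.0],
    "var_b": [10.0, 0.0, 5.0, 5.0],
}

masks = {name: highlight_mask(values) for name, values in data.items()}

# Crude text rendering: one row per variable, '#' for a highlighted bar, '.' otherwise.
for name, mask in masks.items():
    print(name, "".join("#" if flag else "." for flag in mask))
```

Each variable gets its own threshold, so rows measured on different scales remain comparable in the display.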

Variables are collected in a matrix to display the complete data set. By convention, J. Bertin shows variables in rows and cases in columns. To make periodic structures more visible, the data may be repeated cyclically. Data for the 12 months of a year then appear in 24 columns.

Hotel data, repeated cyclically
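Cyclic repetition amounts to appending each row to itself. A minimal Python sketch, with a hypothetical helper repeat_cyclically and placeholder values standing in for one monthly variable:

```python
def repeat_cyclically(row, times=2):
    """Repeat the cases of one variable end to end, so that periodic
    structure is not cut off at the boundary of the period."""
    return list(row) * times


months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Placeholder values for one variable, one per month (not the hotel data).
row = list(range(1, 13))

columns = repeat_cyclically(months)
doubled = repeat_cyclically(row)

print(len(columns))  # 12 months shown in 24 columns
```

For a whole matrix, the same helper would be applied to every row, so that all variables stay aligned on the repeated column labels.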


As J. Bertin pointed out, the indexing used is arbitrary. You can rearrange rows and/or columns to reveal the information of interest. If you run a hotel, the percentage of hotel occupancy and the duration of the visits are of course the most interesting for you. Move these variables to the top of the display, and rearrange the other variables by similarity or dissimilarity to these target variables. Time points have a natural order, so no column rearrangement is used in this example.
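One way to sketch this row rearrangement in Python is to put the target variable first and sort the remaining variables by decreasing similarity to it. The text does not prescribe a similarity measure; Pearson correlation is used here purely as an assumption, and the variable names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences.

    Returns 0.0 if either sequence is constant, to avoid dividing by zero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def order_by_similarity(rows, target):
    """Return variable names: target first, then the others by
    decreasing correlation with the target."""
    others = [name for name in rows if name != target]
    others.sort(key=lambda name: pearson(rows[name], rows[target]), reverse=True)
    return [target] + others


# Hypothetical variables, one value per case (columns keep their natural order).
rows = {
    "opposes": [4, 3, 2, 1],
    "weak": [1, 3, 1, 3],
    "target": [1, 2, 3, 4],
    "follows": [2, 4, 6, 8],
}

order = order_by_similarity(rows, "target")
print(order)
```

Variables that run against the target end up at the bottom, so similar and dissimilar rows form visible blocks.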


Variables need not enter at face value; they can be transformed, or derived variables can be added. In the hotel data, this has already been done in the original data set. For example, the guests have been classified as tourists or business travellers, and the two percentages sum to 100%. We could remove this redundant information, which may clean up the picture, but it may also hide information. For example, tourists are "anti-cyclic" to the hotel occupancy and just fill the gaps. Removing this variable because it equals (100% - business) would hide this point.
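The redundancy can be made concrete with a small Python sketch. The percentages below are hypothetical, not the hotel data, and the mean threshold for highlighting is an assumption. The sketch derives one variable as the complement of the other, checks that the pair is redundant, and shows that the complement is highlighted exactly where the original is not (as long as no value equals the mean).

```python
def is_complement(a, b, total=100.0, tol=1e-9):
    """True if a and b sum to total in every case, i.e. one is redundant."""
    return all(abs(x + y - total) <= tol for x, y in zip(a, b))


def above_mean(values):
    """Highlight mask, assuming the sample mean as threshold."""
    mean = sum(values) / len(values)
    return [v > mean for v in values]


# Hypothetical percentages of business guests, one value per case.
business = [60.0, 70.0, 55.0, 30.0, 20.0, 25.0]

# Derived variable: the remaining guests.
tourists = [100.0 - b for b in business]

redundant = is_complement(business, tourists)

# The derived row carries no new numbers, but its highlight pattern
# is the inverse of the original row.
inverse_pattern = above_mean(tourists) == [not flag for flag in above_mean(business)]

print(redundant, inverse_pattern)
```

So the derived row repeats the numbers of the original, yet it shows a different pattern in the display.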


And More ...


Folk song vowel scheme

Bertin matrices are not restricted to this display. For example, J. Bertin gives a matrix representing vowel schemes of folk songs, in which you can easily spot the yodellers.

What all matrix representations have in common is that, when the arrangement of rows and columns is arbitrary, some permutation may help to reveal information.