Bertin Matrices: Selections and Models
In the context of interactive Bertin matrices, the key concept is that of a selection. This may apply to variables, or cases, or both.
In the basic approach, we select a single variable or case and move it, thus defining a permutation. More generally, we may select any combination of cases and variables.
Selections may be moved as a whole, collapsed (hidden in the display) or replaced by a surrogate variable.
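The following is a minimal sketch of how moving or collapsing a selection can be expressed as an operation on a display order. It is an illustration only, not the Bertin implementation: the function names move_selection, collapse and apply_permutation are hypothetical, and the data matrix is a made-up placeholder.

```python
def move_selection(order, selection, position):
    """Return a new display order with the selected items moved as a block.

    order:     current display order (list of row or column indices)
    selection: set of indices to move together
    position:  index among the unselected items before which the block goes
    """
    selected = [i for i in order if i in selection]   # keeps relative order
    rest = [i for i in order if i not in selection]
    return rest[:position] + selected + rest[position:]


def collapse(order, selection):
    """Return the display order with the selected items hidden."""
    return [i for i in order if i not in selection]


def apply_permutation(matrix, row_order, col_order):
    """Build the displayed matrix; the underlying data are left unchanged."""
    return [[matrix[r][c] for c in col_order] for r in row_order]


# Placeholder data: 3 cases (rows) by 4 variables (columns).
data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

cols = move_selection([0, 1, 2, 3], {1, 3}, 0)  # variables 1 and 3 to the front
rows = move_selection([0, 1, 2], {0}, 2)        # case 0 to the end
view = apply_permutation(data, rows, cols)
hidden = collapse(cols, {0})                    # hide variable 0 in the display
```

Keeping the permutation separate from the data means that every move is reversible and that several views can share one data matrix.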
We may keep classical statistical models in mind. In these models, selections have roles. In a regression context, for example, variables may be regressors (the independent variables) or responses (the dependent variables). In Bertin, this is supported by supplying two types of selection.
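One way to picture two types of selection is a small structure that records which variables currently act as regressors and which as responses. This is a sketch under stated assumptions, not the Bertin data model: the class RoleSelections, the function split_by_role and the rule that a variable holds at most one role are all hypothetical. The names x1, x2 and x3 are placeholders for the other variables.

```python
from dataclasses import dataclass, field


@dataclass
class RoleSelections:
    """Two selections over the variables, one per role (hypothetical design)."""
    regressors: set = field(default_factory=set)
    responses: set = field(default_factory=set)

    def assign(self, variable, role):
        # Assumption: a variable holds at most one role at a time.
        self.regressors.discard(variable)
        self.responses.discard(variable)
        if role == "regressor":
            self.regressors.add(variable)
        elif role == "response":
            self.responses.add(variable)
        elif role is not None:
            raise ValueError(f"unknown role: {role!r}")


def split_by_role(variables, selections):
    """Return (regressors, responses), each in display order."""
    x = [v for v in variables if v in selections.regressors]
    y = [v for v in variables if v in selections.responses]
    return x, y


variables = ["occupation", "duration", "x1", "x2", "x3"]  # x1..x3: placeholders
roles = RoleSelections()
for name in ("occupation", "duration"):
    roles.assign(name, "response")
for name in ("x1", "x3"):
    roles.assign(name, "regressor")

regressors, responses = split_by_role(variables, roles)
```

Variables that belong to neither selection, such as x2 here, simply take no part in the model.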
Visualising information is but one aspect. In statistics, as we see it today, visualisation may be one part of an analysis. The outcome will be a decision leading to an action. Then there is a loss (or gain) depending on the action taken on the one hand, and the "true" state of the world on the other. This is the common decision-theoretical setting. Statistics has formulated a few standard problems and given suggestions on how to handle them. In our Hotel example, the problem can be seen as a prediction problem: find a model to predict occupation and duration from the other variables.
More specifically, this is a control problem, because some of the other variables may allow an intervention. The statistical contribution is to find a regression model for occupation and duration, based on the other variables.
The visualisation can be seen as one way to hint at a regression model. There are very few classical problems. Regression is one of them, and prediction is closely related. Classification and clustering are another closely related pair of problems, and their relation to Bertin matrices is direct: permuting rows and columns so that similar cases and variables become adjacent is an informal way of forming groups.
In a Bertin context, it is tempting to go beyond classical regression. As a proof of concept, we implemented nearest-neighbour smoothing in Bertin, where neighbourhoods are understood as cell neighbourhoods in the matrix. Following the usual procedure in regression, this leads to a fit matrix and a residual matrix. Both are Bertin matrices linked to the original data matrix; that is, dynamic linking is preserved.
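The idea of smoothing over cell neighbourhoods can be sketched as follows. This is a minimal illustration, not the implementation referred to above: it assumes a square window of a given radius around each cell, clipped at the matrix border, with the centre cell included and all cells weighted equally. The function name neighbourhood_smooth and the example matrix are hypothetical.

```python
def neighbourhood_smooth(matrix, radius=1):
    """Return (fit, residual) for a matrix given as a list of rows.

    Assumption: the fit at a cell is the plain mean over the square window
    of the given radius around it, clipped at the matrix border.
    The residual is data minus fit, so fit + residual reproduces the data.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    fit = []
    for i in range(n_rows):
        fit_row = []
        for j in range(n_cols):
            cells = [
                matrix[r][c]
                for r in range(max(0, i - radius), min(n_rows, i + radius + 1))
                for c in range(max(0, j - radius), min(n_cols, j + radius + 1))
            ]
            fit_row.append(sum(cells) / len(cells))
        fit.append(fit_row)
    residual = [
        [matrix[i][j] - fit[i][j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return fit, residual


# Made-up example: a single high cell in an otherwise flat matrix.
displayed = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
fit, residual = neighbourhood_smooth(displayed, radius=1)
```

Because the neighbourhoods are defined on the displayed matrix, the fit depends on the current row and column order: permuting the matrix changes which cells are neighbours, and therefore changes both the fit and the residuals.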