# StatLab Heidelberg: A History

D. W. Müller (1971)

StatLab Heidelberg was established in 1984 on the initiative of D. W. Müller. It was inspired by impressions of the StatLab that Jerzy Neyman had already established in Berkeley. Other StatLabs already existed, most notably the StatLab at Cambridge, going back to R. A. Fisher.

The statistical laboratory contributes to the educational efforts in statistics at Heidelberg. There was a gap to bridge between the statistical theory taught in a mathematics faculty and the requirements that arise when this theory is actually applied.

But there was also a gap in the other direction. Statistics, as represented by D. W. Müller, was theoretical statistics. But D.W.M.'s position was that to make good theory, you must know which questions are asked in practical situations. This does not mean you have to answer these questions, but you should at least be aware of them. So StatLab was intended as a way to attract problems and to understand real-life requirements. For this purpose, StatLab offers consulting, but not as an obligation: unlike a consulting shop, it would not handle every request. There was freedom to select requests according to whether they would provide usable examples for students or interesting problems for research.

StatLab is affiliated with the Institute for Applied Mathematics, Universität Heidelberg. Like the Berkeley StatLab, it is a non-institution: it has only minimal staff positions, no guaranteed budget and no official status. StatLab had to live on its own.

## Statistisches Praktikum

One of the first steps was to establish a statistical lab course, "Statistisches Praktikum", an extended seminar that included not only mathematical theory but also computational extensions. There were four participants, and they did not have an easy time. Two of them, Wolfgang Polonik

"Statistisches Praktikum" was designed to run annually and has run every year since. We did not keep up the rate of producing 50% chair-holders.

At that time, the idea was that "real" calculations were to run on a mainframe. But we recognised the need for a more flexible device for graphical displays. A more flexible front end was all we were hoping for.

We went around to inspect what was available, and ended up deciding on a device that had just arrived at that time: the Apple Macintosh. There was no development environment for it yet, so we went for cross-development on the Apple Lisa. We had to build up a statistical library from scratch in FORTRAN. Ulrich Mansmann, now holding a chair as well, was one of the first to spend his energy on these tasks.

But computing was only a side issue. The computing facilities allowed us to develop an experimental approach to new statistical theories and models.

## Kolloquium über Statistik 1971 – 2004

The Colloquium was initiated around 1971, when D. W. Müller was in Frankfurt, and was extended to a Frankfurt/Heidelberg cooperation when D. W. Müller moved to Heidelberg. It was deliberately conceived as a seminar in the literal sense: a meeting where seeds were sown. New ideas were received and discussed here. Lucien LeCam's "current literature" seminar was a model. The Colloquium started with a lecture by D. W. Müller on consistency of Bayes estimators. It was continued as a Frankfurt/Heidelberg collaboration, with Hermann Dinges as one of the main Frankfurt partners, until it ended after more than 30 years with a closing talk, again by D. W. Müller, on consistency of Bayes estimators. To be continued…

## Statistics Days: Statistik-Tage in Heidelberg 1984 –

As stated, there was never a guaranteed budget for StatLab. Fortunately, in the early years there was the special research unit SFB 123 "stochastic mathematical models" of the Deutsche Forschungsgemeinschaft (DFG). So there was funding for research, and StatLab could benefit from this neighbourhood.

Using these funds, on the research side, D.W.M. established the "Statistik-Tage in Heidelberg" as a meeting for quick exchange of new research ideas. The idea was to invite young international researchers to present their research ideas, and of course to collect stimuli for future research.

Ch. Stone started this series in 1984 with a lecture presenting the ideas of classification and regression trees (CART).

To be continued…

## Computational Statistics

There was never a decision to build up a focus on computational statistics. It just happened. John Tukey had made his point that there is an important place for exploratory data analysis. The PRIM system and its implementation by Peter Huber had demonstrated the potential of computing for statistics. The Bell Labs group had made this point explicit.

Resampling ideas had matured; they had been around for a long time. Gosset had checked his ideas on the t-distribution by practical resampling. Now it was time to develop theory and implementation in sync. Resampling, with the bootstrap as the leading-edge approach, was a common topic.

In the immediate neighbourhood, medical statistics was moving towards biometry, with a strong impetus towards computational statistics. At this time, we could work out the state-of-the-art computational resources needed for statistics. This was far ahead of what industry could supply. (It took until 2001 before we finally received workstations with the exact specifications we had laid down in 1990.) But we stated our requirements, which of course led to a clash with ideas that were stuck in the past state of the art. At that time, we intended to install a workstation-based solution, not another mainframe.

At that time, the world was small. There were some telephone calls, and Steve Jobs assigned Dan'l Levin of his team (then at NeXT) to discuss our needs. He did so openly. Apple (Steve Jobs' competitor at that time) and the DFG were helpful on the other side, and we ended up with funds for a "bootstrap platform", without any obligation towards the DFG to give details of it (but, of course, with a non-disclosure agreement on the other side). This gave us a technological base for some time, and we are grateful for the flexibility shown on all sides.

In theory, the computing resources of the group as a whole were sufficient. But most of them sat idle, while the power was needed elsewhere. So we took a step on the "applied" side: the result was a distributed computing protocol and its implementation, which we named "NetWork" (1990). Today, it would be called "cloud computing".

The technical base for computational statistics was never an end in itself. It was just the technical underpinning, allowing a different view of statistics and approaches different from the expansions and asymptotics used in earlier years. R. Beran used the name "experimental statistics" for this. The excess mass approach, described below, is an example of computational statistics in action.

## Design and Implementation of Data Analysis Systems 1985 – 1995

As far as theory was concerned, the Statistics Days were closest to our heart, with our visitors next to them. But the developments in computational statistics, in particular those at Bell Labs, indicated what could be done.

Paul Velleman's Data Desk (1984) showed what could be done in a practical implementation, but did not allow the flexibility needed for developing new statistical ideas. On the other side, Bell Labs' "S" showed what a statistical system could look like, allowing flexible experiments with new algorithms. There was a gap in between, and the commercial systems available at this time filled the slot reserved for dinosaurs.

This gap needed to be filled. Supported by the DFG, in 1985 we organised a workshop on "Design and Implementation of Data Analysis Systems", which grew into a small series. The third of these workshops (1995) is documented extensively in a series of articles that fills most of the Journal of Computational and Graphical Statistics, Vol. 5 (1996).

One of the highlights of these workshops was Ross Ihaka and Robert Gentleman's reimplementation of Bell Labs' "S" using ideas from Lisp: the R system made its first general appearance at the 1995 workshop. This was the public birth of R.

Our own contribution to this development, the Voyager system, has since been integrated into Niklaus Wirth's Oberon operating system and its descendants. If you get one of these systems, you get Voyager for free.

A side branch of Voyager was the idea of integrated documents: embedding live statistical analysis in a document. This idea has been taken up, and one aspect of it lives on in reproducible research, as manifested by the Bioconductor project. To be continued…

## SoftStat Conferences 1985 – 1995

We happened to stumble into the SoftStat conferences. The SoftStat conferences were a biennial event, bringing together applied statisticians, mainly from the social sciences, with software developers, mainly of systems used to handle surveys. The 1985 conference was to take place in Mannheim a week after our workshop. Shortly before the conference, the roof of the Mannheim conference building collapsed. Heidelberg University offered its campus facilities, and we agreed to extend our workshop to SoftStat, moving the workshop computing environment (several lorries' worth at that time), our video facilities, and speakers and guests from our conference building to the campus site. It was an invasion. But as a consequence, the SoftStat conference moved to Heidelberg for the years to come, organised by ZUMA (with F. Faulbaum as a main organiser) in cooperation with the working group "Statistical Data Analysis Systems" of the GmDS and the working group "Computational Statistics" (at that time chaired by G. Sawitzki from StatLab Heidelberg) of the Biometrical Society (DR).

We did not intend it, but we were hosting a biennial conference with a stimulating audience of more than 400. The series lasted until 1995.

## Beiträge zur Statistik 1992 – 2001

As the DFG's financial support for SFB 123 was coming to an end, work had to be restructured. An obvious question was whether quality and productivity could be maintained without this stable background support. To document the achievements of the local group, a publication series was established, hopefully presenting a clear image. After a long discussion, we agreed on two components as common design elements. One was a (linear) regression with confidence bands and asymptotic lines, together with an explicit formula: it contains all the basic elements of our work. Berkeley has half of it, with the claim "estimate, with confidence". We add a little bit to this.

The other is an excerpt from a letter from Gauss to Bessel, dated 1839. In the Theoria Motus Corporum Coelestium (Th.M.C.), Gauss had introduced the maximum likelihood approach. Now he apologizes for this: maximizing the likelihood means maximizing an infinitesimal value, and the result is still an infinitesimal value. This is metaphysics which should be abandoned. As an alternative, a loss integral should be considered. And, interestingly, Gauss continues that he would use square loss in his publications for its obvious mathematical convenience; in practice, he would use other loss functions. Other statisticians needed another 100 years or more to reach this point. And of course there are the rest of them.

The "Beiträge" started in 1992 with a contribution by Rudy Beran: "Stein Estimation in High Dimensions and the Bootstrap".

The Beiträge were continued until 2001. The series had made its point, and by 2001 archive servers and individual web presences had developed to a point where a local preprint server was no longer necessary.

64 papers have been published in the "Beiträge", and a small number of further contributions appeared as technical reports. StatLab always lived on discussion and on the contributions of many others. This is manifested in the Beiträge.

## Virtual University 1998 – 2001

In 1998 a project was started to explore the available electronic facilities for teaching. It was named "Viror", and the idea was to explore the possibilities of establishing a "virtual university" covering the Upper Rhine area, from Mannheim/Heidelberg to Freiburg. We joined this initiative.

So we experimented with video recordings, live video transmission, and ways to enhance the use of electronic media in teaching.

We developed "interactive" courses covering the basic points of a stochastics syllabus. I started with basic concepts in statistics, passing through regression; at the other end, Enno Mammen added a course on nonparametric regression and smoothing. These courses are still available, most of them from my home page.

We had to redefine "media". Our approach is laid down in a paper, but for a first impression the best access is a

The teaching experience was very interesting: we could provide a very relaxed learning atmosphere, and teaching could concentrate on outlining the ideas with vivid demonstrations, while all the details were still prepared for access.

The bottom line, however, was disappointing. Producing one 90-minute lecture at our level of quality took about a month of preparation. And the "distributed teaching" approach just did not work: the material was not used elsewhere to any relevant extent.

So it was an interesting experience. It showed what can be done, but it did not pay off. So we left Viror in 2001.

## The Excess Mass Approach

There is one underlying idea in the excess mass approach: estimate the total probability mass not explained by a given stochastic model. This approach may open a new perspective on statistical problems. D. W. Müller had given an early example of this idea in action: in a two-treatment comparison, the treatment effect can be judged by estimating the minimal proportion of the population that benefits from the new treatment. This problem has a practical solution, which is usable both as a test statistic and as a description of the effect with a direct statistical meaning.
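
The text does not spell out the formula, so the following is one way to make "the minimal proportion that benefits" concrete, stated here as an assumption rather than as Müller's own construction. If X and Y are a subject's outcomes under control and treatment (larger is better) with distribution functions F_c and F_t, then F_c(t) - F_t(t) <= P(X <= t < Y) <= P(Y > X) for every t, so at least sup_t (F_c(t) - F_t(t)) of the population must be better off under treatment. The sketch plugs in empirical distribution functions; the function names `ecdf` and `minimal_benefit_proportion` are hypothetical.

```python
import bisect


def ecdf(sample):
    """Return the empirical distribution function of `sample` as a callable."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect.bisect_right(xs, t) / n


def minimal_benefit_proportion(control, treated):
    """Plug-in estimate of sup_t (F_control(t) - F_treated(t)), clipped at 0.

    Both empirical distribution functions are right-continuous step
    functions that jump only at observations, so the supremum is attained
    at one of the pooled data points (or is 0 below all of them).
    """
    f_c, f_t = ecdf(control), ecdf(treated)
    points = set(control) | set(treated)
    return max(0.0, max(f_c(t) - f_t(t) for t in points))


# A shifted treatment group: at least half of the population must benefit.
print(minimal_benefit_proportion([1, 2, 3, 4], [3, 4, 5, 6]))
```

The quantity is a one-sided Kolmogorov-Smirnov distance, which is why it can serve both as a test statistic and as an interpretable effect size.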

Using this approach to investigate the modality of a distribution led to what is now known as the excess mass test. Again, the test statistic is not some metaphysical infinitesimal, but a down-to-earth (that is, statistical) quantity: the additional probability mass explained by a more complex model, for example the additional mass explained by a bimodal model compared to a unimodal one.
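
The paragraph above gives no formula, so the sketch below uses a common way of writing the statistic down, taken here as an assumption. For a level lam >= 0, the empirical excess mass E_n(lam, M) is the largest value of sum_j (P_n(C_j) - lam * length(C_j)) over at most M disjoint intervals C_j, and the statistic for "unimodal vs. bimodal" is the largest gain E_n(lam, 2) - E_n(lam, 1) over lam. The functions `excess_mass` and `excess_mass_statistic` are hypothetical names; the supremum over lam is only approximated on a user-supplied grid, and critical values (needed for an actual test) are not covered.

```python
def excess_mass(data, lam, m):
    """Empirical excess mass E_n(lam, m).

    Largest value of sum_j (P_n(C_j) - lam * length(C_j)) over at most m
    disjoint closed intervals C_j, where P_n is the empirical measure.
    Optimal intervals can be taken to start and end at observations, so a
    dynamic programme over the sorted sample is exact; cost O(m * n^2).
    Assumes no tied observations.
    """
    x = sorted(data)
    n = len(x)
    # prev[k]: best value on the first k sorted points with one interval
    # fewer than the current round allows (zero intervals to start with).
    prev = [0.0] * (n + 1)
    for _ in range(m):
        cur = [0.0] * (n + 1)
        for k in range(1, n + 1):
            # Option 1: no interval ends at point k-1.
            value = cur[k - 1]
            # Option 2: the last interval is [x[i], x[k-1]], covering k - i points.
            for i in range(k):
                gain = (k - i) / n - lam * (x[k - 1] - x[i])
                value = max(value, prev[i] + gain)
            cur[k] = value
        prev = cur
    return prev[n]


def excess_mass_statistic(data, lambdas):
    """Largest gain of two intervals over one, searched over a grid of levels.

    Only the supplied lambdas are tried, so this approximates the supremum
    over all lam >= 0.
    """
    return max(excess_mass(data, lam, 2) - excess_mass(data, lam, 1)
               for lam in lambdas)


# Two well-separated pairs: at lam = 0.1 one interval captures an excess
# mass of 0.4, two intervals capture 0.8.
sample = [0.0, 1.0, 10.0, 11.0]
print(excess_mass_statistic(sample, [0.0, 0.1]))
```

At lam = 0 a single interval covering everything already explains all the mass, so the gain is zero there; the gain only appears at levels where the gap between the clusters costs more than it contributes.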

The work on the excess mass approach has been an interesting interplay of theoretical and computational statistics. The excess mass approach was later continued by Wolfgang Polonik.

## Confidence

Teaching statistics to mathematics students at a rigorous level was an educational effort by D. W. Müller. Based on the concept of "confidence nets", it has been documented in a manuscript, "Confidence: A Mathematical Introduction to Basic Statistics". Unfortunately, this remains an unfinished book project.

## The Heritage

We are grateful to all the students, visitors and cooperation partners over the years, and we are proud of those who have shared their efforts at StatLab. Education of young statisticians was the core of StatLab. Among the young researchers coming out of this group are (sorry for any omissions):

- Hans Rudolf Lerche
- Werner Ehm
- Enno Mammen
- Wolfgang Polonik
- Willy Sauermann
- Vangelis Ioannidis
- Martin Diller
- Felix Friedrich
- Lutz Dümbgen

## And now…

As the home page says:

> When StatLab Heidelberg was established, it pioneered many services,
> like email, online preprints, web calendars of events etc. Most of
> these services are by now generally available. After 25 years, as of
> August 2009, StatLab will concentrate on its core, and these additional
> services will be moved to other sites. For a limited time, links will
> still be provided at this site. Thank you for staying with StatLab for
> the time past.