G. Sawitzki      > Data Analysis 2011/12      StatLab Heidelberg   Last revision: 2011-11-29 by gs

G. Sawitzki: Statistical Data Analysis 2011/2012



If you are preparing for an assigned task, please heed the seminar tips.

For the 2011/2012 course, you can provide your results in one of three forms:

For copyright reasons, access to some course material may be restricted. These links are shown in this style.



Projection Pursuit


Tukey & Prim 9 Keywords

  • Picturing
  • Rotation
  • Isolation
  • Masking
(John Tukey, 1973) PRIM 9

Literature

Peter J. Huber: Projection Pursuit. The Annals of Statistics, Vol. 13, No. 2. (Jun., 1985), pp. 435-475
G. P. Nason: Exploratory Projection Pursuit
Matthew Ward: Projection Pursuit: A Brief Overview

See also:

http://www.public.iastate.edu/~dicook/JSS/paper/paper.html

Data Sets

PRIM
See Section 5.2 in (Cook and Swayne, 2007). Provided as example in ggobi.
Data, Comments
Color perception
Documentation, Data
Chemical Diabetes
Literature
Mineral water
See Section 7.3 in (Cook and Swayne, 2007), and R package classifly.
See also http://www.mineralwaters.org/ for comparisons and detailed analysis.

Brushing


M. A. Fisherkeller; Orion Keywords
  • Linking
  • Scatterplot brushing
(John A. McDonald, 1980+) ORION

Data Sets

Boston housing data
original at UCI Machine Learning Repository.
UNT version.
Harrell version.
Cars
Henderson and Velleman (1981). Building Regression Models Interactively. Biometrics 37, 400.
Software: DataDesk (demo version)


Smoothing and Kernel Density Estimation

Data Sets

R: data(faithful)
Old Faithful Geyser Data
A look at some data on the Old Faithful geyser
A. Azzalini and A. W. Bowman
Applied Statistics  39  357--365  (1990)
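
As a starting point for experiments with this data set, the following is a minimal sketch (not part of the course material) that compares kernel density estimates of the eruption durations under three bandwidth selectors available in base R. The choice of the `eruptions` variable, the Gaussian kernel, and the three selectors is an assumption made for illustration only.

```r
# Old Faithful data ships with base R (package "datasets").
data(faithful)
x <- faithful$eruptions  # eruption durations in minutes

# Three bandwidth selectors from the "stats" package (illustrative choice).
bw <- c(nrd0 = bw.nrd0(x),  # Silverman's rule of thumb (default of density())
        ucv  = bw.ucv(x),   # unbiased (least-squares) cross-validation
        SJ   = bw.SJ(x))    # Sheather-Jones plug-in
print(round(bw, 3))

# One Gaussian kernel density estimate per bandwidth.
fits <- lapply(bw, function(h) density(x, bw = h, kernel = "gaussian"))

# Overlay the estimates to see how the bandwidth changes the picture.
ymax <- max(sapply(fits, function(f) max(f$y)))
plot(fits$nrd0, ylim = c(0, ymax), lty = 1,
     main = "Old Faithful: eruption duration",
     xlab = "duration (minutes)")
lines(fits$ucv, lty = 2)
lines(fits$SJ, lty = 3)
rug(x)  # mark the raw observations on the axis
legend("top", legend = names(bw), lty = 1:3, bty = "n")
```

Smaller bandwidths follow the data more closely at the price of a rougher curve, so it is worth looking at the overlay rather than relying on a single selector.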

Literature

A Brief Survey of Bandwidth Selection for Density Estimation
M. C. Jones and J. S. Marron and S. J. Sheather
Journal of the American Statistical Association  91  401--407  (1996)


Bandwidth Selection in Kernel Density Estimation: A Review
B. Turlach


Principal Component Analysis

Data

R:
library("UsingR")
data(fat)

Literature

Multivariate Analysis, Chapter 8.3
K. V. Mardia and J. T. Kent and J. M. Bibby
      (1979)

Robust classification in high dimensions based on the SIMCA method
K. Branden and M. Hubert
Chemometrics and Intelligent Laboratory Systems  79  10--21  (2005)

In this paper we first investigate the robustness of the SIMCA method for classifying high-dimensional observations. It turns out that both stages of the algorithm, the estimation of principal components and the construction of a classification rule, can be highly disturbed by the presence of outliers. Therefore we propose a robust procedure RSIMCA which is based on a robust Principal Component Analysis method for high-dimensional data (ROBPCA). Various simulations and real examples reveal the robustness of our approach.


One-Dimensional Diagnostics

Literature

Diagnostic Plots for One-Dimensional Data
G. Sawitzki
In: P. Dirschedl, R. Ostermann (eds.) Computational Statistics. Papers Collected on the Occasion of the 25th Conference on Statistical Computing at Schloss Reisensburg. Physica-Verlag, Heidelberg. pp. 237--258  (1994)
Software and more information: http://www.statlab.uni-heidelberg.de/projects/onedim/.


In preparation

Dimension Reduction

Literature

Sliced Inverse Regression for Dimension Reduction
K.-C. Li
Journal of the American Statistical Association  86  316--327  (1991)


Resampling

Classification and Regression Trees, DART

Literature

Classification and Regression Trees
L. Breiman and J. Friedman and R. Olshen and C. Stone
      (1984)

J. H. Friedman (Aug. 1996a)
"Local Learning Based on Recursive Covering"
(software)


See also...

Courses to look at

Andreas Buja (University of Pennsylvania): Lectures on statistics and data analysis, Columbia University 2009
Heike Hofmann et al. (Iowa State University): Visualizing Quantitative Information
Ross Ihaka et al. (Auckland): Computational Data Analysis and Graphics
Hadley Wickham (Rice University): Data Visualisation



Data

D. F. Andrews and A. M. Herzberg: Data
    XX, 442 pp.  (Springer 1985) Data sets online
D. Cook and D. F. Swayne: Interactive and Dynamic Graphics for Data Analysis
    (Springer 2007), Data Descriptions (Feb 2007, PDF, 1.5Mb), Data: See Data section of the book home page.

See also Data.



Literature

(Belsley et al. 1980)
Belsley, Kuh & Welsch, Regression Diagnostics, Wiley, 1980.
(Cook and Swayne, 2007)
D. Cook and D. F. Swayne: Interactive and Dynamic Graphics for Data Analysis
    (Springer 2007), Data Descriptions (Feb 2007, PDF, 1.5Mb)




Software

DataDesk
http://www.datadesk.com/
ggobi
www.ggobi.org
R
www.cran.r-project.org



