Chemometrics and Statistics

Half-day meeting of DSTS (Danish Society of Theoretical Statistics)
The Royal Veterinary and Agricultural University
Thorvaldsensvej 40
1871 Frederiksberg C

Thursday 27. September 2001

12.00-13.00 Lunch
Register with Carina Jensen, Dep. of Mathematics and Physics, tel. 35 28 23 66, email: cje@kvl.dk, no later than 21. September. Price: kr. 100, to be paid in cash at the meeting.

13.00-13.50: Statistics in general - statistics in chemistry - chemometrics.
Rolf Sundberg
Mathem. statistics
Stockholm University

Some questions to be discussed: What is chemometrics? Is the notion of chemometrics needed/useful/harmful? Should statisticians leave chemometrics to chemometricians? Why should chemists not leave chemometrics to statisticians? What should statisticians learn from chemometrics?

I will provide a historical perspective on chemometrics (and other -metrics), starting as early as the 19th century, talking in particular about the interplay between statistics and problems from chemistry, but also about some lack of interplay in more recent years. I will give a person named Gosset an important role in the talk.

I will also talk about the character of the statistical problems of modern chemometrics, why statisticians should study them and how they will benefit from doing so, and perhaps some statistical challenges. In particular, I cannot avoid discussing the (in-)famous PLS as one important theme or example. Another question to be discussed is the role of chemometrics outside applications in chemistry.

13.50-14.30: Multi-way analysis - A new tool for solving fundamental model problems in spectroscopy, signal processing etc.
Rasmus Bro
Chemometrics Group
Dep. of Dairy and Food Science, KVL.

Multi-way analysis is a relatively new family of techniques with which so-called multi-way data can be analyzed. Two-way data are ordinary matrices that are often analyzed with bilinear techniques such as principal component analysis. Three-way data are data where instead of two, there are three indices characterizing each element. Hence, such data are arranged in "boxes" instead of tables/matrices. Higher-order data can be defined analogously.

A well-known problem in two-way analysis is the rotational freedom in bilinear models. This problem has been known for decades and is especially troublesome in situations where identification of the underlying bilinear parameters is sought. It can be shown that the rotational freedom does not exist for some three-way models, and the implication of this fundamental property is illustrated with some practical problems.
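
The rotational freedom mentioned above can be illustrated with a minimal sketch in plain Python. The matrices A (scores) and B (loadings), the 30-degree rotation R, and the helper names are all hypothetical values chosen only for illustration; they are not taken from the talk. The sketch shows that the rotated pair (AR, BR) reproduces exactly the same data matrix X = AB^T, so the data alone cannot tell the two sets of parameters apart.

```python
import math

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*P)]

# Hypothetical two-component bilinear model X = A B^T (illustrative numbers only).
A = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]              # scores: 3 samples x 2 components
B = [[1.0, 0.2], [0.3, 1.0], [0.0, 0.7], [0.5, 0.5]]  # loadings: 4 variables x 2 components

# An arbitrary orthogonal rotation of the two components (assumed angle).
theta = math.radians(30)
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

# Rotate both parameter sets. Since R R^T = I, (A R)(B R)^T = A B^T.
A_rot = matmul(A, R)
B_rot = matmul(B, R)

X = matmul(A, transpose(B))
X_rot = matmul(A_rot, transpose(B_rot))

# The fitted data are identical (up to rounding) ...
same_fit = all(abs(x - y) < 1e-12
               for row_x, row_y in zip(X, X_rot)
               for x, y in zip(row_x, row_y))

# ... although the parameters themselves have changed.
params_differ = any(abs(a - b) > 1e-6
                    for row_a, row_b in zip(A, A_rot)
                    for a, b in zip(row_a, row_b))

print("same fitted X:", same_fit)
print("different scores:", params_differ)
```

Any invertible R (with B transformed by the inverse transpose of R) would give the same result; an orthogonal rotation is used here only because its inverse is simply its transpose.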

For example, it will be shown that it is possible to separate, mathematically, measured signals of mixtures (e.g. fluorescence measurements of food samples or cellular phone signals received at an antenna) with the use of such multi-way models. Thus, both quantitative and qualitative analyses are possible directly from measured mixtures.

Finally, some attention will be paid to the evident lack of knowledge on the statistical properties of these models.

14.30-15.00: Coffee

15.00-15.40: The versatile PLS Regression
Harald Martens, dr.techn.
prof. II, Norwegian Univ. of Sci. & Technol.
guest prof., Technical University of Denmark
external prof., Royal Vet. & Agric. University

Data analytical methods differ in their ability to provide both cognitive and statistical tools for the user. When combined with proper interactive computer graphics, the soft multivariate bi-linear modelling method of PLS Regression (Wold et al. 1983) provides cognitive access to the relevant and reliable information in data. With cross-validation/jack-knifing it also provides statistical assessment of the reliability of the results.

There is no such thing as "THE BEST" regression method. PLS Regression is just one among many good methods. But to my knowledge, there is no other statistical method with comparable versatility (Martens H. & Martens M. 2001). When compared to competing regression methods, each optimised properly, the PLS Regression seems invariably to come out among the best ones, with respect to statistical predictive ability. Therefore, it is particularly well suited for non-statisticians - researchers who want to use their important contextual knowledge during the actual data analysis, and who cannot take the time to learn many different, abstract statistical methods.

The PLS Regression arose from unfulfilled data-analytical needs and in frustration over traditional statistics. It came about through the close collaboration between the statistician Herman Wold and his son, the chemist Svante Wold, combined with an intense co-operation between Svante and myself as chemometricians.

The development of the PLS Regression took place in the early 1980s, between two widely different but sometimes equally suffocating scientific cultures: 1) The mathematical modelling in traditional chemistry and physics, focusing on hard causal models. 2) The parameter estimation of traditional statistical modelling, focusing too much on distribution theory and hypothesis testing and too little on the discovery process in multivariate data.

The PLS Regression (PLSR) is just one out of at least three disjoint methods called "PLS" - all based on Herman Wold's original work. The bi-linear PLSR has proven useful in a number of scientific fields, ranging from chemistry and biotechnology to psychology and marketing, for a wide range of applications (calibration of instruments, reduced-rank regression/prediction, symmetrical and asymmetrical classification/discrimination). When stabilized in certain details (internal ridging etc.), and with bi-linear jack-knifing at various levels, it provides a more graphically accessible significance assessment with the potential of simplifying traditional ANOVA/MANOVA/Covariance analysis, in fixed as in mixed models.

References:

Martens H. and Næs T. (1989) Multivariate Calibration. J. Wiley & Sons, Ltd., 450 pages. (Citation Index: currently >1500)

Martens H. and Martens M. (2001) Multivariate Analysis of Quality. An Introduction. J. Wiley & Sons, Ltd., 440 pages

Wold S., Martens H. and Wold H. (1983) The Multivariate Calibration Problem in Chemistry solved by the PLS Method. Proc. Conf. Matrix Pencils (A. Ruhe and B. Kågström, eds.), March 1982, Lecture Notes in Mathematics, Springer Verlag, Heidelberg, 286-293.

15.40-17.00: Panel discussion about the role of chemometrics in relation to statistics and vice versa:

15.40-15.50: Per Brockhoff, Dep. of Mathematics and Physics, KVL.
15.50-16.00: Rolf Sundberg.
16.00-16.10: Harald Martens.
16.10-17.00: Open discussion.

Venue, lunch:

Kantinen - Gimle (Red no. 11 on map)
Den Kgl. Veterinær- og Landbohøjskole
Grundtvigsvej 14,
1884 Frederiksberg C
(also entrance from Dyrlægevej)

Venue, talks:

Auditorium 3.01
Den Kgl. Veterinær- og Landbohøjskole
Thorvaldsensvej 40,
1871 Frederiksberg C

The meeting is open to everyone - members of DSTS as well as non-members. Apart from the lunch, no registration is needed.

Local organizers

Per Bruun Brockhoff and Ib Skovgaard
Dep. of Mathematics and Physics, KVL
Thorvaldsensvej 40
1871 Frederiksberg C.

Contact: per@dina.kvl.dk (Per Bruun Brockhoff)