Staff Publications

Record number: 558371
Title: All sparse PCA models are wrong, but some are useful. Part I: Computation of scores, residuals and explained variance
Author(s): Camacho, J.; Smilde, A.K.; Saccenti, E.; Westerhuis, J.A.
Source: Chemometrics and Intelligent Laboratory Systems 196 (2020). - ISSN 0169-7439
DOI: https://doi.org/10.1016/j.chemolab.2019.103907
Department(s): Systems and Synthetic Biology; VLAG
Publication type: Refereed Article in a scientific journal
Publication year: 2020
Keyword(s): Explained variance - Exploratory data analysis - Residuals - Scores - Sparse principal component analysis
Abstract

Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA) that combines variance maximization and sparsity with the ultimate goal of improving data interpretation. When moving from PCA to sPCA, there are a number of implications that the practitioner needs to be aware of. A relevant one is that scores and loadings in sPCA may not be orthogonal. For this reason, the traditional way of computing scores, residuals and variance explained that is used in the classical PCA can lead to unexpected properties and therefore incorrect interpretations in sPCA. This also affects how sPCA components should be visualized. In this paper we illustrate this problem both theoretically and numerically using simulations for several state-of-the-art sPCA algorithms, and provide proper computation of the different elements mentioned. We show that sPCA approaches present disparate and limited performance when modeling noise-free, sparse data. In a follow-up paper, we discuss the theoretical properties that lead to this undesired behavior. We title this series of papers after the famous phrase of George Box “All models are wrong, but some are useful” with the same original meaning: sPCA models are only approximations of reality and have structural limitations that should be taken into account by the practitioner, but properly applied they can be useful tools to understand data.
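
The abstract's point about non-orthogonal loadings can be illustrated with a small numerical sketch. The abstract does not state which formulas the authors propose, so what follows is only an assumption: it uses synthetic data, a hand-made (hypothetical) sparse loading matrix `P`, and ordinary least-squares projection as one standard way to obtain scores when `P.T @ P` is not the identity. The helper `explained_fraction` is likewise introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, mean-centred data: 50 observations, 4 variables (illustrative only).
X = rng.standard_normal((50, 4))
X -= X.mean(axis=0)

# Hypothetical sparse loadings: unit-norm columns with zeros, but NOT orthogonal,
# because both components load on the first variable.
P = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
P /= np.linalg.norm(P, axis=0)

gram = P.T @ P  # equals the identity only for orthonormal loadings


def explained_fraction(X, X_hat):
    """Fraction of the total sum of squares of X captured by X_hat."""
    return 1.0 - np.sum((X - X_hat) ** 2) / np.sum(X ** 2)


# PCA-style computation, which implicitly assumes P.T @ P = I.
T_naive = X @ P
E_naive = X - T_naive @ P.T

# Least-squares scores: T = X P (P^T P)^{-1}, the best fit of X by T P^T for fixed P.
T_ls = X @ P @ np.linalg.inv(gram)
E_ls = X - T_ls @ P.T

print("P^T P =\n", gram)
print("naive explained fraction:", explained_fraction(X, T_naive @ P.T))
print("least-squares explained fraction:", explained_fraction(X, T_ls @ P.T))
print("max |E_naive P|:", np.abs(E_naive @ P).max())
print("max |E_ls P|:", np.abs(E_ls @ P).max())
```

In this sketch the least-squares residuals are orthogonal to the loadings, so the total sum of squares splits cleanly into a modelled part and a residual part. The PCA-style residuals still overlap with the loadings, which is one way the classical formulas can mislead once orthogonality is lost.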
