The evolution of pyrrolizidine alkaloid diversity among and within Jacobaea species
Chen, Yangan ; Mulder, Patrick P.J. ; Schaap, Onno ; Memelink, Johan ; Klinkhamer, Peter G.L. ; Vrieling, Klaas - \ 2020
Journal of Systematics and Evolution (2020). - ISSN 1674-4918
ancestral state reconstruction - hierarchical cluster analysis - LC-MS/MS - phylogenetic signal - principal component analysis - secondary metabolite diversity
Plants produce many secondary metabolites showing considerable inter- and intraspecific diversity of concentration and composition as a strategy to cope with environmental stresses. The evolution of plant defenses against herbivores and pathogens can be unraveled by understanding the mechanisms underlying chemical diversity. Pyrrolizidine alkaloids are a class of secondary metabolites with high diversity. We performed a qualitative and quantitative analysis of 80 pyrrolizidine alkaloids with liquid chromatography-tandem mass spectrometry of leaves from 17 Jacobaea species including one to three populations per species with 4–10 individuals per population grown under controlled conditions in a climate chamber. We observed large inter- and intraspecific variation in pyrrolizidine alkaloid concentration and composition, which were both species-specific. Furthermore, we sequenced 11 plastid and three nuclear regions to reconstruct the phylogeny of the 17 Jacobaea species. Ancestral state reconstruction at the species level showed mainly random distributions of individual pyrrolizidine alkaloids. We found little evidence for phylogenetic signals, as nine out of 80 pyrrolizidine alkaloids showed a significant phylogenetic signal for Pagel's λ statistics only, whereas no significance was detected for Blomberg's K measure. We speculate that this high pyrrolizidine alkaloid diversity is the result of the upregulation and downregulation of specific pyrrolizidine alkaloids depending on ecological needs rather than gains and losses of particular pyrrolizidine alkaloid biosynthesis genes during evolution.
Considering Horn’s Parallel Analysis from a Random Matrix Theory Point of View
Saccenti, Edoardo ; Timmerman, Marieke E. - \ 2017
Psychometrika 82 (2017)1. - ISSN 0033-3123 - p. 186 - 209.
common factor analysis - covariance matrix - number of common factors - number of principal components - principal component analysis
Horn’s parallel analysis is a widely used method for assessing the number of principal components and common factors. We discuss the theoretical foundations of parallel analysis for principal components based on a covariance matrix by making use of arguments from random matrix theory. In particular, we show that (i) for the first component, parallel analysis is an inferential method equivalent to the Tracy–Widom test, (ii) its use to test high-order eigenvalues is equivalent to the use of the joint distribution of the eigenvalues, and thus should be discouraged, and (iii) a formal test for higher-order components can be obtained based on a Tracy–Widom approximation. We illustrate the performance of the two testing procedures using simulated data generated under both a principal component model and a common factors model. For the principal component model, the Tracy–Widom test performs consistently in all conditions, while parallel analysis shows unpredictable behavior for higher-order components. For the common factor model, including major and minor factors, both procedures are heuristic approaches, with variable performance. We conclude that the Tracy–Widom procedure is preferred over parallel analysis for statistically testing the number of principal components based on a covariance matrix.
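The basic parallel analysis procedure that this abstract examines can be sketched as follows. This is a generic illustration, not the authors' code: the function name `parallel_analysis`, the number of simulations, the 95% quantile threshold, and the choice of uncorrelated normal reference data with matching column variances are all assumptions made for the example. The Tracy–Widom test discussed in the abstract is not implemented here.

```python
import numpy as np

def parallel_analysis(X, n_sim=200, quantile=0.95, seed=0):
    """Sketch of Horn's parallel analysis on the covariance matrix of X.

    X has observations in rows and variables in columns.
    Returns (number of components retained, observed eigenvalues, thresholds).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Eigenvalues of the observed covariance matrix, largest first.
    obs = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    # Reference eigenvalues from uncorrelated normal data whose columns
    # have the same variances as the observed columns (an assumption).
    sd = X.std(axis=0, ddof=1)
    sim = np.empty((n_sim, p))
    for i in range(n_sim):
        Z = rng.standard_normal((n, p)) * sd
        sim[i] = np.sort(np.linalg.eigvalsh(np.cov(Z, rowvar=False)))[::-1]
    thresh = np.quantile(sim, quantile, axis=0)
    # Retain components up to the first eigenvalue that fails its threshold.
    below = obs <= thresh
    n_keep = int(np.argmax(below)) if below.any() else p
    return n_keep, obs, thresh
```

Note that each observed eigenvalue is compared with the reference eigenvalue of the same rank; this rank-wise comparison for components beyond the first is the practice the abstract argues against.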
On the use of the observation-wise k-fold operation in PCA cross-validation
Saccenti, E. ; Camacho, J. - \ 2015
Journal of Chemometrics 29 (2015)8. - ISSN 0886-9383 - p. 467 - 478.
principal component analysis - missing data - models - number - spectroscopy - mspc - pls
Cross-validation (CV) is a common approach for determining the optimal number of components in a principal component analysis model. To guarantee the independence between model testing and calibration, the observation-wise k-fold operation is commonly implemented in each cross-validation step. This operation renders the CV algorithm computationally intensive, and it is the main obstacle to applying CV to very large data sets. In this paper, we carry out an empirical and theoretical investigation of the use of this operation in the element-wise k-fold (ekf) algorithm, the state-of-the-art CV algorithm. We show that when very large data sets need to be cross-validated and the computational time is a matter of concern, the observation-wise k-fold operation can be skipped. The theoretical properties of the resulting modified algorithm, referred to as the column-wise k-fold (ckf) algorithm, are derived. Its performance is also evaluated with several artificial and real data sets. We suggest that the ckf algorithm is a valid alternative to the standard ekf algorithm for reducing the computational time needed to cross-validate a data set.
Another look at Bayesian analysis of AMMI models for genotype-environment data
Josse, J. ; Eeuwijk, F.A. van; Piepho, H.P. ; Denis, J.B. - \ 2014
Journal of Agricultural, Biological, and Environmental Statistics 19 (2014)2. - ISSN 1085-7117 - p. 240 - 257.
principal component analysis - multiplicative interaction - cultivar trials - mixed models - variance - prediction - statistics - parameters - variety
Linear–bilinear models are frequently used to analyze two-way data such as genotype-by-environment data. A well-known example of this class of models is the additive main effects and multiplicative interaction effects model (AMMI). We propose a new Bayesian treatment of such models offering a proper way to deal with the major problem of overparameterization. The rationale is to ignore the issue at the prior level and apply an appropriate processing at the posterior level to be able to arrive at easily interpretable inferences. Compared to previous attempts, this new strategy has the great advantage of being directly implementable in standard software packages devoted to Bayesian statistics such as WinBUGS/OpenBUGS/JAGS. The method is assessed using simulated datasets and a real dataset from plant breeding. We discuss the benefits of a Bayesian perspective to the analysis of genotype-by-environment interactions, focusing on practical questions related to general and local adaptation and stability of genotypes. We also suggest a new solution to the estimation of the risk of a genotype not exceeding a given threshold.
A weighted AMMI Algorithm to Study Genotype-by-Environment Interaction and QTL-by-Environment Interaction
Rodrigues, P.C. ; Malosetti, M. ; Gauch, H.G. - \ 2014
Crop Science 54 (2014)4. - ISSN 0011-183X - p. 1555 - 1570.
principal component analysis - multiplicative interaction-model - joint regression-analysis - additive main - cross-validation - yield trials - barley cross - mixed-model - selection - gene
Genotype-by-environment (G × E) interaction (GEI) and quantitative trait locus (QTL)-by-environment interaction (QEI) are common phenomena in multiple-environment trials and represent a major challenge to breeders. The additive main effects and multiplicative interaction (AMMI) model is a widely used tool for the analysis of multiple-environment trials, where the data are represented by a two-way table of G × E means. For complete tables, least squares estimation for the AMMI model is equivalent to fitting an additive two-way ANOVA model for the main effects and applying a singular value decomposition to the interaction residuals, thereby implicitly assuming equal weights for all G × E means. However, multiple-environment data with strong GEI are often also characterized by strong heterogeneous error variation. To improve the performance of the AMMI model in the latter situation, we introduce a generalized estimation scheme, the weighted AMMI or W-AMMI algorithm. This algorithm is useful for studying GEI and QEI. For QEI, the W-AMMI algorithm can be used to create predicted values per environment that are subjected to QTL analysis. We compare the performance of this combined W-AMMI and QTL mapping strategy to direct QTL mapping on G × E means and to QTL mapping on AMMI-predicted values, again with QTL analyses for individual environments. Finally, we compare the W-AMMI QTL mapping strategy with a multi-environment mixed model QTL mapping approach. Two data sets are used: (i) data from a simulated pepper (Capsicum annuum L.) backcross population using a crop growth model to relate genotypes to phenotypes in a nonlinear way, and (ii) the doubled-haploid Steptoe × Morex barley (Hordeum vulgare L.) population.
The QTL analyses on the W-AMMI-predicted values outperformed the QTL analyses on the G × E means and on the AMMI-predicted values, and were very similar to the mixed model QTL mapping approach with regard to the number and location of the true positive QTLs detected, especially for QTLs associated with the interaction and for environments with higher error variance. W-AMMI analysis for GEI and QEI provides an easy-to-use and robust tool with wide applicability.
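The unweighted AMMI fit described in this abstract (additive two-way ANOVA for the main effects, followed by a singular value decomposition of the interaction residuals) can be sketched for a complete table as below. This is only an illustration of the equal-weights case: the function name `ammi` and its arguments are hypothetical, and the weighted W-AMMI scheme itself is not reproduced because the abstract does not specify it.

```python
import numpy as np

def ammi(Y, n_terms=2):
    """Sketch of an equal-weights AMMI fit for a complete G x E table of means.

    Y has genotypes in rows and environments in columns.
    Returns (fitted table, singular values of the interaction residuals).
    """
    mu = Y.mean()                      # grand mean
    g = Y.mean(axis=1) - mu            # genotype main effects
    e = Y.mean(axis=0) - mu            # environment main effects
    additive = mu + g[:, None] + e[None, :]
    resid = Y - additive               # double-centred interaction residuals
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    # Keep the first n_terms multiplicative interaction terms.
    interaction = (U[:, :n_terms] * s[:n_terms]) @ Vt[:n_terms]
    return additive + interaction, s
```

With `n_terms=0` the function returns the purely additive fit; because the residuals are double-centred, at most min(G, E) - 1 multiplicative terms are needed to reproduce the table exactly.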
Chemical Composition, Sensory Properties, Provenance, and Bioactivity of Fruit Juices as Assessed by Chemometrics: A Critical Review and Guideline
Zielinski, A.F. ; Haminiuk, C.W.I. ; Nunes, C.A. ; Schnitzler, E. ; Ruth, S.M. van; Granato, D. - \ 2014
Comprehensive Reviews in Food Science and Food Safety 13 (2014)3. - ISSN 1541-4337 - p. 300 - 316.
principal component analysis - near-infrared spectroscopy - performance liquid-chromatography - commercial grape juices - antioxidant activity - consumer segmentation - geographical origin - electronic tongue - phenolic content - orange juice
The use of univariate, bivariate, and multivariate statistical techniques, such as analysis of variance, multiple comparisons of means, and linear correlations, has spread widely in the area of Food Science and Technology. However, the use of supervised and unsupervised statistical techniques (chemometrics) to analyze and model experimental data from physicochemical, sensory, metabolomics, quality control, nutritional, microbiological, and chemical assays in food research has gained ground. Therefore, we present here a manuscript with theoretical details, a critical analysis of published work, and a guideline for the reader to check and propose mathematical models of experimental results using the most promising supervised and unsupervised multivariate statistical techniques, namely: principal component analysis, hierarchical cluster analysis, linear discriminant analysis, partial least squares regression, k-nearest neighbors, and soft independent modeling of class analogy. In addition, the overall features, advantages, and limitations of such statistical methods are presented and discussed. Published examples are focused on sensory, chemical, and antioxidant activity of a wide range of fruit juices consumed worldwide.
Multivariate PAT solutions for biopharmaceutical cultivation: current progress and limitations
Mercier, S.M. ; Diepenbroek, B. ; Wijffels, R.H. ; Streefland, M. - \ 2014
Trends in Biotechnology 32 (2014)6. - ISSN 0167-7799 - p. 329 - 336.
process analytical technology - principal component analysis - monitoring batch processes - cell-culture - biotechnology - spectroscopy - quality - chromatography - fermentation - chemometrics
Increasingly elaborate and voluminous datasets are generated by the (bio)pharmaceutical industry and are a major challenge for application of PAT and QbD principles. Multivariate data analysis (MVDA) is required to delineate relevant process information from large multi-factorial and multi-collinear datasets. Here the key role of MVDA for industrial (bio)process data is discussed, with a focus on progress and limitations of MVDA as a PAT solution for biopharmaceutical cultivation processes. MVDA-based models have proven useful and should be routinely implemented for bioprocesses. It is concluded that although the highest level of PAT, with process control within its design space in real time during manufacturing, has not yet been reached, MVDA will be central to reaching this ultimate objective for cell cultivations.
Authentication of organic eggs by LC fingerprinting and isotope ratio analysis
Ruth, S.M. van; Rogers, K. ; Newton-Smith, E. ; Koot, A.H. ; Alewijn, M. - \ 2012
analytische methoden - massaspectrometrie - vloeistofchromatografie - eieren - biologische voedingsmiddelen - principale componentenanalyse - analytical methods - mass spectrometry - liquid chromatography - eggs - organic foods - principal component analysis
The aim of the present study was to develop and modify fingerprint methodology for the verification of Dutch organic eggs versus conventional (barn/free range) eggs.
Characterization of Fen-Daqu Through Multivariate Statistical Analysis of H-1 NMR Spectroscopic Data
Van-Diep, L. ; Zheng, X. ; Ma, K. ; Chen, J.Y. ; Han, B.Z. ; Nout, M.J.R. - \ 2011
Journal of the Institute of Brewing 117 (2011)4. - ISSN 0046-9750 - p. 516 - 522.
magnetic-resonance-spectroscopy - principal component analysis - acid bacteria - fermentation - behaviors - starter - liquor - beer
Fen liquor is typical of Chinese light-flavour liquor (alcoholic spirit), which is fermented from sorghum with Fen-Daqu powder. Fen-Daqu is a saccharifying agent and fermentation starter in this fermentation process and in Fen traditional vinegar. To investigate the changes of biochemical components in Fen-Daqu during incubation, samples at seven incubation stages were analyzed by H-1 nuclear magnetic resonance (NMR) spectrometry and principal component analysis (PCA). This revealed clear separation of the samples obtained from different incubation stages in the principal component plots by combining PC1 and PC2, which cumulatively accounted for 93.27% of the variance. The major compounds that contributed to discrimination were acetate/alanine, arginine, ascorbate, betaine, choline, ethanol, fructose, galactose, glucose, glucitol, glycerate, lactate, maltose, mannitol, phenylalanine, proline, propylene glycol, threonine and tryptophan. These compounds were regarded as the representative metabolites or biomarkers characteristic for each incubation stage and were related to microbiological changes of importance for quality control in Fen-Daqu production.
Data-processing strategies for metabolomics studies
Hendriks, M.M.W.B. ; Eeuwijk, F.A. van; Jellema, R.H. ; Westerhuis, J.A. ; Reijmers, T.H. ; Hoefsloot, H.C.J. ; Smilde, A.K. - \ 2011
TrAC : Trends in Analytical Chemistry 30 (2011)10. - ISSN 0165-9936 - p. 1685 - 1698.
principal component analysis - mass-spectrometry - variable selection - optimal-design - models - identification - metabolites - networks - tool - nmr
Metabolomics studies aim at a better understanding of biochemical processes by studying relations between metabolites and between metabolites and other types of information (e.g., sensory and phenotypic features). The objectives of these studies are diverse, but the types of data generated and the methods for extracting information from the data and analysing the data are similar. Besides instrumental analysis tools, various data-analysis tools are needed to extract this relevant information. The entire data-processing workflow is complex and has many steps. For a comprehensive overview, we cover the entire workflow of metabolomics studies, starting from experimental design and sample-size determination to tools that can aid in biological interpretation. We include illustrative examples and discuss the problems that have to be dealt with in data analysis in metabolomics. We also discuss where the challenges are for developing new methods and tailor-made quantitative strategies.
Probiotic modulation of symbiotic gut microbial-host metabolic interactions in a humanized microbiome mouse model
Martin, F.P.J. ; Wang, Y. ; Sprenger, N. ; Yap, K.S. ; Rezzi, S. ; Ramadan, Z. ; Peré-Trepat, E. ; Rochat, F. ; Cherbut, C. ; Bladeren, P.J. van; Fay, L.B. ; Kochhar, S. ; Lindon, J.C. ; Holmes, E. ; Nicholson, J.K. - \ 2008
Molecular Systems Biology 4 (2008)1. - ISSN 1744-4292 - 15 p.
spinning h-1-nmr spectroscopy - principal component analysis - conjugated linoleic-acid - global systems biology - fatty-acids - bile-acids - clostridium-perfringens - intestinal microflora - multivariate-analysis - in-vitro
The transgenomic metabolic effects of exposure to either Lactobacillus paracasei or Lactobacillus rhamnosus probiotics have been measured and mapped in humanized extended genome mice (germ-free mice colonized with human baby flora). Statistical analysis of the fluctuations in diverse metabolic compartments, including biofluids, tissue and cecal short-chain fatty acids (SCFAs), in relation to microbial population modulation generated a novel top-down systems biology view of the host response to probiotic intervention. Probiotic exposure exerted microbiome modification and resulted in altered hepatic lipid metabolism coupled with lowered plasma lipoprotein levels and apparent stimulated glycolysis. Probiotic treatments also altered a diverse range of pathway outcomes, including amino-acid metabolism, methylamines and SCFAs. The novel application of hierarchical-principal component analysis allowed visualization of multicompartmental transgenomic metabolic interactions that could also be resolved at the compartment and pathway level. These integrated system investigations demonstrate the potential of metabolic profiling as a top-down systems biology driver for investigating the mechanistic basis of probiotic action and the therapeutic surveillance of the gut microbial activity related to dietary supplementation of probiotics.
An overview of analytical methods for determining the geographical origin of food products
Luykx, D.M.A.M. ; Ruth, S.M. van - \ 2008
Food Chemistry 107 (2008)2. - ISSN 0308-8146 - p. 897 - 911.
virgin olive oils - ratio mass-spectrometry - principal component analysis - stable-isotope analysis - face fluorescence spectroscopy - nuclear-magnetic-resonance - french red wines - capillary-electrophoresis - gas-chromatography - multivariate-analysis
There is increasing consumer interest in high-quality food products with a clear geographical origin. These products are encouraged, and suitable analytical techniques are needed for their quality control. This overview examines the current analytical techniques used to determine the geographical origin of food products. The analytical approaches have been subdivided into four groups: mass spectrometry techniques, spectroscopic techniques, separation techniques, and other techniques. The principles of the techniques, together with their advantages and drawbacks and reported applications concerning geographical authenticity, are discussed. A combination of methods analysing different types of food compounds seems to be the most promising approach to establish the geographical origin. Chemometric analysis of the data provided by the analytical instruments is needed for such a multifactorial approach.
FLORES : identifying flowers by image content
Zedde, H.J. van de; Heijden, G.W.A.M. van der; Keizer, L.C.P. - \ 2007
multispectrale beelden - spectraalanalyse - principale componentenanalyse - kwaliteit - kenmerken - rozen - beeldvormende spectroscopie - patroonherkenning - uitwendige kenmerken - multispectral imagery - spectral analysis - principal component analysis - quality - traits - roses - imaging spectroscopy - pattern recognition - external traits
For auctions and plant variety testing, flowers need to be identified and compared. This is typically done by an expert. We try to develop a system to automatically compare an image of a flower with stored images of known varieties and retrieve the most similar ones. Spectral imaging allows calibration, making flower color independent of the recording equipment. Using simple similarity measures, spectral images of flowers proved to be superior to RGB images. The aim is to develop feature descriptors and similarity measures that will further increase the precision and recall of FLORES.
Calibration of multivariate scatter plots for exploratory analysis of relations within and between sets of variables in genomic research
Graffelman, J. ; Eeuwijk, F.A. van - \ 2005
Biometrical Journal 47 (2005)6. - ISSN 0323-3847 - p. 863 - 879.
canonical correlation-analysis - principal component analysis - biplots
The scatter plot is a well-known and easily applicable graphical tool to explore relationships between two quantitative variables. For the exploration of relations between multiple variables, generalisations of the scatter plot are useful. We present an overview of multivariate scatter plots focussing on the following situations. Firstly, we look at a scatter plot for portraying relations between quantitative variables within one data matrix. Secondly, we discuss a similar plot for the case of qualitative variables. Thirdly, we describe scatter plots for the relationships between two sets of variables where we focus on correlations. Finally, we treat plots of the relationships between multiple response and predictor variables, focussing on the matrix of regression coefficients. We will present both known and new results, where an important original contribution concerns a procedure for the inclusion of scales for the variables in multivariate scatter plots. We provide software for drawing such scales. We illustrate the construction and interpretation of the plots by means of examples on data collected in a genomic research program on taste in tomato.
Preprocessing and exploratory analysis of chromatographic profiles of plant extracts
Hendriks, M.M.W.B. ; Cruz-Juarez, L. ; Bont, D. de; Hall, R.D. - \ 2005
Analytica Chimica Acta 545 (2005)1. - ISSN 0003-2670 - p. 53 - 64.
principal component analysis - least-squares algorithms - mass-spectrometry - herbal products - identification - metabolomics - matrices - choice
The characterization of herbal extracts to compare samples of different origins is important for robust production and quality control strategies. This characterization is now mainly performed by analysis of selected marker compounds. Metabolic fingerprinting of full metabolite profiles of plant extracts aims at a more rapid and thorough screening or classification of plant material. We will show that HPLC is an appropriate technique for metabolic fingerprinting of secondary metabolites, given that adequate preprocessing of raw profiles is performed. Additional variation, which results from sample preparation and changing measurement conditions, usually obscures the information of interest in these raw profiles. This paper illustrates the importance of preprocessing of chromatographic fingerprinting data. Different alignment methods are discussed as well as the influence of normalization. Weighted principal component analysis is introduced as a valuable alternative to autoscaling of data. LC-UV data on Willow (Salix sp.) extracts are used to evaluate these preprocessing methods and their influence on exploratory data analysis.