Development and application of a 20K SNP array in potato
Vos, Peter - \ 2016
Wageningen University. Promotor(en): Richard Visser; Fred van Eeuwijk, co-promotor(en): Herman van Eck. - Wageningen : Wageningen University - ISBN 9789462579569 - 166
solanum tuberosum - potatoes - genotypes - single nucleotide polymorphism - data analysis - plant breeding - linkage disequilibrium - genome analysis - tetraploidy - aardappelen - genotypen - gegevensanalyse - plantenveredeling - verstoord koppelingsevenwicht - genoomanalyse - tetraploïdie
This thesis describes investigations of various applications of genome-wide SNP (single nucleotide polymorphism) markers. The set of SNP markers was identified by a GBS (genotyping by sequencing) strategy. The resulting dataset of 129,156 SNPs across 83 tetraploid varieties was used directly to map traits, but also as a basis for the development of a 20K SNP array in potato (Solanum tuberosum L.). Subsequently this array, named SolSTW, was used to collect genotypic data from 569 potato genotypes. This dataset offered insight into the breeding history of potato, population structure, linkage disequilibrium (LD) and the potential of GWAS (genome-wide association studies) in potato.
In Chapter 2 we describe the development of the SolSTW 20K Infinium SNP array. One third of the SNPs on this array originate from the well-known SolCAP 8303 SNP array. The other SNPs are a subset from a targeted re-sequencing project of 83 tetraploid potato varieties. Because of the high SNP density in potato, only a limited number of SNPs is suitable for assay development on a SNP array. An obvious outcome is that flanking SNPs contribute to assay failure, particularly for assays with SNPs located in introns. We used fitTetra software to cluster the distribution of captured signals of each marker into the expected five genotypic classes (nulliplex, simplex, duplex, triplex, quadruplex), resulting in a dataset with 14,530 SNP markers. Subsequently the genotypic data obtained with the SolSTW array were used to characterize a set of 569 potato varieties, advanced breeding clones and progenitors. This resulted in the identification of several footprints of potato breeding. Firstly, SNPs were dated, i.e. the year of market release of the first variety showing polymorphism for a SNP locus is an indication of the ancestry of a SNP. In this way we identified SNPs with an ancestry tracing back to heirloom varieties, and SNPs (post-1945 SNPs) tracing back to wild species used in modern introgression breeding. Secondly, the changes in allele frequency were calculated over time. Most SNPs show a relatively stable allele frequency over time, and very limited genetic variation is removed from the gene pool of potato, i.e. genetic erosion is almost absent. Therefore we conclude that 100 years of breeding has not been able to get rid of non-beneficial genetic variation. Only a limited number of SNPs show a rapid increase in allele frequency, which can be explained by positive selection for disease resistance by breeders, or the more frequent use of several founders.
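The SNP dating and allele-frequency steps described above can be illustrated with a minimal Python sketch. It is not the thesis code. The function names, the input layout (one dosage per variety, 0 to 4) and the rule that a variety "shows" a SNP when its alternative-allele dosage is at least 1 are assumptions made for illustration. The 1945 cutoff is taken from the abstract's mention of post-1945 SNPs.

```python
from typing import Optional

PLOIDY = 4  # tetraploid: dosage runs from 0 (nulliplex) to 4 (quadruplex)


def snp_date(dosages: list[Optional[int]], release_years: list[int]) -> Optional[int]:
    """Release year of the oldest variety carrying the alternative allele.

    Assumption: a variety counts as carrying the SNP when its dosage is >= 1.
    Missing genotype calls are passed as None and ignored.
    """
    years = [y for d, y in zip(dosages, release_years) if d is not None and d > 0]
    return min(years) if years else None


def allele_frequency(dosages: list[Optional[int]]) -> Optional[float]:
    """Alternative-allele frequency: summed dosage over ploidy * called varieties."""
    called = [d for d in dosages if d is not None]
    if not called:
        return None
    return sum(called) / (PLOIDY * len(called))


def frequency_before_and_after(dosages, release_years, cutoff_year=1945):
    """Allele frequency among varieties released before / from the cutoff year."""
    before = [d for d, y in zip(dosages, release_years) if y < cutoff_year]
    after = [d for d, y in zip(dosages, release_years) if y >= cutoff_year]
    return allele_frequency(before), allele_frequency(after)


if __name__ == "__main__":
    # Hypothetical toy data: five varieties, one SNP.
    dosages = [0, 1, 0, 2, 4]
    years = [1900, 1950, 1930, 1970, 1990]
    print(snp_date(dosages, years))                    # oldest carrier
    print(allele_frequency(dosages))                   # overall frequency
    print(frequency_before_and_after(dosages, years))  # change over time
```

A SNP whose date falls after the cutoff and whose frequency rises quickly in the later period would be a candidate for the introgression-related footprints mentioned in the abstract. The sketch only computes the summary values and does not test for selection.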
A better understanding of the genome-wide decay of linkage disequilibrium (LD) and of population structure offers relevant knowledge to perform a genome-wide association study (GWAS) and to interpret its results (Chapter 3). LD is a complex phenomenon, and the influence of the factors shaping LD in tetraploids has hardly been studied. Therefore we used simulated data to disentangle, and thereby understand, the often-confounded factors underlying LD-decay. We simulated datasets differing in the number of haplotypes in a population, and differing in the percentage of haplotype-specific SNPs. In these simulations we observed that the choice of an estimator of LD-decay has a major effect on the outcome of an LD-decay estimate, while the true LD-decay remains the same. Based on the simulation we conclude that a 90th percentile and a so-called D1/2 (the distance where 50% of the initial LD is decayed) performed best to estimate and compare LD-decay in potato. To understand the various aspects of LD-decay in the variety panel of 537 varieties, the panel was subdivided into several groups based on the age of a variety and the population structure groups. This resulted in the identification of LD-decay over time, i.e. in relatively young varieties the average size of the LD-blocks is smaller. The differences between subpopulations were smaller and are most likely the effect of the population structure. We also observed that there are very long LD-blocks caused by introgression breeding and that different a priori MAF-thresholds can also influence the outcome of LD-decay estimation.
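The two estimators named above, a 90th percentile of LD per distance class and D1/2, can be sketched as follows. The abstract gives only the definitions. The fixed-width distance bins, the linear-interpolation percentile, and the use of the first bin as the "initial LD" are assumptions of this sketch, and all function names are hypothetical.

```python
import math


def percentile(values: list[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def ld_decay_curve(pairs, bin_width: float, q: float = 90.0):
    """Summarise (distance, r2) marker pairs as (bin midpoint, q-th percentile of r2)."""
    bins: dict[int, list[float]] = {}
    for distance, r2 in pairs:
        bins.setdefault(int(distance // bin_width), []).append(r2)
    return [((b + 0.5) * bin_width, percentile(v, q)) for b, v in sorted(bins.items())]


def d_half(curve):
    """First bin midpoint where LD has dropped to half of the first bin's value.

    Assumption: the shortest-distance bin stands in for the 'initial LD'.
    Returns None if the curve never reaches half of its initial value.
    """
    if not curve:
        return None
    initial = curve[0][1]
    for distance, value in curve:
        if value <= initial / 2.0:
            return distance
    return None


if __name__ == "__main__":
    # Hypothetical marker pairs: (physical distance, r2).
    pairs = [(10, 0.8), (20, 0.6), (110, 0.5), (120, 0.3), (210, 0.2), (220, 0.1)]
    curve = ld_decay_curve(pairs, bin_width=100)
    print(curve)
    print(d_half(curve))
```

Because D1/2 is expressed relative to the initial LD, it allows decay to be compared between groups whose LD differs at short distance, which is one plausible reason it performed well in comparisons between variety groups.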
Having both LD-decay and population structure defined, a genome-wide association study (GWAS) was conducted (Chapter 4). For this purpose α-solanine and α-chaconine were measured in potato tubers. Subsequently the sum of both (total SGA) and the ratio between the two were used to discover QTLs for these traits in a GWAS. Additionally we used three bi-parental populations to validate the GWAS results. Total SGA content was confounded with population structure and therefore it was difficult to explain all phenotypic variation with SNP markers. Two QTLs (Sgt1.1 and Sgt11.1) were identified which could be validated in one of the segregating populations. The ratio between α-solanine and α-chaconine was not confounded with population structure, and its analysis resulted in the identification of two major-effect QTLs (Sgr7.1 and Sgr8.1) located near the candidate genes SGT1 and SGT2, which are known to be responsible for the final steps towards either α-solanine or α-chaconine. The QTL Sgr8.1 could be validated; however, similar phenotypes were explained by different haplotypes in two populations. We show that population structure, low-frequency alleles and genetic heterogeneity may explain to some degree the missing heritability in GWAS in potato.
In Chapter 5 we describe how the method of graphical genotyping, which is widely used in diploid bi-parental populations, can be applied in a variety panel of tetraploid varieties. We show that a few discrete filtering steps in Excel can be used to display patterns that are visual representations of introgression segments and the locations of historical recombination events. Using this method we identified introgression segments from Solanum vernei including the Gpa5 locus on chromosome 5, and a Solanum stoloniferum introgression segment including a gene involved in resistance to Potato Virus Y on chromosome 11. This method requires that the haplotypes that cause the phenotypic effect be identical by descent (IBD).
In the final chapter, Chapter 6, the results of Chapters 2 to 5 are discussed. We look ahead to how our results can be used in future research and applied in marker-assisted breeding. Additionally some new GWAS results are presented for tuber flesh colour, foliage maturity and resistance to Globodera pallida pathotype 3.
Linkage disequilibrium and genomic selection in pigs
Veroneze, R. - \ 2015
Wageningen University. Promotor(en): Johan van Arendonk; S.E.F. Guimarães, co-promotor(en): John Bastiaansen. - Wageningen : Wageningen University - ISBN 9789462574151 - 142
varkens - verstoord koppelingsevenwicht - loci voor kwantitatief kenmerk - genomica - populaties - kruising - inteeltlijnen - fokwaarde - selectief fokken - genetica - pigs - linkage disequilibrium - quantitative trait loci - genomics - populations - crossbreds - inbred lines - breeding value - selective breeding - genetics
Securing a sufficiently large set of genotypes and phenotypes can be a limiting factor when implementing genomic selection. This limitation may be overcome by combining data from multiple populations or by using information of crossbred animals. The research described in this thesis characterized linkage disequilibrium (LD) patterns in different pig populations and evaluated whether the consistency of LD between populations allows us to make predictions about the performance of genomic selection when multiple populations are included in the prediction and/or validation datasets.
In chapter 2 I evaluated the persistence of LD and patterns of LD decay of pure and crossbred pig populations using real data that were representative of the crossbreeding structure of pig production. The persistence of phase between the crosses and their parental populations was high, indicating that similar marker effects might be expected across these populations. Across the purebred populations the persistence of phase was low; therefore higher-density panels should be used to have the same marker-QTL associations across these populations.
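Persistence of phase is commonly quantified as the Pearson correlation, across a shared set of marker pairs, between the signed LD values (r) measured in two populations. The abstract does not state which formula the thesis used, so the sketch below assumes this common definition. The function names and the toy values are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def persistence_of_phase(r_pop_a: list[float], r_pop_b: list[float]) -> float:
    """Correlation of signed r between two populations over the same marker pairs.

    Each list holds one signed r value per marker pair, in the same order.
    Values near 1 mean the same alleles tend to be coupled in both populations.
    """
    return pearson(r_pop_a, r_pop_b)


if __name__ == "__main__":
    # Hypothetical signed r values for four marker pairs.
    crossbred = [0.50, -0.20, 0.10, 0.80]
    parental = [0.45, -0.25, 0.05, 0.70]
    print(persistence_of_phase(crossbred, parental))
```

The sign of r matters here: two populations can have equally strong LD (similar r²) while the coupled alleles differ, and a marker effect estimated in one population would then carry the wrong sign in the other.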
In chapter 3, the well-known nonlinear model developed by Sved (1971) was compared against an alternative, loess regression, to describe LD decay. The loess regression model was found to be less influenced by the lack of residual normality, independence and homogeneity of variance than the nonlinear regression model. The loess regression model resulted in more reliable LD predictions and can be used to formally compare the LD decay curves between populations.
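For orientation, the nonlinear model attributed to Sved (1971) is usually written as E[r²] = 1 / (1 + 4·Ne·c), with c the recombination distance in Morgans and Ne the effective population size. The abstract does not give the exact parameterisation used in the thesis, so the form below is an assumption, and the least-squares grid search is a deliberately simple stand-in for a proper nonlinear fit. The loess alternative is not shown.

```python
def sved_expected_r2(c_morgan: float, n_e: float) -> float:
    """Expected r2 at recombination distance c (Morgans) under the assumed Sved form."""
    return 1.0 / (1.0 + 4.0 * n_e * c_morgan)


def fit_n_e(distances, r2_values, candidates):
    """Pick the candidate Ne that minimises the sum of squared residuals.

    A grid search keeps the sketch dependency-free; a real analysis would
    use a nonlinear least-squares routine and inspect the residuals.
    """
    def sse(n_e):
        return sum(
            (r2 - sved_expected_r2(c, n_e)) ** 2
            for c, r2 in zip(distances, r2_values)
        )
    return min(candidates, key=sse)


if __name__ == "__main__":
    # Hypothetical distances (Morgans) with r2 generated from Ne = 100.
    distances = [0.001, 0.005, 0.01, 0.05]
    observed = [sved_expected_r2(c, 100) for c in distances]
    print(fit_n_e(distances, observed, range(10, 501, 10)))
```

The curve is forced through r² = 1 at distance zero and has a single shape parameter, which is one reason a flexible smoother such as loess can describe observed decay more reliably when the residual assumptions of the nonlinear fit are violated.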
Chapter 4 showed the utility of different reference sets (across- and multi-population) for the prediction of genomic breeding values, as well as the potential of using crossbred performance in genomic prediction. Neither the accuracies obtained using across-population or multi-population genomic prediction nor the accuracies obtained using crossbred data followed the expectations based on the LD that was described in chapter 2. I showed that across-population prediction accuracy was negligible even when the populations had common breeds in their genetic background. The variable accuracies of multi-population prediction and the moderate accuracy of prediction of crossbred performance appeared to be a result of the differences in genetic architecture between pure populations and between purebred and crossbred animals.
In chapter 5, a methodology that uses information from genome wide association analyses in the genomic predictions was developed and evaluated. The aim in chapter 5 was to let the genomic prediction model use information from the genetic architecture in single- and multi-population genomic prediction. I showed that using weights based on GWAS results from a combined population did result in higher accuracies of GBLUP in single- as well as in multi-population predictions.
In chapter 6 I placed my results in a broader context. I discussed the theoretical and practical aspects of linkage disequilibrium in breeding and in the estimation of effective population size. I also discussed the application of genomic selection in a small population and in practical pig breeding, including the prospects of using whole-genome sequence data for genomic prediction.
Association mapping in tetraploid potato
hoop, B.B. D' - \ 2009
Wageningen University. Promotor(en): Richard Visser; Fred van Eeuwijk, co-promotor(en): Herman van Eck. - [S.l.] : S.n. - ISBN 9789085853336 - 161
solanum tuberosum - genetische kartering - microsatellieten - loci voor kwantitatief kenmerk - verstoord koppelingsevenwicht - genetische analyse - genetische merkers - cultivars - marker assisted breeding - aflp - genetic mapping - microsatellites - quantitative trait loci - linkage disequilibrium - genetic analysis - genetic markers - amplified fragment length polymorphism
The results of a four-year project within the Centre for BioSystems Genomics (www.cbsg.nl), entitled “Association mapping and family genotyping in potato”, are described in this thesis. This project was intended to investigate whether a recently emerged methodology, association mapping, could provide the means to improve potato breeding efficiency.
In an attempt to answer this research question a set of potato cultivars representative of the commercial potato germplasm was selected. In total 240 cultivars and progenitor clones were chosen. At a later stage this set was expanded with 190 recent breeds contributed by five participating breeding companies, which resulted in a total of 430 genotypes.
In a pilot experiment, the results of which are reported in Chapter 2, a subset of 220 of the abovementioned 240 cultivars and progenitor clones was used. Phenotypic data were retrieved through contributions of the participating breeding companies and represented summary statistics of recent observations for a number of traits across years and locations, calculated following company-specific procedures. With AFLP marker data, in the form of normalised log-transformed band intensities, obtained from five well-known primer combinations, the extent of linkage disequilibrium (LD) was estimated using the r² statistic. Population structure within the set of 220 cultivars was analysed by deploying a clustering approach. No apparent or statistically supported population structure was revealed, and the LD seemed to decay below the threshold of 0.1 at a genetic distance of about 3 cM with this set of marker data. Furthermore, marker-trait associations were investigated by fitting single-marker regression models for phenotypic traits on marker band intensities with and without correction for population structure. Population structure correction was performed in a straightforward way by incorporating a design matrix into the model, assuming that each breeding company represented a different breeding germplasm pool. The potential of association mapping in tetraploid potato has been demonstrated in this pilot experiment, because existing phenotypic data, a modest number of AFLP markers, and a relatively straightforward statistical analysis allowed identification of interesting associations for a number of agro-morphological and quality traits.
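The single-marker regression with and without a company term can be sketched in a few lines. Including a breeding-company factor in the design matrix gives the same marker slope as centring both the trait and the marker intensity within each company before regressing, and that equivalent form is used here. This is an illustration under that simplification, not the model code of the thesis. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def center_within_groups(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def marker_slope(trait, intensity, groups=None):
    """Regression slope of trait on marker band intensity.

    With groups=None only the overall mean is removed (no structure correction).
    With groups given, a fixed effect per group is absorbed by within-group
    centring, which mimics adding a company design matrix to the model.
    """
    if groups is None:
        groups = ["all"] * len(trait)
    y = center_within_groups(trait, groups)
    x = center_within_groups(intensity, groups)
    sxx = sum(a * a for a in x)
    if sxx == 0:
        raise ValueError("marker intensity does not vary within groups")
    return sum(a * b for a, b in zip(x, y)) / sxx


if __name__ == "__main__":
    # Hypothetical data: two companies with different trait baselines.
    company = ["A", "A", "A", "B", "B", "B"]
    intensity = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    trait = [10.0, 12.0, 14.0, 0.0, 2.0, 4.0]
    print(marker_slope(trait, intensity))           # naive slope
    print(marker_slope(trait, intensity, company))  # company-corrected slope
```

In the toy data the naive slope is negative although the trait rises with intensity inside each company, which shows how differences between germplasm pools can distort an uncorrected marker-trait association.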
These promising results encouraged us to engage in a comprehensive genome-wide association mapping study in potato. Two association mapping panels were compiled: one panel comprising 205 genotypes, all of which were also present in the set used for the pilot experiment, and another panel containing in total 299 genotypes, including the entire set of 190 recent breeds together with a series of standard cultivars, about 100 of which are in common with the first panel. Phenotypic data for the association panel with 205 genotypes were obtained in a field trial performed in 2006 in Wageningen at two locations with two replicates. We will refer to this set as the “2006 field trial”. Phenotypic data for the other panel with 299 genotypes were contributed by the five participating breeding companies and consisted of multi-year, multi-location data obtained during generations of clonal selection. The 2006 data were well balanced, because the trial was designed in that way. The historical breeding dataset was highly unbalanced. Analysis of these two differing phenotypic datasets was performed to provide insight into variance components for the genotypic main effects and the genotype by environment interaction (GEI), besides estimated genotype main effects across environments. Both phenotypic datasets were analysed separately within a mixed model framework including terms for GEI. In Chapter 3 we describe both phenotypic datasets by comparing variance components, heritabilities (=repeatabilities), intra-dataset relationships and inter-dataset relationships. Broader aspects related to phenotypic datasets and their analysis are discussed as well.
To retrieve information about hidden population structure and genetic relatedness, and to estimate the extent of LD in potato germplasm, we used marker information generated with 41 AFLP primer combinations and 53 microsatellite loci on a collection of 430 genotypes. These 430 genotypes contain all genotypes present in the two association mapping panels introduced before, plus a few extra genotypes to increase potato germplasm coverage. Two methods were used: a Bayesian approach and a distance-based clustering approach. Chapter 4 describes the results of this exercise. Both strategies revealed a weak level of structure in our material. Groups were detected which complied with criteria such as their intended market segment, as well as groups differing in their year of first registration on a national list. Linkage disequilibrium, measured with the r² statistic, appeared to decay below the threshold of 0.1 across linkage groups at a genetic distance of about 5 cM on average. The results described in Chapter 4 are promising for association mapping research in potato. The odds are reasonable that useful marker-trait associations can be detected and that the potential mapping resolution will suffice for detection of QTL in an association mapping context.
In Chapter 5 a comprehensive genome-wide association mapping study is presented. The adjusted genotypic means obtained for the two association mapping panels from the phenotypic analysis performed in Chapter 3 were combined with marker information in two association mapping models. Marker information consisted of normalised log-transformed band intensities of 41 AFLP primer combinations and allele dosage information from 53 microsatellites. A baseline model without correction for population structure and a more advanced model with correction for population structure and genetic relatedness were applied. Population structure and genetic relatedness were estimated using available marker information. Interesting QTL could be identified for 19 agro-morphological and quality traits. The observed QTL partly confirm previous studies, e.g. for tuber shape and frying colour, but new QTL have also been detected, e.g. for after-baking darkening and enzymatic browning.
In the final chapter, the general discussion, results of preceding chapters are evaluated and their implications for research as well as breeding are discussed.
Mapping of yield, yield stability, yield adaptability and other traits in barley using linkage disequilibrium mapping and linkage analysis
Kraakman, A.T.W. - \ 2005
Wageningen University. Promotor(en): Richard Visser, co-promotor(en): Fred van Eeuwijk. - Wageningen : s.n. - ISBN 9085042054 - 136
hordeum vulgare - gerst - genkartering - verstoord koppelingsevenwicht - genetische merkers - kwantitatieve kenmerken - genotype-milieu interactie - gerstevergelingsvirus - roestziekten - ziekteresistentie - barley - gene mapping - linkage disequilibrium - genetic markers - quantitative traits - genotype environment interaction - barley yellow dwarf virus - rust diseases - disease resistance