Development and application of a 20K SNP array in potato
Vos, Peter - \ 2016
Wageningen University. Promotor(s): Richard Visser; Fred van Eeuwijk, co-promotor(s): Herman van Eck. - Wageningen : Wageningen University - ISBN 9789462579569 - 166
solanum tuberosum - potatoes - genotypes - single nucleotide polymorphism - data analysis - plant breeding - linkage disequilibrium - genome analysis - tetraploidy
This thesis describes investigations into various applications of genome-wide SNP (single nucleotide polymorphism) markers. The set of SNP markers was identified using a GBS (genotyping by sequencing) strategy. The resulting dataset of 129,156 SNPs across 83 tetraploid varieties was used directly to map traits, and also served as the basis for the development of a 20K SNP array in potato (Solanum tuberosum L.). Subsequently this array, named SolSTW, was used to collect genotypic data from 569 potato genotypes. This dataset offered insight into the breeding history of potato, population structure, linkage disequilibrium (LD) and the potential of GWAS (genome-wide association studies) in potato.
In Chapter 2 we describe the development of the SolSTW 20K Infinium SNP array. One third of the SNPs on this array originate from the well-known SolCAP 8303 SNP array. The other SNPs are a subset from a targeted re-sequencing project of 83 tetraploid potato varieties. Because of the high SNP density in potato, only a limited number of SNPs is suitable for assay development on a SNP array. An obvious outcome is that flanking SNPs contribute to assay failure, particularly for assays with SNPs located in introns. We used fitTetra software to cluster the distribution of captured signals of each marker into the expected five genotypic classes (nulliplex, simplex, duplex, triplex, quadruplex), resulting in a dataset with 14,530 SNP markers. Subsequently, the genotypic data obtained with the SolSTW array were used to characterize a set of 569 potato varieties, advanced breeding clones and progenitors. This resulted in the identification of several footprints of potato breeding. Firstly, SNPs were dated: the year of market release of the first variety showing polymorphism for a SNP locus is an indication of the ancestry of that SNP. In this way we identified SNPs with an ancestry tracing back to heirloom varieties, and SNPs (post-1945 SNPs) tracing back to wild species used in modern introgression breeding. Secondly, the changes in allele frequency were calculated over time. Most SNPs show a relatively stable allele frequency over time, and very little genetic variation has been removed from the gene pool of potato, i.e. genetic erosion is almost absent. Therefore we conclude that 100 years of breeding has not been able to get rid of non-beneficial genetic variation. Only a limited number of SNPs show a rapid increase in allele frequency, which can be explained by positive selection for disease resistance by breeders, or by the more frequent use of several founders.
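The SNP dating step can be illustrated with a minimal sketch. It assumes a dosage matrix with values 0 to 4 (copies of the variant allele in a tetraploid) and a vector of market release years; the function name `date_snps` and the rule "a variety shows the SNP if it carries at least one variant copy" are illustrative assumptions, not the procedure used in the thesis.

```python
import numpy as np

def date_snps(dosages, release_years):
    """Return, per SNP, the release year of the oldest variety carrying the variant allele.

    dosages: (n_varieties, n_snps) array of allele dosages 0..4 (assumed layout).
    release_years: (n_varieties,) array of market release years.
    SNPs that no variety carries get np.inf.
    """
    dosages = np.asarray(dosages)
    years = np.asarray(release_years, dtype=float)
    carries = dosages > 0  # assumption: one or more variant copies counts as "showing" the SNP
    # Put the release year where the allele is carried and +inf elsewhere, then take the column minimum.
    year_grid = np.where(carries, years[:, None], np.inf)
    return year_grid.min(axis=0)
```

A SNP dated after 1945 under this rule would correspond to the "post-1945 SNPs" category mentioned in the abstract.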
Better understanding of the genome-wide decay of linkage disequilibrium (LD) and of population structure offers relevant knowledge for performing a genome-wide association study (GWAS) and interpreting its results (Chapter 3). LD is a complex phenomenon, and the influence of the factors shaping LD in tetraploids has hardly been studied. Therefore we used simulated data to disentangle, and thereby understand, the often-confounded factors underlying LD-decay. We simulated datasets differing in the number of haplotypes in a population and in the percentage of haplotype-specific SNPs. In these simulations we observed that the choice of estimator has a major effect on the outcome of an LD-decay estimate, while the true LD-decay remains the same. Based on the simulations we conclude that a 90th percentile and a so-called D1/2 (the distance at which 50% of the initial LD has decayed) performed best to estimate and compare LD-decay in potato. To understand the various aspects of LD-decay in the variety panel of 537 varieties, the panel was subdivided into several groups based on the age of a variety and on the population structure groups. This revealed LD-decay over time, i.e. in relatively young varieties the average size of the LD-blocks is smaller. The differences between subpopulations were smaller and are most likely the effect of the population structure. We also observed that there are very long LD-blocks caused by introgression breeding, and that different a priori MAF thresholds can also influence the outcome of LD-decay estimation.
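The two LD-decay summaries named above can be sketched as follows. This is a minimal illustration under stated assumptions: pairwise r² values and inter-marker distances are already available, distances are grouped into user-chosen bins, and D1/2 is taken as the midpoint of the first bin whose 90th percentile has dropped to half of the first bin's value (no correction for background LD). The function names are hypothetical.

```python
import numpy as np

def ld_decay_percentile(dist, r2, bin_edges, q=90):
    """q-th percentile of r2 within each distance bin; NaN for empty bins."""
    dist = np.asarray(dist, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    idx = np.digitize(dist, bin_edges) - 1  # bin index per marker pair
    n_bins = len(bin_edges) - 1
    curve = np.full(n_bins, np.nan)
    for b in range(n_bins):
        vals = r2[idx == b]
        if vals.size:
            curve[b] = np.percentile(vals, q)
    return curve

def d_half(bin_edges, curve):
    """Midpoint of the first bin where the curve is at most half its initial value."""
    bin_edges = np.asarray(bin_edges, dtype=float)
    mids = (bin_edges[:-1] + bin_edges[1:]) / 2.0
    valid = ~np.isnan(curve)
    if not valid.any():
        return np.nan
    initial = curve[valid][0]  # LD level in the shortest-distance bin with data
    hits = np.flatnonzero(valid & (curve <= 0.5 * initial))
    return mids[hits[0]] if hits.size else np.nan
```

Using an upper percentile rather than the mean keeps the estimator sensitive to the strongly linked marker pairs, which is one plausible reason such an estimator compares well in simulations.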
With both LD-decay and population structure defined, a genome-wide association study (GWAS) was conducted (Chapter 4). For this purpose α-solanine and α-chaconine were measured in potato tubers. Subsequently the sum of both (total SGA) and the ratio between the two were used to discover QTLs for these traits in a GWAS. Additionally we used three bi-parental populations to validate the GWAS results. Total SGA content was confounded with population structure, and therefore it was difficult to explain all phenotypic variation with SNP markers. Two QTLs (Sgt1.1 and Sgt11.1) were identified, which could be validated in one of the segregating populations. The ratio between α-solanine and α-chaconine was not confounded with population structure and resulted in the identification of two major-effect QTLs (Sgr7.1 and Sgr8.1) located near the candidate genes SGT1 and SGT2, which are known to be responsible for the final steps towards either α-solanine or α-chaconine. The QTL Sgr8.1 could be validated; however, similar phenotypes were explained by different haplotypes in two populations. We show that population structure, low-frequency alleles and genetic heterogeneity may explain to some degree the missing heritability in GWAS in potato.
In Chapter 5 we describe how the method of graphical genotyping, which is widely used in diploid bi-parental populations, can be applied in a variety panel of tetraploid varieties. We show that a few discrete filtering steps in Excel can be used to display patterns that are visual representations of introgression segments and the locations of historical recombination events. Using this method we identified introgression segments from Solanum vernei, including the Gpa5 locus on chromosome 5, and a Solanum stoloniferum introgression segment including a gene involved in resistance to Potato Virus Y on chromosome 11. This method requires that the haplotypes causing the phenotypic effect are identical by descent (IBD).
In the final Chapter 6 the results of Chapters 2 to 5 are discussed. We look ahead to how our results can be used in future research and applied in marker-assisted breeding. Additionally, some new GWAS results are presented for tuber flesh colour, foliage maturity and resistance to Globodera pallida pathotype 3.
Linkage disequilibrium and genomic selection in pigs
Veroneze, R. - \ 2015
Wageningen University. Promotor(s): Johan van Arendonk; S.E.F. Guimarães, co-promotor(s): John Bastiaansen. - Wageningen : Wageningen University - ISBN 9789462574151 - 142
pigs - linkage disequilibrium - quantitative trait loci - genomics - populations - crossbreds - inbred lines - breeding value - selective breeding - genetics
Securing a sufficiently large set of genotypes and phenotypes can be a limiting factor when implementing genomic selection. This limitation may be overcome by combining data from multiple populations or by using information of crossbred animals. The research described in this thesis characterized linkage disequilibrium (LD) patterns in different pig populations and evaluated whether the consistency of LD between populations allows us to make predictions about the performance of genomic selection when multiple populations are included in the prediction and/or validation datasets.
In chapter 2 I evaluated the persistence of LD and patterns of LD decay in pure and crossbred pig populations, using real data that were representative of the crossbreeding structure of pig production. The persistence of phase between the crosses and their parental populations was high, indicating that similar marker effects might be expected across these populations. Across the purebred populations the persistence of phase was low; therefore, higher density panels should be used to have the same marker-QTL associations across these populations.
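Persistence of phase is commonly quantified as the correlation, over the same marker pairs, of the signed LD measure r in two populations. The sketch below follows that common definition as an assumption (the abstract does not spell out the formula) and approximates r by the correlation of genotype dosages rather than phased haplotypes; the function names are hypothetical.

```python
import numpy as np

def signed_r(geno_a, geno_b):
    """Signed LD between two markers, approximated by the genotype correlation."""
    return np.corrcoef(geno_a, geno_b)[0, 1]

def persistence_of_phase(pop1, pop2, pairs):
    """Correlation of signed r across marker pairs between two populations.

    pop1, pop2: (n_animals, n_markers) genotype arrays coded 0/1/2 (assumed layout),
    with the same markers in the same column order.
    pairs: iterable of (i, j) marker index pairs, both markers segregating in both populations.
    """
    pop1 = np.asarray(pop1, dtype=float)
    pop2 = np.asarray(pop2, dtype=float)
    r1 = np.array([signed_r(pop1[:, i], pop1[:, j]) for i, j in pairs])
    r2 = np.array([signed_r(pop2[:, i], pop2[:, j]) for i, j in pairs])
    return np.corrcoef(r1, r2)[0, 1]
```

A value near 1 means that marker pairs are in LD with the same sign and similar strength in both populations, which is the situation in which marker effects are expected to transfer.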
In chapter 3, the well-known nonlinear model developed by Sved (1971) was compared against an alternative, loess regression, to describe LD decay. The loess regression model was found to be less influenced by the lack of residual normality, independence and homogeneity of variance than the nonlinear regression model. The loess regression model resulted in more reliable LD predictions and can be used to formally compare the LD decay curves between populations.
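For orientation, the Sved (1971) model gives the expected LD as E[r²] = 1 / (1 + 4·Ne·c), with Ne the effective population size and c the genetic distance in Morgans. The sketch below evaluates that curve and fits Ne by a simple least-squares grid search; the grid-search fit is an illustrative stand-in, not the nonlinear regression procedure used in the thesis, and the function names are hypothetical.

```python
import numpy as np

def sved_expected_r2(c, ne):
    """Expected r2 at genetic distance c (Morgans) for effective size ne: 1 / (1 + 4*ne*c)."""
    c = np.asarray(c, dtype=float)
    return 1.0 / (1.0 + 4.0 * ne * c)

def fit_ne_grid(c, r2, ne_grid):
    """Return the ne in ne_grid that minimises the squared error against observed r2."""
    c = np.asarray(c, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    sse = [np.sum((r2 - sved_expected_r2(c, ne)) ** 2) for ne in ne_grid]
    return ne_grid[int(np.argmin(sse))]
```

Because the curve has a fixed functional form, its residuals on real data can be far from normal and homoscedastic, which is the weakness the loess alternative is reported to handle better.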
Chapter 4 showed the utility of different reference sets (across- and multi-population) for the prediction of genomic breeding values, as well as the potential of using crossbred performance in genomic prediction. None of the accuracies obtained using across-population, or multi-population genomic prediction, nor the accuracies obtained using crossbred data, followed the expectations based on LD that was described in chapter 2. I showed that across-population prediction accuracy was negligible even when the populations had common breeds in their genetic background. The variable accuracies of multi-population prediction and moderate accuracy of prediction of crossbred performance appeared to be a result of the differences in genetic architecture between pure populations and between purebred and crossbred animals.
In chapter 5, a methodology that uses information from genome wide association analyses in the genomic predictions was developed and evaluated. The aim in chapter 5 was to let the genomic prediction model use information from the genetic architecture in single- and multi-population genomic prediction. I showed that using weights based on GWAS results from a combined population did result in higher accuracies of GBLUP in single- as well as in multi-population predictions.
In chapter 6 I placed my results in a broader context. I discussed the theoretical and practical aspects of linkage disequilibrium in breeding and in the estimation of effective population size. I also discussed the application of genomic selection in a small population and in practical pig breeding, including the prospects of using whole genome sequence for genomic prediction.
QTL mapping of pomological traits in peach and related species breeding germplasm
Fresnedo-Ramírez, J. ; Bink, M.C.A.M. ; Weg, W.E. van de; Famula, T.R. ; Crisosto, C.H. ; Frett, T. ; Gasic, K. ; Peace, C.P. ; Gradziel, T.M. - \ 2015
Molecular Breeding 35 (2015). - ISSN 1380-3743 - 19 p.
persica l. batsch - prunus-persica - linkage disequilibrium - fruit size - population-structure - candidate genes - genome database - sweet cherry - almond - cultivars
Peach is an economically important fruit tree crop that exhibits high phenotypic variability yet suffers from a diversity-limited gene pool. Genetic introgression of novel alleles from related species is being pursued to expand genetic diversity. This process is, however, challenging and requires the incorporation of innovative genomic and statistical tools to facilitate efficient transfer of these exotic alleles across the multiple generations required for introgression. In this study, pedigree-based analysis (PBA) in a Bayesian QTL mapping framework was applied to a diverse peach pedigree introgressed with almond and other related Prunus species. The aim was to investigate the genetic control of eight commercially important fruit productivity and fruit quality traits over two subsequent years. Fifty-two QTLs with at least positive evidence, explaining up to 98 % of the phenotypic variance across all trait/year combinations, were mapped separately per trait and year. Several QTLs exhibited variable association with traits between years. By using the peach genome sequence as a reference, the intrachromosomal positions for several QTLs were shown to differ from those previously reported in peach. The inclusion of introgressed germplasm and the explicit declaration of the genetic structure of the pedigree as a covariate in PBA enhanced the mapping and interpretation of QTLs. This study serves as a model study for PBA in a diverse peach breeding program, and the results highlight the ability of this strategy to identify genomic resources for direct utilization in marker-assisted breeding.
Impact of QTL properties on the accuracy of multi-breed genomic prediction
Wientjes, Y.C.J. ; Calus, M.P.L. ; Goddard, M.E. ; Hayes, B.J. - \ 2015
Genetics, Selection, Evolution 47 (2015). - ISSN 0999-193X
dairy-cattle populations - residual feed-intake - complex traits - linkage disequilibrium - genotype imputation - data sets - selection - values - animals - reliability
Background - Although simulation studies show that combining multiple breeds in one reference population increases accuracy of genomic prediction, this is not always confirmed in empirical studies. This discrepancy might be due to the assumptions on quantitative trait loci (QTL) properties applied in simulation studies, including number of QTL, spectrum of QTL allele frequencies across breeds, and distribution of allele substitution effects. We investigated the effects of QTL properties and of including a random across- and within-breed animal effect in a genomic best linear unbiased prediction (GBLUP) model on accuracy of multi-breed genomic prediction using genotypes of Holstein-Friesian and Jersey cows. Methods - Genotypes of three classes of variants obtained from whole-genome sequence data, with moderately low, very low or extremely low average minor allele frequencies (MAF), were imputed in 3000 Holstein-Friesian and 3000 Jersey cows that had real high-density genotypes. Phenotypes of traits controlled by QTL with different properties were simulated by sampling 100 or 1000 QTL from one class of variants, with their allele substitution effects either sampled randomly from a gamma distribution or computed such that each QTL explained the same variance, i.e. rare alleles had a large effect. Genomic breeding values for 1000 selection candidates per breed were estimated using GBLUP models including a random across- and a within-breed animal effect. Results - For all three classes of QTL allele frequency spectra, accuracies of genomic prediction were not affected by the addition of 2000 individuals of the other breed to a reference population of the same breed as the selection candidates. Accuracies of both single- and multi-breed genomic prediction decreased as MAF of QTL decreased, especially when rare alleles had a large effect. 
Accuracies of genomic prediction were similar for the models with and without a random within-breed animal effect, probably because of insufficient power to separate across- and within-breed animal effects. Conclusions - Accuracy of both single- and multi-breed genomic prediction depends on the properties of the QTL that underlie the trait. As QTL MAF decreased, accuracy decreased, especially when rare alleles had a large effect. This demonstrates that QTL properties are key parameters that determine the accuracy of genomic prediction.
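The statement that equal variance per QTL implies large effects for rare alleles can be made concrete. Under the standard additive model, a biallelic QTL with allele frequency p and allele substitution effect a contributes variance 2p(1-p)a², so fixing the variance v gives a = sqrt(v / (2p(1-p))). The sketch below uses that textbook relation; the function names and the example values are illustrative and not taken from the study.

```python
import numpy as np

def equal_variance_effects(maf, var_per_qtl):
    """Allele substitution effects such that every QTL explains var_per_qtl."""
    maf = np.asarray(maf, dtype=float)
    return np.sqrt(var_per_qtl / (2.0 * maf * (1.0 - maf)))

def qtl_variance(maf, effect):
    """Additive variance contributed by a QTL: 2 p (1 - p) a^2."""
    maf = np.asarray(maf, dtype=float)
    effect = np.asarray(effect, dtype=float)
    return 2.0 * maf * (1.0 - maf) * effect ** 2
```

For example, at the same explained variance a QTL with MAF 0.01 needs an effect roughly five times larger than one with MAF 0.5.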
Tumour necrosis factor allele variants and their association with the occurrence and severity of malaria in African children: a longitudinal study
Gichohi-Wainaina, W.N. ; Boonstra, A. ; Feskens, E.J.M. ; Demir, A.Y. ; Veenemans, J. ; Verhoef, H. - \ 2015
Malaria Journal 14 (2015). - ISSN 1475-2875 - 11 p.
plasmodium-falciparum malaria - tnf-alpha promoter - cerebral malaria - linkage disequilibrium - rheumatoid-arthritis - diabetes-mellitus - polymorphisms - gene - disease - hla
Background Tumour necrosis factor (TNF) is central to the immune response to Plasmodium infection. Its plasma concentration is influenced by allele variants in the promoter region of TNF. The study’s objectives were to assess whether TNF allele variants (TNF-1031, TNF-308) modulate: (1) malaria rates in young Tanzanian children; (2) the severity of malaria as indicated by haemoglobin concentrations at the time of presentation with febrile episodes; and (3) the association between Plasmodium infection and haemoglobin concentration in symptomless parasite carriers. Methods Data were used from a placebo-controlled trial in 612 Tanzanian children aged 6–60 months with height-for-age z-score in the range -3 SD to 1.5 SD. Those with Plasmodium infection at baseline were treated with artemether-lumefantrine. An episode of malaria was predefined as current Plasmodium infection with an inflammatory response (axillary temperature ≥37.5°C or whole blood C-reactive protein concentration ≥8 mg/L) in children reported sick. Linkage disequilibrium (LD) pattern assessment as well as haplotype analysis was conducted using HAPLOVIEW. Cox regression models used in the primary analysis accounted for multiple episodes per child. Results Genotyping succeeded in 94.9% (581/612) of children for TNF-1031 (T>C); allele frequency was 0.39. Corresponding values for rs1800629 (TNF-308 G>A) were 95.4% (584/612) and 0.17. Compared to the wild type genotype (TT), malaria rates were increased in the TNF-1031 CC genotype (hazard ratio, HR [95% CI]: 1.41 [1.01–1.97] and 1.31 [0.97–1.76] for crude analysis and adjusting for pre-specified baseline factors, respectively) but decreased in those with the TNF-308 AA genotype (corresponding HR: 0.13 [0.02–0.63] and 0.16 [0.04–0.67]). These associations were weaker when analysing first episodes of malaria (P values 0.59 and 0.38, respectively). 
No evidence was observed that allele variants of TNF-1031 and TNF-308 affected haemoglobin concentration at first episode of malaria, or that they modified the association between Plasmodium infection and haemoglobin concentrations at baseline.
Empirical and deterministic accuracies of across-population genomic prediction
Wientjes, Y.C.J. ; Veerkamp, R.F. ; Bijma, P. ; Bovenhuis, H. ; Schrooten, C. ; Calus, M.P.L. - \ 2015
Genetics, Selection, Evolution 47 (2015). - ISSN 0999-193X
dairy-cattle breeds - linkage disequilibrium - relationship matrix - complex traits - multi-breed - selection - values - markers - heritability - models
Background: Differences in linkage disequilibrium and in allele substitution effects of QTL (quantitative trait loci) may hinder genomic prediction across populations. Our objective was to develop a deterministic formula to estimate the accuracy of across-population genomic prediction, for which reference individuals and selection candidates are from different populations, and to investigate the impact of differences in allele substitution effects across populations and of the number of QTL underlying a trait on the accuracy. Methods: A deterministic formula to estimate the accuracy of across-population genomic prediction was derived based on selection index theory. Moreover, accuracies were deterministically predicted using a formula based on population parameters and empirically calculated using simulated phenotypes and a GBLUP (genomic best linear unbiased prediction) model. Phenotypes of 1033 Holstein-Friesian, 105 Groninger White Headed and 147 Meuse-Rhine-Yssel cows were simulated by sampling 3000, 300, 30 or 3 QTL from the available high-density SNP (single nucleotide polymorphism) information of three chromosomes, assuming a correlation of 1.0, 0.8, 0.6, 0.4, or 0.2 between allele substitution effects across breeds. The simulated heritability was set to 0.95 to resemble the heritability of deregressed proofs of bulls. Results: Accuracies estimated with the deterministic formula based on selection index theory were similar to empirical accuracies for all scenarios, while accuracies predicted with the formula based on population parameters overestimated empirical accuracies by ~25 to 30%. When the between-breed genetic correlation differed from 1, i.e. allele substitution effects differed across breeds, empirical and deterministic accuracies decreased in proportion to the genetic correlation. Using a multi-trait model, it was possible to accurately estimate the genetic correlation between the breeds based on phenotypes and high-density genotypes. 
The number of QTL underlying the simulated trait did not affect the accuracy. Conclusions: The deterministic formula based on selection index theory estimated the accuracy of across-population genomic predictions well. The deterministic formula using population parameters overestimated the across-population genomic accuracy, but may still be useful because of its simplicity. Both formulas could accommodate genetic correlations between populations lower than 1. The number of QTL underlying a trait did not affect the accuracy of across-population genomic prediction using a GBLUP method.
Genome-wide association study for claw disorders and trimming status in dairy cattle
Spek, D. van der; Arendonk, J.A.M. van; Bovenhuis, H. - \ 2015
Journal of Dairy Science 98 (2015)2. - ISSN 0022-0302 - p. 1286 - 1295.
quantitative trait loci - conformation traits - genetic-parameters - holstein cattle - linkage disequilibrium - body conformation - leg conformation - complex diseases - foot disorders - rare variants
Performing a genome-wide association study (GWAS) might add to a better understanding of the development of claw disorders and the need for trimming. Therefore, the aim of the current study was to perform a GWAS on claw disorders and trimming status and to validate the results for claw disorders based on an independent data set. Data consisted of 20,474 cows with phenotypes for claw disorders and 50,238 cows with phenotypes for trimming status. Recorded claw disorders used in the current study were double sole (DS), interdigital hyperplasia (IH), sole hemorrhage (SH), sole ulcer (SU), white line separation (WLS), a combination of infectious claw disorders consisting of (inter-)digital dermatitis and heel erosion, and a combination of laminitis-related claw disorders (DS, SH, SU, and WLS). Of the cows with phenotypes for claw disorders, 1,771 cows were genotyped and these cow data were used for the GWAS on claw disorders. A SNP was considered significant when the false discovery rate ≤ 0.05 and suggestive when the false discovery rate ≤ 0.20. An independent data set of 185 genotyped bulls having at least 5 daughters with phenotypes (6,824 daughters in total) for claw disorders was used to validate significant and suggestive SNP detected based on the cow data. To analyze the trait “trimming status” (i.e., the need for claw trimming), a data set with 327 genotyped bulls having at least 5 daughters with phenotypes (18,525 daughters in total) was used. Based on the cow data, in total 10 significant and 45 suggestive SNP were detected for claw disorders. The 10 significant SNP were associated with SU, and mainly located on BTA8. The suggestive SNP were associated with DS, IH, SU, and laminitis-related claw disorders. Three of the suggestive SNP were validated in the data set of 185 bulls, and were located on BTA13, BTA14, and BTA17. For infectious claw disorders, SH, and WLS, no significant or suggestive SNP associations were detected. 
For trimming status, 1 significant and 1 suggestive SNP were detected, both located close to each other on BTA15. Some significant and suggestive SNP were located close to SNP detected in studies on feet and leg conformation traits. Genes with major effects could not be detected and SNP associations were spread across the genome, indicating that many SNP, each explaining a small proportion of the genetic variance, influence claw disorders. Therefore, to reduce the incidence of claw disorders by breeding, genomic selection is a promising approach.
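The two-tier call based on false discovery rate cut-offs of 0.05 and 0.20 can be sketched as below. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg adjustment here is an assumption chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (one common way to control the FDR)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q

def classify_snps(qvalues, significant=0.05, suggestive=0.20):
    """Label each SNP as 'significant', 'suggestive' or 'none' from its FDR value."""
    labels = []
    for q in qvalues:
        if q <= significant:
            labels.append("significant")
        elif q <= suggestive:
            labels.append("suggestive")
        else:
            labels.append("none")
    return labels
```

Suggestive SNP under such a scheme are candidates rather than findings, which is why validation in an independent data set, as done with the bull data, matters.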
Joint inference of identity by descent along multiple chromosomes from population samples
Zheng, Chaozhi ; Kuhner, Mary K. ; Thompson, Elizabeth A. - \ 2014
Journal of Computational Biology 21 (2014)3. - ISSN 1066-5277 - p. 185 - 200.
Bayesian inference framework - hidden Markov model - latent identity by descent - linkage disequilibrium - reversible jump Markov chain Monte Carlo - shared genome segments
There has been much interest in detecting genomic identity by descent (IBD) segments from modern dense genetic marker data and in using them to identify human disease susceptibility loci. Here we present a novel Bayesian framework using Markov chain Monte Carlo (MCMC) realizations to jointly infer IBD states among multiple individuals not known to be related, together with the allelic typing error rate and the IBD process parameters. The data are phased single nucleotide polymorphism (SNP) haplotypes. We model changes in latent IBD state along homologous chromosomes by a continuous time Markov model having the Ewens sampling formula as its stationary distribution. We show by simulation that this model for the IBD process fits quite well with the coalescent predictions. Using simulation data sets of 40 haplotypes over regions of 1 and 10 million base pairs (Mbp), we show that the jointly estimated IBD states are very close to the true values, although the presence of linkage disequilibrium decreases the accuracy. We also present comparisons with the ibd-haplo program, which estimates IBD among sets of four haplotypes. Our new IBD detection method focuses on the scale between genome-wide methods using simple IBD models and complex coalescent-based methods that are limited to short genome segments. At the scale of a few Mbp, our approach offers potentially more power for fine-scale IBD association mapping.
A genome-wide association study reveals a novel candidate gene for sperm motility in pigs
Diniz, D.B. ; Lopes, M.S. ; Broekhuijse, M.L.W.J. ; Lopes, P.S. ; Harlizius, B. ; Guimaraes, S.E.F. ; Duijvesteijn, N. ; Knol, E.F. ; Silva, F.F. - \ 2014
Animal Reproduction Science 151 (2014)3-4. - ISSN 0378-4320 - p. 201 - 207.
assisted semen analysis - linkage disequilibrium - reproductive traits - fertility - boar - quality - casa - expression - parameters - selection
Sperm motility is one of the most widely used parameters for evaluating boar semen quality. However, this trait can only be measured after puberty. Thus, the use of genomic information is an appealing alternative to evaluate and improve selection for boar fertility traits earlier in life. With this study we aimed to identify SNPs with significant association with sperm motility in two different commercial pig populations and to identify possible candidate genes within the identified QTL regions. We performed a single-SNP genome-wide association study using genotyped animals from a Landrace-based (L1) and a Large White-based (L2) pig population. For L1, a total of 602 animals genotyped for 42,551 SNPs were used in the association analysis. For L2, a total of 525 animals genotyped for 40,890 SNPs were available. After the association analysis, a false discovery rate q-value
Genome-wide association mapping for kernel and malting quality traits using historical European barley records
Matthies, I.E. ; Malosetti, M. ; Roder, M.S. ; Eeuwijk, F.A. van - \ 2014
PLoS ONE 9 (2014)11. - ISSN 1932-6203 - 15 p.
marker-assisted selection - grain protein-content - hordeum-vulgare l. - doubled-haploid population - different germplasm groups - backcross-qtl analysis - linkage disequilibrium - spring barley - yield components - 2-row barley
Malting quality is an important trait in breeding barley (Hordeum vulgare L.). It requires elaborate, expensive phenotyping, which involves micro-malting experiments. Although there is abundant historical information available for different cultivars in different years and trials, that historical information is not often used in genetic analyses. This study aimed to exploit historical records to assist in identifying genomic regions that affect malting and kernel quality traits in barley. This genome-wide association study utilized information on grain yield and 18 quality traits accumulated over 25 years on 174 European spring and winter barley cultivars combined with diversity array technology markers. Marker-trait associations were tested with a mixed linear model. This model took into account the genetic relatedness between cultivars based on principal components scores obtained from marker information. We detected 140 marker-trait associations. Some of these associations confirmed previously known quantitative trait loci for malting quality (on chromosomes 1H, 2H, and 5H). Other associations were reported for the first time in this study. The genetic correlations between traits are discussed in relation to the chromosomal regions associated with the different traits. This approach is expected to be particularly useful when designing strategies for multiple trait improvements.
Genomic prediction based on data from three layer lines: a comparison between linear methods
Calus, M.P.L. ; Huang, H. ; Vereijken, J. ; Visscher, J. ; Napel, J. ten; Windig, J.J. - \ 2014
Genetics, Selection, Evolution 46 (2014). - ISSN 0999-193X - 13 p.
principal component approach - support vector regression - dairy-cattle breeds - linkage disequilibrium - prior-knowledge - discriminant-analysis - values - selection - accuracy - traits
Background The prediction accuracy of several linear genomic prediction models, which have previously been used for within-line genomic prediction, was evaluated for multi-line genomic prediction. Methods Compared to a conventional BLUP (best linear unbiased prediction) model using pedigree data, we evaluated the following genomic prediction models: genome-enabled BLUP (GBLUP), ridge regression BLUP (RRBLUP), principal component analysis followed by ridge regression (RRPCA), BayesC and Bayesian stochastic search variable selection. Prediction accuracy was measured as the correlation between predicted breeding values and observed phenotypes divided by the square root of the heritability. The data used concerned laying hens with phenotypes for number of eggs in the first production period and known genotypes. The hens were from two closely-related brown layer lines (B1 and B2), and a third distantly-related white layer line (W1). Lines had 1004 to 1023 training animals and 238 to 240 validation animals. Training datasets consisted of animals of either single lines, or a combination of two or all three lines, and had 30 508 to 45 974 segregating single nucleotide polymorphisms. Results Genomic prediction models yielded 0.13 to 0.16 higher accuracies than pedigree-based BLUP. When excluding the line itself from the training dataset, genomic predictions were generally inaccurate. Use of multiple lines marginally improved prediction accuracy for B2 but did not affect or slightly decreased prediction accuracy for B1 and W1. Differences between models were generally small except for RRPCA which gave considerably higher accuracies for B2. Correlations between genomic predictions from different methods were higher than 0.96 for W1 and higher than 0.88 for B1 and B2. The greater differences between methods for B1 and B2 were probably due to the lower accuracy of predictions for B1 (~0.45) and B2 (~0.40) compared to W1 (~0.76). 
Conclusions Multi-line genomic prediction did not affect or slightly improved prediction accuracy for closely-related lines. For distantly-related lines, multi-line genomic prediction yielded similar or slightly lower accuracies than single-line genomic prediction. Bayesian variable selection and GBLUP generally gave similar accuracies. Overall, RRPCA yielded the greatest accuracies for two lines, suggesting that using PCA helps to alleviate the “n ≪ p” problem in genomic prediction.
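The accuracy measure defined in the Methods (correlation between predicted breeding values and observed phenotypes, divided by the square root of the heritability) can be written as a short sketch. The function name and the example numbers are illustrative only.

```python
import numpy as np

def prediction_accuracy(predicted_bv, phenotypes, heritability):
    """Correlation of predictions with phenotypes, scaled by sqrt(heritability)."""
    predicted_bv = np.asarray(predicted_bv, dtype=float)
    phenotypes = np.asarray(phenotypes, dtype=float)
    r = np.corrcoef(predicted_bv, phenotypes)[0, 1]
    return r / np.sqrt(heritability)
```

Dividing by the square root of the heritability rescales the correlation with noisy phenotypes towards the correlation with true breeding values, so validation results can be compared across traits with different heritabilities.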
Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy
Bouwman, A.C. ; Veerkamp, R.F. - \ 2014
BMC Genetics 15 (2014). - ISSN 1471-2156 - 9 p.
genotype imputation - cattle breeds - linkage disequilibrium - phase - populations - jersey - values
The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken breeders, who have to choose wisely how to spend their sequencing efforts over all the breeds or lines they evaluate. Sequence data from cattle breeds was used, because there are currently relatively many individuals from several breeds sequenced within the 1,000 Bull Genomes project. The advantage of whole-genome sequence data is that it carries the causal mutations, but the question is whether it is possible to impute the causal variants accurately. This study therefore focussed on imputation accuracy of variants with low minor allele frequency and breed specific variants. Results Imputation accuracy was assessed for chromosomes 1 and 29 as the correlation between observed and imputed genotypes. For chromosome 1, the average imputation accuracy was 0.70 with a reference population of 20 Holstein, and increased to 0.83 when the reference population was increased by including 3 other dairy breeds with 20 animals each. When the same number of animals from the Holstein breed were added the accuracy improved to 0.88, while adding the 3 other breeds to the reference population of 80 Holstein improved the average imputation accuracy marginally to 0.89. For chromosome 29, the average imputation accuracy was lower. Some variants benefitted from the inclusion of other breeds in the reference population, initially determined by the MAF of the variant in each breed, but even Holstein specific variants did gain imputation accuracy from the multi-breed reference population.
Conclusions This study shows that splitting sequencing effort over multiple breeds and combining the reference populations is a good strategy for imputation from high-density SNP panels towards whole-genome sequence when reference populations are small and sequencing effort is limiting. When sequencing effort is limiting and interest lies in multiple breeds or lines, this strategy provides imputation for each breed.
Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle
Binsbergen, R. van; Bink, M.C.A.M. ; Calus, M.P.L. ; Eeuwijk, F.A. van; Hayes, B.J. ; Hulsegge, B. ; Veerkamp, R.F. - \ 2014
Genetics, Selection, Evolution 46 (2014). - ISSN 0999-193X - 25 p.
haplotype-phase inference - genotype imputation - linkage disequilibrium - wide association - breeding programs - genetic-variation - complex traits - population - prediction - design
Background The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive and less expensive approach to obtain whole-genome sequence genotypes for a large number of individuals than sequencing all individuals. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle. Methods Whole-genome sequence data of chromosome 1 (1 737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3 132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios S40, S60 and S80 with respectively 40%, 60%, and 80% of the individuals as reference individuals were investigated. Results Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs. Conclusions Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more.
Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability.
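The per-SNP accuracy used in the abstract above (the correlation between observed and imputed genotypes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: genotypes are assumed to be coded as allele dosages 0, 1 or 2, the function names `snp_accuracy` and `mean_accuracy` are hypothetical, and SNPs without variation in either vector are treated as undefined and left out of the mean.

```python
import math


def snp_accuracy(observed, imputed):
    """Pearson correlation between observed and imputed dosages for one SNP.

    Returns None when either vector has no variation, because the
    correlation is undefined in that case.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    soo = sum((o - mean_o) ** 2 for o in observed)
    sii = sum((i - mean_i) ** 2 for i in imputed)
    if soo == 0 or sii == 0:
        return None
    soi = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    return soi / math.sqrt(soo * sii)


def mean_accuracy(observed_by_snp, imputed_by_snp):
    """Average the per-SNP accuracies, skipping SNPs where it is undefined."""
    values = [snp_accuracy(o, i) for o, i in zip(observed_by_snp, imputed_by_snp)]
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else None


# Hypothetical validation set: two SNPs scored on four animals.
observed = [[0, 1, 2, 1], [0, 0, 0, 0]]
imputed = [[0, 1, 2, 1], [0, 1, 0, 0]]
overall = mean_accuracy(observed, imputed)
```

In a cross-validation setting such as the five-fold scheme mentioned above, these functions would be applied to the validation animals of each fold and the results averaged over folds.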
Interspecific hybridisation and interaction with cultivars affect the genetic variation of Ulmus minor and U. glabra in Flanders
Cox, K. ; Broek, A. vanden; Mijnsbrugge, K. vander; Buiteveld, J. ; Collin, R.W.J. ; Heybroek, H. ; Mergeay, J. - \ 2014
Tree Genetics and Genomes 10 (2014)4. - ISSN 1614-2942 - p. 813 - 826.
populus x canadensis - linkage disequilibrium - population-genetics - natural hybridization - molecular markers - clonal organisms - elm ulmus - introgression - diversity - plants
Interspecific hybridisation and gene flow from cultivated plants may have profound effects on the evolution of wild species. Considering the cultural history and past use of Ulmus minor and Ulmus glabra trees in Flanders (northern Belgium), we investigated the extent of human impact on the genetic variation of the remaining, supposedly indigenous elm populations. We therefore examined the rate of interspecific hybridisation, which is expected to be higher under human influence, the occurrence of clones within and among locations, the presence of cultivars and their possible offspring. Based on results produced using 385 amplified fragment length polymorphic (AFLP) markers, 46 % of the 106 investigated Flemish elms appeared to be F1 hybrids or backcrosses to one of the parent species, while no F2 hybrids (F1×F1 progeny) were found. Clonality was mainly found among U. minor and hybrids, which are more likely to form root suckers or sprouts as opposed to U. glabra. The majority of the studied locations (76 % of the locations with multiple samples) showed evidence of clonal reproduction. Several, sometimes distant, locations shared a multilocus lineage. We also found indications of gene flow from cultivated elms into native species. It is conceivable that reproductive material has been moved around extensively, obscuring the natural genetic structure of the elm populations. The results help guide the Flemish elm genetic resources conservation programme. Keywords: Ulmus minor · Ulmus glabra · Hybridisation · Elm cultivars · Clonal reproduction · Human-mediated disturbance
Identification of agronomically important QTL in tetraploid potato cultivars using a marker-trait association analysis
hoop, B.B. D'; Keizer, L.C.P. ; Paulo, M.J. ; Visser, R.G.F. ; Eeuwijk, F.A. van; Eck, H.J. van - \ 2014
Theoretical and Applied Genetics 127 (2014)3. - ISSN 0040-5752 - p. 731 - 748.
tuberosum subsp tuberosum - late blight resistance - foliage maturity type - beta-vulgaris l. - linkage disequilibrium - solanum-tuberosum - diploid potato - population-structure - quality traits - candidate genes
Two association mapping panels were analysed for marker–trait associations to identify quantitative trait loci (QTL). The first panel comprised 205 historical and contemporary tetraploid potato cultivars that were phenotyped in field trials at two locations with two replicates (the academic panel). The second panel consisted of 299 potato cultivars and included recent breeds obtained from five Dutch potato breeding companies and reference cultivars (the industrial panel). Phenotypic data for the second panel were collected during subsequent clonal selection generations at the individual breeding companies. QTL were identified for 19 agro-morphological and quality traits. Two association mapping models were used: a baseline model without, and a more advanced model with correction for population structure and genetic relatedness. Correction for population structure and genetic relatedness was performed with a kinship matrix estimated from marker information. The detected QTL partly confirmed previous studies, e.g. for tuber shape and frying colour, and new QTL were also found, e.g. for after-baking darkening and enzymatic browning. Pleiotropic effects could be discerned for several QTL.
Conservation genomic analysis of domestic and wild pig populations from the Iberian Peninsula
Herrero-Medrano, J. ; Megens, H.J.W.C. ; Groenen, M. ; Ramis, G. ; Bosse, M. ; Crooijmans, R.P.M.A. - \ 2013
BMC Genetics 14 (2013). - ISSN 1471-2156
linkage disequilibrium - genetic diversity - wide association - genotype data - ancient dna - breeds - size - cattle - microsatellite - homozygosity
Background Inbreeding is among the major concerns in management of local livestock populations. The effective population size of these populations tends to be small, which enhances the risk of fitness reduction and extinction. High-density SNP data make it possible to undertake novel approaches in conservation genetics of endangered breeds and wild populations. A total of 97 representative samples of domestic and wild pig populations from the Iberian Peninsula, subjected to different levels of threat of extinction, were genotyped with a 60 K SNP panel. Data analyses based on: (i) allele frequency differences; (ii) linkage disequilibrium and (iii) runs of homozygosity were integrated to study population relationships, inbreeding and demographic history. Results The domestic pigs analyzed belonged to local Spanish and Portuguese breeds: Iberian - including the variants Retinto Iberian, Negro Iberian and Manchado de Jabugo -, Bisaro and Chato Murciano. The population structure and persistence of phase analysis suggested high genetic relations between Iberian variants, with recent crossbreeding of Manchado de Jabugo with other pig populations. Chato Murciano showed a high frequency of long runs of homozygosity indicating recent inbreeding and reflecting the recent bottleneck reported by historical records. The Chato Murciano and the Manchado de Jabugo breeds presented the lowest effective population sizes in accordance with their status of highly inbred breeds. The Iberian wild boar presented a high frequency of short runs of homozygosity indicating past small population size but no signs of recent inbreeding. The Iberian breed showed higher genetic similarities with Iberian wild boar than the other domestic breeds. Conclusions High-density SNP data provided a consistent overview of population structure, demographic history and inbreeding of minority breeds and wild pig populations from the Iberian Peninsula.
Despite the very different backgrounds of the populations used, we found good agreement between the different analyses. Our results also agree with historical reports and provide insight into the events that shaped the current genetic variation of pig populations from the Iberian Peninsula. The results presented will help to design and implement strategies for the future management of endangered minority pig breeds and wild populations.
Novel genomic approaches unravel genetic architecture of complex traits in apple.
Kumar, S. ; Garrick, D.J. ; Bink, M.C.A.M. ; Whitworth, C. ; Chagné, D. - \ 2013
BMC Genomics 14 (2013). - ISSN 1471-2164
x domestica borkh. - wide association - mixed-model - linkage disequilibrium - fruit - selection - predictions - accuracy - pedigree - mdmyb10
BACKGROUND: Understanding the genetic architecture of quantitative traits is important for developing genome-based crop improvement methods. Genome-wide association study (GWAS) is a powerful technique for mining novel functional variants. Using a family-based design involving 1,200 apple (Malus × domestica Borkh.) seedlings genotyped for an 8K SNP array, we report the first systematic evaluation of the relative contributions of different genomic regions to various traits related to eating quality and susceptibility to some physiological disorders. Single-SNP analyses models that accounted for population structure, or not, were compared with models fitting all markers simultaneously. The patterns of linkage disequilibrium (LD) were also investigated. RESULTS: A high degree of LD even at longer distances between markers was observed, and the patterns of LD decay were similar across successive generations. Genomic regions were identified, some of which coincided with known candidate genes, with significant effects on various traits. Phenotypic variation explained by the loci identified through a whole-genome scan ranged from 3% to 25% across different traits, while fitting all markers simultaneously generally provided heritability estimates close to those from pedigree-based analysis. Results from 'Q+K' and 'K' models were very similar, suggesting that the SNP-based kinship matrix captures most of the underlying population structure. Correlations between allele substitution effects obtained from single-marker and all-marker analyses were about 0.90 for all traits. Use of SNP-derived realized relationships in linear mixed models provided a better goodness-of-fit than pedigree-based expected relationships. Genomic regions with probable pleiotropic effects were supported by the corresponding higher linkage group (LG) level estimated genetic correlations. 
CONCLUSIONS: The accuracy of artificial selection in plant species can be increased by using more precise marker-derived estimates of realized coefficients of relationships. All-marker analyses that indirectly account for population and pedigree structure will be a credible alternative to single-SNP analyses in GWAS. This study revealed large differences in the genetic architecture of apple fruit traits, and the marker-trait associations identified here will help develop genome-based breeding methods for apple cultivar development.
Genome-wide association mapping of frost tolerance in barley (Hordeum vulgare L.)
Visioni, A. ; Tondelli, A. ; Francia, E. ; Pswarayi, A. ; Malosetti, M. ; Russell, J. ; Thomas, W. ; Waugh, R. ; Pecchioni, N. ; Romagosa, I. ; Comadran, J. - \ 2013
BMC Genomics 14 (2013). - ISSN 1471-2164 - 13 p.
low-temperature tolerance - linkage disequilibrium - freezing tolerance - gene family - locus - vernalization - population - winterhardiness - components - adaptation
Background: Frost tolerance is a key trait with economic and agronomic importance in barley because it is a major component of winter hardiness, and therefore limits the geographical distribution of the crop and the effective transfer of quality traits between spring and winter crop types. Three main frost tolerance QTL (Fr-H1, Fr-H2 and Fr-H3) have been identified from bi-parental genetic mapping but it can be argued that those mapping populations only capture a portion of the genetic diversity of the species. A genetically broad dataset consisting of 184 genotypes, representative of the barley gene pool cultivated in the Mediterranean basin over an extended time period, was genotyped with 1536 SNP markers. Frost tolerance phenotype scores were collected from two trial sites, Foradada (Spain) and Fiorenzuola (Italy) and combined with the genotypic data in genome wide association analyses (GWAS) using Eigenstrat and kinship approaches to account for population structure. Results: GWAS analyses identified twelve and seven positive SNP associations at Foradada and Fiorenzuola, respectively, using Eigenstrat and six and four, respectively, using kinship. Linkage disequilibrium analyses of the significant SNP associations showed they are genetically independent. In the kinship analysis, two of the significant SNP associations were tightly linked to the Fr-H2 and HvBmy loci on chromosomes 5H and 4HL, respectively. The other significant kinship associations were located in genomic regions that have not previously been associated with cold stress. Conclusions: Haplotype analysis revealed that most of the significant SNP loci are fixed in the winter or facultative types, while they are freely segregating within the un-adapted spring barley genepool.
Although there is a major interest in detecting new variation to improve frost tolerance of available winter and facultative types, from a GWAS perspective, working within the un-adapted spring germplasm pool is an attractive alternative strategy which would minimize statistical issues, simplify the interpretation of the data and identify phenology-independent genetic determinants of frost tolerance.
Association mapping of salt tolerance in barley (Hordeum vulgare L.)
Nguyen Viet Long, L. ; Dolstra, O. ; Malosetti, M. ; Kilian, B. ; Graner, A. ; Visser, R.G.F. ; Linden, C.G. van der - \ 2013
Theoretical and Applied Genetics 126 (2013)9. - ISSN 0040-5752 - p. 2335 - 2351.
genome-wide association - abiotic stress tolerance - quantitative trait loci - linkage disequilibrium - population-structure - salinity stress - expression analysis - ion homeostasis - wild barley - molecular markers
A spring barley collection of 192 genotypes from a wide geographical range was used to identify quantitative trait loci (QTLs) for salt tolerance traits by means of an association mapping approach using a thousand SNP marker set. Linkage disequilibrium (LD) decay was found with marker distances spanning 2–8 cM depending on the methods used to account for population structure and genetic relatedness between genotypes. The association panel showed large variation for traits that were highly heritable under salt stress, including biomass production, chlorophyll content, plant height, tiller number, leaf senescence and shoot Na+, shoot Cl- and shoot, root Na+/K+ contents. The significant correlations between these traits and salt tolerance (defined as the biomass produced under salt stress relative to the biomass produced under control conditions) indicate that these traits contribute to (components of) salt tolerance. Association mapping was performed using several methods to account for population structure and minimize false-positive associations. This resulted in the identification of a number of genomic regions that strongly influenced salt tolerance and ion homeostasis, with a major QTL controlling salt tolerance on chromosome 6H, and a strong QTL for ion contents on chromosome 4H.
Community genetics in the time of next-generation molecular technologies
Gugerli, F. ; Brandl, R. ; Castagneyrol, B. ; Franc, A. ; Jactel, H. ; Koelewijn, H.P. ; Martin, F. ; Peter, M. ; Pritsch, K. ; Schröder, H. ; Smulders, M.J.M. ; Kremer, A. ; Ziegenhagen, B. - \ 2013
Molecular Ecology 22 (2013)12. - ISSN 0962-1083 - p. 3198 - 3207.
linkage disequilibrium - nucleotide diversity - arthropod community - herbivore community - demographic history - ecosystem genetics - emerging synthesis - fungal diversity - laccaria-bicolor - plant genotype
Understanding the interactions of co-occurring species within and across trophic levels provides key information needed for understanding the ecological and evolutionary processes that underlie biological diversity. As genetics has only recently been integrated into the study of community-level interactions, the time is right for a critical evaluation of potential new, gene-based approaches to studying communities. Next-generation molecular techniques, used in parallel with field-based observations and manipulative experiments across spatio-temporal gradients, are key to expanding our understanding of community-level processes. Here, we introduce a variety of ‘-omics’ tools, with recent studies of plant–insect herbivores and of ectomycorrhizal systems providing detailed examples of how next-generation approaches can revolutionize our understanding of interspecific interactions. We suggest ways that novel technologies may convert community genetics from a field that relies on correlative inference to one that reveals causal mechanisms of genetic co-variation and adaptations within communities.