Using selection index theory to estimate consistency of multi-locus linkage disequilibrium across populations
Wientjes, Y.C.J. ; Veerkamp, R.F. ; Calus, M.P.L. - \ 2015
BMC Genetics 16 (2015). - ISSN 1471-2156
genomic breeding values - genetic-relationship information - quantitative trait loci - dairy-cattle breeds - prediction - accuracy - haplotype - markers - impact - lines
The potential of combining multiple populations in genomic prediction depends on the consistency of linkage disequilibrium (LD) between SNPs and QTL across populations. We investigated consistency of multi-locus LD across populations using selection index theory and investigated the relationship between consistency of multi-locus LD and accuracy of genomic prediction across different simulated scenarios. In the selection index, QTL genotypes were considered as breeding goal traits and SNP genotypes as index traits, based on LD among SNPs and between SNPs and QTL. The consistency of multi-locus LD across populations was computed as the accuracy of predicting QTL genotypes in selection candidates using a selection index derived in the reference population. Different scenarios of within and across population genomic prediction were evaluated, using all SNPs or only the four neighboring SNPs of a simulated QTL. Phenotypes were simulated using different numbers of QTL underlying the trait. The relationship between the calculated consistency of multi-locus LD and accuracy of genomic prediction using a GBLUP type of model was investigated.
The accuracy of predicting QTL genotypes, i.e. the measure describing consistency of multi-locus LD, was much lower for across population scenarios compared to within population scenarios, and was lower when QTL had a low MAF compared to QTL randomly selected from the SNPs. Consistency of multi-locus LD was highly correlated with the realized accuracy of genomic prediction across different scenarios and the correlation was higher when QTL were weighted according to their effects in the selection index instead of weighting QTL equally. By only considering neighboring SNPs of QTL, accuracy of predicting QTL genotypes within population decreased, but it substantially increased the accuracy across populations.
Consistency of multi-locus LD across populations is a characteristic of the properties of the QTL in the investigated populations and can provide more insight into the underlying reasons for a low empirical accuracy of across population genomic prediction. When genomic prediction models focus only on the neighboring SNPs of QTL, multi-locus LD is more consistent across populations because only short-range LD is considered, and the accuracy of predicting QTL genotypes of individuals from another population increases.
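The selection-index calculation described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the two-SNP setup, all (co)variance values, and the helper names `solve` and `index_accuracy` are hypothetical. Index weights b are obtained from P b = g in the reference population, and the accuracy is the correlation between the index b'x and the QTL genotype, evaluated with the (co)variances of the population in which the index is applied.

```python
import math

def solve(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def index_accuracy(b, p, g, var_qtl):
    """Correlation between the index b'x and the QTL genotype in a population
    with SNP (co)variances p, SNP-QTL covariances g and QTL variance var_qtl."""
    n = len(b)
    cov_index_qtl = sum(b[i] * g[i] for i in range(n))
    var_index = sum(b[i] * p[i][j] * b[j] for i in range(n) for j in range(n))
    return cov_index_qtl / math.sqrt(var_index * var_qtl)

# Hypothetical (co)variances for two SNPs and one QTL (not from the paper).
p_ref = [[0.50, 0.20], [0.20, 0.40]]    # among SNPs, reference population
g_ref = [0.30, 0.10]                    # SNPs with QTL, reference population
var_qtl_ref = 0.45
p_cand = [[0.45, 0.05], [0.05, 0.42]]   # among SNPs, selection candidates
g_cand = [0.12, 0.08]                   # SNPs with QTL, selection candidates
var_qtl_cand = 0.40

b = solve(p_ref, g_ref)                 # index weights derived in the reference
acc_within = index_accuracy(b, p_ref, g_ref, var_qtl_ref)
acc_across = index_accuracy(b, p_cand, g_cand, var_qtl_cand)
print(f"within: {acc_within:.3f}, across: {acc_across:.3f}")
```

With these made-up numbers the LD pattern differs between the two populations, so the index derived in the reference predicts the QTL genotype less accurately in the candidates.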
Whole-genome regression and prediction methods applied to plant and animal breeding
Los Campos, G. De; Hickey, J.M. ; Pong-Wong, R. ; Daetwyler, H.D. ; Calus, M.P.L. - \ 2013
Genetics 193 (2013)2. - ISSN 0016-6731 - p. 327 - 345.
marker-assisted selection - quantitative trait locus - genetic-relationship information - single nucleotide polymorphisms - linear unbiased prediction - dense molecular markers - dairy-cattle - variable selection - reference population - beef-cattle
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding, and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Following the groundbreaking contribution of Meuwissen et al. (2001), several methods have been proposed and evaluated, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of methods is long, and the relationships between the available methods have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in the application of these methods, and present a general discussion of lessons learnt from simulation and empirical data analysis in the last decade.
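As a rough illustration of regressing phenotypes on all markers at once, the sketch below fits ridge regression, chosen here only as one simple penalized WGR model. The toy genotypes, phenotypes, penalty value, and the function names `solve` and `ridge_marker_effects` are hypothetical, and the article's own model list is not reproduced.

```python
def solve(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ridge_marker_effects(x, y, lam):
    """Estimate marker effects from (X'X + lam * I) b = X'y,
    after centering genotypes and phenotypes."""
    n, p = len(x), len(x[0])
    col_means = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in x]
    yc = [v - y_mean for v in y]
    lhs = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return solve(lhs, rhs)

# Hypothetical toy data: 3 individuals, 4 markers (more markers than records).
genotypes = [[0.0, 1.0, 2.0, 1.0],
             [1.0, 1.0, 0.0, 2.0],
             [2.0, 0.0, 1.0, 0.0]]
phenotypes = [1.0, 2.5, 0.5]
effects = ridge_marker_effects(genotypes, phenotypes, lam=1.0)
print(effects)
```

The penalty `lam` is what keeps the system solvable when markers outnumber records; larger values shrink all estimated effects more strongly toward zero.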
Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking
Daetwyler, H.D. ; Calus, M.P.L. ; Pong-Wong, R. ; Los Campos, G. De; Hickey, J.M. - \ 2013
Genetics 193 (2013)2. - ISSN 0016-6731 - p. 347 - 365.
genetic-relationship information - breeding value prediction - snp genotyping assay - dairy-cattle - linkage disequilibrium - quantitative traits - population-genetics - wide association - coalescent simulation - ancestral processes
The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
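The two performance indicators named in this abstract, accuracy and bias, are often computed as in the sketch below. It assumes a common convention: accuracy is the correlation between predicted and true (or observed) values in the validation set, and bias is judged from the slope of the regression of true on predicted values, where a slope of 1 indicates no inflation or deflation. The function name `accuracy_and_bias` and the validation values are hypothetical.

```python
import math

def accuracy_and_bias(true_values, predicted_values):
    """Return (correlation, slope of true-on-predicted regression)."""
    n = len(true_values)
    mean_t = sum(true_values) / n
    mean_p = sum(predicted_values) / n
    s_tp = sum((t - mean_t) * (p - mean_p)
               for t, p in zip(true_values, predicted_values))
    s_tt = sum((t - mean_t) ** 2 for t in true_values)
    s_pp = sum((p - mean_p) ** 2 for p in predicted_values)
    accuracy = s_tp / math.sqrt(s_tt * s_pp)
    slope = s_tp / s_pp  # regression of true on predicted values
    return accuracy, slope

# Hypothetical validation set: true and predicted breeding values.
true_bv = [1.2, 0.4, -0.3, 2.1, 0.9, -1.0]
pred_bv = [0.8, 0.1, 0.2, 1.1, 0.7, -0.6]
acc, slope = accuracy_and_bias(true_bv, pred_bv)
print(f"accuracy: {acc:.2f}, regression slope: {slope:.2f}")
```

In a cross-validation setting the same function would be applied to each validation fold and the results averaged or reported per fold.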
The effect of linkage disequilibrium and family relationships on the reliability of genomic prediction
Wientjes, Y.C.J. ; Veerkamp, R.F. ; Calus, M.P.L. - \ 2013
Genetics 193 (2013)2. - ISSN 0016-6731 - p. 621 - 631.
genetic-relationship information - chromosome substitution strains - breeding values - dairy-cows - pedigree information - relationship matrix - angus cattle - selection - accuracy - complex
Although the concept of genomic selection relies on linkage disequilibrium (LD) between quantitative trait loci and markers, reliability of genomic predictions is strongly influenced by family relationships. In this study, we investigated the effects of LD and family relationships on reliability of genomic predictions and the potential of deterministic formulas to predict reliability using population parameters in populations with complex family structures. Five groups of selection candidates were simulated by taking different information sources from the reference population into account: (1) allele frequencies, (2) LD pattern, (3) haplotypes, (4) haploid chromosomes, and (5) individuals from the reference population, thereby having real family relationships with reference individuals. Reliabilities were predicted using genomic relationships among 529 reference individuals and their relationships with selection candidates and with a deterministic formula where the number of effective chromosome segments (Me) was estimated based on genomic and additive relationship matrices for each scenario. At a heritability of 0.6, reliabilities based on genomic relationships were 0.002 ± 0.0001 (allele frequencies), 0.022 ± 0.001 (LD pattern), 0.018 ± 0.001 (haplotypes), 0.100 ± 0.008 (haploid chromosomes), and 0.318 ± 0.077 (family relationships). At a heritability of 0.1, relative differences among groups were similar. For all scenarios, reliabilities were similar to predictions with a deterministic formula using estimated Me. Thus, reliabilities can be predicted accurately using empirically estimated Me, and the level of relationship with reference individuals has a much larger effect on the reliability than linkage disequilibrium per se. Furthermore, accumulated length of shared haplotypes is more important in determining the reliability of genomic prediction than the individual shared haplotype length.
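The abstract does not spell out the deterministic formula. The sketch below uses one commonly used form, r² = N h² / (N h² + Me), with Me approximated as the reciprocal of the variance of the differences between genomic and pedigree relationships. Both choices are assumptions made for illustration. The function names, the relationship values, and the Me value of 1000 are hypothetical; only the reference size (529) and the heritabilities (0.6 and 0.1) come from the abstract.

```python
import statistics

def effective_segments(genomic_rel, pedigree_rel):
    """Approximate Me as 1 / var(G_ij - A_ij) over pairs of individuals."""
    diffs = [g - a for g, a in zip(genomic_rel, pedigree_rel)]
    return 1.0 / statistics.pvariance(diffs)

def predicted_reliability(n_reference, heritability, me):
    """Deterministic reliability r2 = N h2 / (N h2 + Me)."""
    nh2 = n_reference * heritability
    return nh2 / (nh2 + me)

# Hypothetical genomic and pedigree relationships for a few pairs of animals.
g_pairs = [0.10, -0.05, 0.02, 0.30, 0.01]
a_pairs = [0.125, 0.0, 0.0, 0.25, 0.0]
print(f"Me from toy relationships: {effective_segments(g_pairs, a_pairs):.0f}")

# Reference size and heritabilities as in the abstract; Me = 1000 is made up.
for h2 in (0.6, 0.1):
    print(h2, round(predicted_reliability(529, h2, 1000.0), 3))
```

The formula makes the abstract's point concrete: candidates that are closely related to the reference population show more variation in relationships, which implies a smaller Me and therefore a higher predicted reliability.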
Predicted accuracy of and response to genomic selection for new traits in dairy cattle
Calus, M.P.L. ; Haas, Y. de; Pszczola, M.J. ; Veerkamp, R.F. - \ 2013
Animal 7 (2013)2. - ISSN 1751-7311 - p. 183 - 191.
genetic-relationship information - breeding programs - holstein cattle - energy-balance - strategies - emissions - progress - schemes - designs - impact
Genomic selection relaxes the requirement of traditional selection tools to have phenotypic measurements on close relatives of all selection candidates. This opens up possibilities to select for traits that are difficult or expensive to measure. The objectives of this paper were to predict accuracy of and response to genomic selection for a new trait, considering that only a cow reference population of moderate size was available for the new trait, and that selection simultaneously targeted an index and this new trait. Accuracy for and response to selection were deterministically evaluated for three different breeding goals. Single trait selection for the new trait based only on a limited cow reference population of up to 10 000 cows showed that maximum genetic responses of 0.20 and 0.28 genetic standard deviation (s.d.) per year can be achieved for traits with a heritability of 0.05 and 0.30, respectively. Adding information from the index based on a reference population of 5000 bulls, and assuming a genetic correlation of 0.5, increased genetic response for both heritability levels by up to 0.14 genetic s.d. per year. The scenario with simultaneous selection for the new trait and the index yielded a substantially lower response for the new trait, especially when the genetic correlation with the index was negative. Despite the lower response for the index, whenever the new trait had considerable economic value, including the cow reference population considerably improved the genetic response for the new trait. For scenarios with a zero or negative genetic correlation with the index and equal economic value for the index and the new trait, a reference population of 2000 cows increased genetic response for the new trait by at least 0.10 and 0.20 genetic s.d. per year, for heritability levels of 0.05 and 0.30, respectively.
We conclude that for new traits with a very small or positive genetic correlation with the index, and a high positive economic value, considerable genetic response can already be achieved based on a cow reference population with only 2000 records, even when the reliability of individual genomic breeding values is much lower than currently accepted in dairy cattle breeding programs. New traits may generally have a negative genetic correlation with the index and a small positive economic value. For such new traits, cow reference populations of at least 10 000 cows may be required to achieve acceptable levels of genetic response for the new trait and for the whole breeding goal.
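For orientation, single-trait response expressed in genetic s.d. per year follows from the breeder's equation as selection intensity times accuracy divided by generation interval. The sketch below shows only that arithmetic. It does not reproduce the paper's deterministic evaluation of multi-trait breeding goals, and the function name `annual_response`, the accuracy, and the generation interval are hypothetical.

```python
def annual_response(selection_intensity, accuracy, generation_interval_years):
    """Genetic response per year in genetic standard deviation units."""
    return selection_intensity * accuracy / generation_interval_years

# Hypothetical inputs: intensity 1.755 corresponds to selecting the top 10%
# under normality; accuracy 0.4 and a 3-year generation interval are made up.
response = annual_response(1.755, 0.4, 3.0)
print(f"{response:.3f} genetic s.d. per year")
```

Because response scales linearly with accuracy, a larger cow reference population raises response only to the extent that it raises the accuracy of the genomic breeding values.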
Genomic Selection for Fruit Quality Traits in Apple (Malus x domestica Borkh.)
Kumar, S. ; Chagné, D. ; Bink, M.C.A.M. ; Volz, R.K. ; Whitworth, C. ; Carlisle, C. - \ 2012
PLoS ONE 7 (2012)5. - ISSN 1932-6203
genetic-relationship information - estimated breeding value - linkage disequilibrium - status number - accuracy - values - prediction - markers - cattle - parameters
The genome sequence of apple (Malus×domestica Borkh.) was published more than a year ago, which helped develop an 8K SNP chip to assist in implementing genomic selection (GS). In apple breeding programmes, GS can be used to obtain genomic breeding values (GEBV) for choosing next-generation parents or selections for further testing as potential commercial cultivars at a very early stage. Thus GS has the potential to accelerate breeding efficiency significantly because of decreased generation interval or increased selection intensity. We evaluated the accuracy of GS in a population of 1120 seedlings generated from a factorial mating design of four females and two male parents. All seedlings were genotyped using an Illumina Infinium chip comprising 8,000 single nucleotide polymorphisms (SNPs), and were phenotyped for various fruit quality traits. Random-regression best linear unbiased prediction (RR-BLUP) and the Bayesian LASSO method were used to obtain GEBV, and compared using a cross-validation approach for their accuracy to predict unobserved BLUP-BV. Accuracies were very similar for both methods, varying from 0.70 to 0.90 for various fruit quality traits. The selection response per unit time using GS compared with the traditional BLUP-based selection was very high (>100%), especially for low-heritability traits. Genome-wide average estimated linkage disequilibrium (LD) between adjacent SNPs was 0.32, with a relatively slow decay of LD in the long range (r² = 0.33 and 0.19 at 100 kb and 1,000 kb respectively), contributing to the higher accuracy of GS. Distribution of estimated SNP effects revealed involvement of large effect genes with likely pleiotropic effects. These results demonstrated that genomic selection is a credible alternative to conventional selection for fruit quality traits.
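An average LD between adjacent SNPs, as reported in this abstract, can be approximated from unphased genotypes as the squared correlation between allele dosages (0, 1, 2) at neighboring markers. The sketch below assumes that approximation, which is not necessarily the estimator the authors used. The toy genotypes and the function names `genotype_r2` and `mean_adjacent_r2` are hypothetical.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two vectors of allele dosages."""
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    return cov * cov / (var_a * var_b)

def mean_adjacent_r2(genotypes):
    """Average r2 over adjacent SNP pairs; genotypes holds one list of
    dosages per individual, with SNPs ordered by map position."""
    n_snps = len(genotypes[0])
    columns = [[ind[j] for ind in genotypes] for j in range(n_snps)]
    pairs = [genotype_r2(columns[j], columns[j + 1]) for j in range(n_snps - 1)]
    return sum(pairs) / len(pairs)

# Hypothetical dosages: 5 individuals, 3 polymorphic SNPs in map order.
toy_genotypes = [[0, 0, 2],
                 [1, 1, 1],
                 [2, 2, 0],
                 [1, 0, 1],
                 [0, 1, 2]]
print(f"mean adjacent r2: {mean_adjacent_r2(toy_genotypes):.2f}")
```

Monomorphic SNPs have zero variance and would need to be filtered out before calling these functions.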
Reliability of direct genomic values for animals with different relationships within and to the reference population
Pszczola, M.J. ; Strabel, T. ; Mulder, H.A. ; Calus, M.P.L. - \ 2012
Journal of Dairy Science 95 (2012)1. - ISSN 0022-0302 - p. 389 - 400.
quantitative trait loci - genetic-relationship information - estimated breeding values - dairy-cattle - linkage disequilibrium - holstein population - selection - accuracy - association - predictions
Accuracy of genomic selection depends on the accuracy of prediction of single nucleotide polymorphism effects and the proportion of genetic variance explained by markers. Design of the reference population with respect to its family structure may influence the accuracy of genomic selection. The objective of this study was to investigate the effect of various relationship levels within the reference population and different levels of relationship of evaluated animals to the reference population on the reliability of direct genomic breeding values (DGV). The DGV reliabilities, expressed as squared correlation between estimated and true breeding value, were calculated for evaluated animals at 3 heritability levels. To emulate a trait that is difficult or expensive to measure, such as methane emission, reference populations were kept small and consisted of females with own performance records. A population reflecting a dairy cattle population structure was simulated. Four chosen reference populations consisted of all females available in the first genotyped generation. They consisted of highly (HR), moderately (MR), or lowly (LR) related animals, by selecting paternal half-sib families of decreasing size, or consisted of randomly chosen animals (RND). Of those 4 reference populations, RND had the lowest average relationship. Three sets of evaluated animals were chosen from 3 consecutive generations of genotyped animals, starting from the same generation as the reference population. Reliabilities of DGV predictions were calculated deterministically using selection index theory. Average reliabilities increased when average relationship within the reference population decreased and the highest average reliabilities were achieved for RND (e.g., from 0.53 in HR to 0.61 in RND for a heritability of 0.30).
A higher relationship to the reference population resulted in higher reliability values. At the average squared relationship of evaluated animals to the reference population of 0.005, reliabilities were, on average, 0.49 (HR) and 0.63 (RND) for a heritability of 0.30; 0.20 (HR) and 0.27 (RND) for a heritability of 0.05; and 0.07 (HR) and 0.09 (RND) for a heritability of 0.01. A substantial decrease in reliability was observed when the number of generations to the reference population increased [e.g., for heritability of 0.30, the decrease from evaluated set I (chosen from the same generation as the reference population) to II (one generation younger than the reference population) was 0.04 for HR, and 0.07 for RND]. In this study, the importance of the design of a reference population consisting of cows was shown and optimal designs of the reference population for genomic prediction were suggested.
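A deterministic reliability of this kind can be sketched with selection index theory. The sketch assumes each reference animal has one own record, so that var(y) = G σa² + I σe² and the covariance between a candidate's breeding value and the records is c σa². The reliability is then c'(G + λI)⁻¹c divided by the candidate's own genomic relationship, with λ = (1 − h²)/h². This derivation is an assumption made for illustration and may differ in detail from the study. The relationship values and the function names `solve` and `dgv_reliability` are hypothetical.

```python
def solve(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def dgv_reliability(g_ref, c, g_own, heritability):
    """Reliability c'(G + lambda I)^-1 c / g_own for one evaluated animal."""
    lam = (1.0 - heritability) / heritability
    lhs = [[g_ref[i][j] + (lam if i == j else 0.0) for j in range(len(c))]
           for i in range(len(c))]
    weights = solve(lhs, c)
    return sum(w * ci for w, ci in zip(weights, c)) / g_own

# Hypothetical genomic relationships among two reference cows.
g_reference = [[1.00, 0.25],
               [0.25, 1.00]]
close = dgv_reliability(g_reference, [0.50, 0.25], 1.0, 0.30)
distant = dgv_reliability(g_reference, [0.25, 0.125], 1.0, 0.30)
print(f"closely related: {close:.3f}, distantly related: {distant:.3f}")
```

With only two reference animals the reliabilities are small, but the ordering matches the abstract: the animal with higher relationships to the reference population receives the higher reliability.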
Best Linear Unbiased Prediction of Genomic Breeding Values Using a Trait-Specific Marker-Derived Relationship Matrix
Zhang, Z. ; Liu, J.F. ; Ding, Z. ; Bijma, P. ; Koning, D.J. de - \ 2010
PLoS ONE 5 (2010)9. - ISSN 1932-6203 - 8 p.
genetic-relationship information - dairy-cattle - wide selection - accuracy - populations - programs - animals - impact - snp
With the availability of high density whole-genome single nucleotide polymorphism chips, genomic selection has become a promising method to estimate genetic merit with potentially high accuracy for animal, plant and aquaculture species of economic importance. With markers covering the entire genome, genetic merit of genotyped individuals can be predicted directly within the framework of mixed model equations, by using a matrix of relationships among individuals that is derived from the markers. Here we extend that approach by deriving a marker-based relationship matrix specifically for the trait of interest.
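One way to picture a trait-specific marker-derived relationship matrix is to weight each marker's contribution to the genomic relationship by a trait-dependent weight. The sketch below assumes that reading and is not necessarily the construction used in the paper. It computes G = Z D Z' / Σ w_j 2 p_j (1 − p_j), where Z holds allele dosages centered by twice the allele frequency and D is a diagonal matrix of marker weights. The weights, the toy genotypes, and the function name `weighted_grm` are hypothetical.

```python
def weighted_grm(genotypes, weights):
    """Marker-weighted genomic relationship matrix from 0/1/2 dosages.
    Equal weights give an unweighted, frequency-centered matrix."""
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    z = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]
    scale = sum(weights[j] * 2.0 * freqs[j] * (1.0 - freqs[j]) for j in range(m))
    return [[sum(weights[j] * z[i][j] * z[k][j] for j in range(m)) / scale
             for k in range(n)] for i in range(n)]

# Hypothetical dosages: 3 individuals, 3 polymorphic markers.
toy_genotypes = [[0, 1, 2],
                 [1, 1, 0],
                 [2, 0, 1]]
g_equal = weighted_grm(toy_genotypes, [1.0, 1.0, 1.0])
# Made-up trait-specific weights, e.g. derived from estimated marker effects.
g_trait = weighted_grm(toy_genotypes, [4.0, 0.5, 0.5])
for row in g_trait:
    print([round(v, 2) for v in row])
```

The resulting matrix can replace the usual relationship matrix in mixed model equations; markers with larger weights then dominate the relationships used for prediction.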