Using selection index theory to estimate consistency of multi-locus linkage disequilibrium across populations
Wientjes, Y.C.J. ; Veerkamp, R.F. ; Calus, M.P.L. - 2015
BMC Genetics 16 (2015). - ISSN 1471-2156
genomic breeding values - genetic-relationship information - quantitative trait loci - dairy-cattle breeds - prediction - accuracy - haplotype - markers - impact - lines
The potential of combining multiple populations in genomic prediction depends on the consistency of linkage disequilibrium (LD) between SNPs and QTL across populations. We investigated consistency of multi-locus LD across populations using selection index theory and investigated the relationship between consistency of multi-locus LD and accuracy of genomic prediction across different simulated scenarios. In the selection index, QTL genotypes were considered as breeding goal traits and SNP genotypes as index traits, based on LD among SNPs and between SNPs and QTL. The consistency of multi-locus LD across populations was computed as the accuracy of predicting QTL genotypes in selection candidates using a selection index derived in the reference population. Different scenarios of within- and across-population genomic prediction were evaluated, using all SNPs or only the four neighboring SNPs of a simulated QTL. Phenotypes were simulated using different numbers of QTL underlying the trait. The relationship between the calculated consistency of multi-locus LD and accuracy of genomic prediction using a GBLUP type of model was investigated.
The accuracy of predicting QTL genotypes, i.e. the measure describing consistency of multi-locus LD, was much lower for across-population scenarios than for within-population scenarios, and was lower when QTL had a low MAF than when QTL were randomly selected from the SNPs. Consistency of multi-locus LD was highly correlated with the realized accuracy of genomic prediction across different scenarios, and the correlation was higher when QTL were weighted according to their effects in the selection index instead of weighting QTL equally. Considering only the neighboring SNPs of QTL decreased the accuracy of predicting QTL genotypes within population, but substantially increased the accuracy across populations.
Consistency of multi-locus LD across populations is a characteristic of the properties of the QTL in the investigated populations and can provide more insight into the underlying reasons for a low empirical accuracy of across-population genomic prediction. When genomic prediction models focus only on neighboring SNPs of QTL, multi-locus LD is more consistent across populations since only short-range LD is considered, and the accuracy of predicting QTL genotypes of individuals from another population is increased.
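One way to read the selection index described in this abstract is sketched below in Python with NumPy. It treats a single QTL genotype as the breeding goal trait and the SNP genotypes as index traits, derives index weights b = P^-1 g from (co)variances in the reference population, and reports the correlation between the index and the true QTL genotypes in the selection candidates. The function name, the array layout, and the toy genotypes are illustrative assumptions; weighting QTL by their effects and the exact formulas of the paper are not reproduced here.

```python
import numpy as np

def qtl_prediction_accuracy(snp_ref, qtl_ref, snp_cand, qtl_cand):
    """Correlation between a SNP index built in the reference and QTL genotypes in candidates.

    Rows are individuals, columns of the SNP arrays are loci (hypothetical layout).
    """
    snp_ref = np.asarray(snp_ref, dtype=float)
    qtl_ref = np.asarray(qtl_ref, dtype=float)
    snp_cand = np.asarray(snp_cand, dtype=float)
    qtl_cand = np.asarray(qtl_cand, dtype=float)

    # Centre genotypes in the reference population.
    x = snp_ref - snp_ref.mean(axis=0)
    q = qtl_ref - qtl_ref.mean()
    n = x.shape[0]

    P = x.T @ x / (n - 1)      # (co)variances among SNPs: LD among index traits
    g = x.T @ q / (n - 1)      # covariances between SNPs and the QTL
    b = np.linalg.solve(P, g)  # selection index weights, b = P^-1 g

    # Apply the reference-derived index to the selection candidates.
    index = (snp_cand - snp_cand.mean(axis=0)) @ b
    return float(np.corrcoef(index, qtl_cand)[0, 1])

# Toy data: in the reference, the QTL is in complete LD with the first SNP.
snp_ref = np.array([[0, 0], [1, 0], [2, 1], [0, 2], [1, 1], [2, 2]])
qtl_ref = snp_ref[:, 0]
snp_cand = np.array([[0, 1], [1, 2], [2, 0], [1, 0], [2, 2]])

# Candidates with the same LD phase versus candidates with reversed phase.
acc_same = qtl_prediction_accuracy(snp_ref, qtl_ref, snp_cand, snp_cand[:, 0])
acc_flipped = qtl_prediction_accuracy(snp_ref, qtl_ref, snp_cand, 2 - snp_cand[:, 0])
```

In this toy case the index is perfectly accurate when the LD phase is preserved and perfectly wrong when it is reversed, which is the kind of inconsistency the measure is meant to expose.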
Empirical and deterministic accuracies of across-population genomic prediction
Wientjes, Y.C.J. ; Veerkamp, R.F. ; Bijma, P. ; Bovenhuis, H. ; Schrooten, C. ; Calus, M.P.L. - 2015
Genetics, Selection, Evolution 47 (2015). - ISSN 0999-193X
dairy-cattle breeds - linkage disequilibrium - relationship matrix - complex traits - multi-breed - selection - values - markers - heritability - models
Background: Differences in linkage disequilibrium and in allele substitution effects of QTL (quantitative trait loci) may hinder genomic prediction across populations. Our objective was to develop a deterministic formula to estimate the accuracy of across-population genomic prediction, for which reference individuals and selection candidates are from different populations, and to investigate the impact of differences in allele substitution effects across populations and of the number of QTL underlying a trait on the accuracy. Methods: A deterministic formula to estimate the accuracy of across-population genomic prediction was derived based on selection index theory. Moreover, accuracies were deterministically predicted using a formula based on population parameters and empirically calculated using simulated phenotypes and a GBLUP (genomic best linear unbiased prediction) model. Phenotypes of 1033 Holstein-Friesian, 105 Groninger White Headed and 147 Meuse-Rhine-Yssel cows were simulated by sampling 3000, 300, 30 or 3 QTL from the available high-density SNP (single nucleotide polymorphism) information of three chromosomes, assuming a correlation of 1.0, 0.8, 0.6, 0.4, or 0.2 between allele substitution effects across breeds. The simulated heritability was set to 0.95 to resemble the heritability of deregressed proofs of bulls. Results: Accuracies estimated with the deterministic formula based on selection index theory were similar to empirical accuracies for all scenarios, while accuracies predicted with the formula based on population parameters overestimated empirical accuracies by ~25 to 30%. When the between-breed genetic correlation differed from 1, i.e. allele substitution effects differed across breeds, empirical and deterministic accuracies decreased in proportion to the genetic correlation. Using a multi-trait model, it was possible to accurately estimate the genetic correlation between the breeds based on phenotypes and high-density genotypes. 
The number of QTL underlying the simulated trait did not affect the accuracy. Conclusions: The deterministic formula based on selection index theory estimated the accuracy of across-population genomic predictions well. The deterministic formula using population parameters overestimated the across-population genomic accuracy, but may still be useful because of its simplicity. Both formulas could accommodate genetic correlations between populations lower than 1. The number of QTL underlying a trait did not affect the accuracy of across-population genomic prediction using a GBLUP method.
Accuracy of genomic prediction when combining two related crossbred populations
Vallee, A.A.A. ; Arendonk, J.A.M. van; Bovenhuis, H. - 2014
Journal of Animal Science 92 (2014)10. - ISSN 0021-8812 - p. 4342 - 4348.
dairy-cattle breeds - beef-cattle - selection - performance - animals - values - uterine - traits - impact - gene
Charolais bulls are selected for their crossbred performance when mated to Montbéliard or Holstein dams. To implement genomic prediction, one could build a reference population for each crossbred population independently. An alternative could be to combine both crossbred populations into a single reference population to increase size and accuracy of prediction. The objective of this study was to investigate the accuracy of genomic prediction by combining different crossbred populations. Three scenarios were considered: 1) using 1 crossbred population as reference to predict phenotype of animals from the same crossbred population, 2) combining the 2 crossbred populations into 1 reference to predict phenotype of animals from 1 crossbred population, and 3) using 1 crossbred population as reference to predict phenotype of animals from the other crossbred population. Traits studied were bone thinness, height, and muscular development. Phenotypes and 45,117 SNP genotypes were available for 1,764 Montbéliard × Charolais calves and 447 Holstein × Charolais calves. The population was randomly split into 10 subgroups, which were assigned to the validation one by one. To allow fair comparison between scenarios, size of the reference population was kept constant for all scenarios. Breeding values were estimated with BLUP and genomic BLUP. Accuracy of prediction was calculated as the correlation between the EBV and the phenotypic values of the calves in the validation divided by the square root of the heritability. Genomic BLUP showed higher accuracies (between 0.281 and 0.473) than BLUP (between 0.197 and 0.452). Accuracies tended to be highest when prediction was within 1 crossbred population, intermediate when populations were combined into the reference population, and lowest when prediction was across populations. 
The decrease in accuracy from prediction within 1 population to prediction across populations was more pronounced for bone thinness (–27%) and height (–29%) than for muscular development (–14%). Genetic correlation between the 2 crossbred populations was estimated using pedigree relationships. It was 0.70 for bone thinness, 0.80 for height, and 0.99 for muscular development. Genetic correlation indicates the expected gain in accuracy of prediction when combining different populations into 1 reference population. The larger the genetic correlation, the larger the benefit of combining populations for genomic prediction.
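The accuracy measure used in this abstract (correlation between EBV and phenotype in the validation set, divided by the square root of the heritability) can be sketched in plain Python as follows. The function names and the toy numbers are illustrative assumptions, not values from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prediction_accuracy(ebv, phenotype, h2):
    """cor(EBV, phenotype) / sqrt(h2) for the animals in one validation group."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return pearson(ebv, phenotype) / math.sqrt(h2)

# Hypothetical validation group of five calves and a hypothetical heritability.
ebv = [0.2, -0.1, 0.4, 0.0, -0.3]
phenotype = [1.1, 0.4, 0.9, 1.3, 0.2]
accuracy = prediction_accuracy(ebv, phenotype, h2=0.36)
```

Dividing by the square root of the heritability rescales the correlation with phenotypes to an approximate correlation with true breeding values, so with small validation groups or low heritability the result can exceed 1.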
Genomic prediction based on data from three layer lines using non-linear regression models
Huang, H. ; Windig, J.J. ; Vereijken, A. ; Calus, M.P.L. - 2014
Genetics, Selection, Evolution 46 (2014). - ISSN 0999-193X - 11 p.
dairy-cattle breeds - dimensionality reduction - gaussian kernel - accuracy - traits - values - validation - selection - pedigree - plant
Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. Results - When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Conclusions - Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. 
This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.
Genomic prediction based on data from three layer lines: a comparison between linear methods
Calus, M.P.L. ; Huang, H. ; Vereijken, J. ; Visscher, J. ; Napel, J. ten; Windig, J.J. - 2014
Genetics, Selection, Evolution 46 (2014). - ISSN 0999-193X - 13 p.
principal component approach - support vector regression - dairy-cattle breeds - linkage disequilibrium - prior-knowledge - discriminant-analysis - values - selection - accuracy - traits
Background The prediction accuracy of several linear genomic prediction models, which have previously been used for within-line genomic prediction, was evaluated for multi-line genomic prediction. Methods Compared to a conventional BLUP (best linear unbiased prediction) model using pedigree data, we evaluated the following genomic prediction models: genome-enabled BLUP (GBLUP), ridge regression BLUP (RRBLUP), principal component analysis followed by ridge regression (RRPCA), BayesC and Bayesian stochastic search variable selection. Prediction accuracy was measured as the correlation between predicted breeding values and observed phenotypes divided by the square root of the heritability. The data used concerned laying hens with phenotypes for number of eggs in the first production period and known genotypes. The hens were from two closely-related brown layer lines (B1 and B2), and a third distantly-related white layer line (W1). Lines had 1004 to 1023 training animals and 238 to 240 validation animals. Training datasets consisted of animals of either single lines, or a combination of two or all three lines, and had 30 508 to 45 974 segregating single nucleotide polymorphisms. Results Genomic prediction models yielded 0.13 to 0.16 higher accuracies than pedigree-based BLUP. When excluding the line itself from the training dataset, genomic predictions were generally inaccurate. Use of multiple lines marginally improved prediction accuracy for B2 but did not affect or slightly decreased prediction accuracy for B1 and W1. Differences between models were generally small except for RRPCA which gave considerably higher accuracies for B2. Correlations between genomic predictions from different methods were higher than 0.96 for W1 and higher than 0.88 for B1 and B2. The greater differences between methods for B1 and B2 were probably due to the lower accuracy of predictions for B1 (~0.45) and B2 (~0.40) compared to W1 (~0.76). 
Conclusions Multi-line genomic prediction did not affect or slightly improved prediction accuracy for closely-related lines. For distantly-related lines, multi-line genomic prediction yielded similar or slightly lower accuracies than single-line genomic prediction. Bayesian variable selection and GBLUP generally gave similar accuracies. Overall, RRPCA yielded the greatest accuracies for two lines, suggesting that using PCA helps to alleviate the “n ≪ p” problem in genomic prediction.
Genomic associations with somatic cell score in first-lactation Holstein cows
Wijga, S. ; Bastiaansen, J.W.M. ; Wall, E. ; Strandberg, E. ; Haas, Y. de - 2012
Journal of Dairy Science 95 (2012)2. - ISSN 0022-0302 - p. 899 - 908.
quantitative trait loci - affecting clinical mastitis - dairy-cattle breeds - 1st 3 lactations - milk-production - genetic-parameters - escherichia-coli - wide association - health traits - count traits
This genome-wide association study aimed to identify loci associated with lactation-average somatic cell score (LASCS) and the standard deviation of test-day somatic cell score (SCS-SD). It is one of the first studies to combine detailed phenotypic and genotypic cow data from research dairy herds located in different countries. The combined data set contained up to 52 individual test-days per lactation and thereby aimed to capture temporary increases in somatic cell score associated with infection. Phenotypic data for analysis consisted of 46,882 test-day records on 1,484 cows, and genotypic data consisted of 37,590 single nucleotide polymorphisms (SNP). Using an animal model, the associations between each individual SNP and the phenotypic data were estimated. To account for the risk of false positives, a false discovery rate threshold of 0.20 was set. The analyses showed that LASCS was significantly associated with a SNP on Bos taurus autosome (BTA) 4 and a SNP on BTA18. Likewise, SCS-SD was associated with this SNP on BTA18. In addition, SCS-SD was significantly associated with a SNP on BTA6. Relatively few associations were found, suggesting that LASCS and SCS-SD are controlled by multiple loci distributed across the genome, each with a relatively small effect. Increased knowledge of the genetic regulation of LASCS and SCS-SD may aid in identification of genes that play a role in mastitis resistance. Such knowledge helps in understanding the genetic mechanisms leading to mastitis and in discovering targets for mastitis therapeutics.
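The abstract states a false discovery rate threshold of 0.20 but does not say which procedure was used. As an illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure, which is an assumption rather than the authors' documented method; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Return the indices of tests declared significant at the given FDR level."""
    m = len(p_values)
    # Rank the tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank k with p_(k) <= (k / m) * fdr.
        if p_values[i] <= rank / m * fdr:
            largest_rank = rank
    return sorted(order[:largest_rank])

# Hypothetical single-SNP p-values.
p_values = [0.001, 0.5, 0.03, 0.9, 0.04]
significant = benjamini_hochberg(p_values, fdr=0.20)
```

With tens of thousands of SNPs the same rule applies unchanged; only the number of tests m, and therefore the per-rank thresholds, differ.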