Exploiting whole genome sequence variants in cattle breeding : Unraveling the distribution of genetic variants and role of rare variants in genomic evaluation
Zhang, Qianqian - \ 2017
Wageningen University. Promotor(en): H. Bovenhuis; M.S. Lund, co-promotor(en): G. Sahana; M. Calus; B. Guldbrandtsen. - Wageningen : Wageningen University - ISBN 9788793643147 - 249
cattle - genomes - genetic variation - inbreeding - homozygosity - longevity - quantitative traits - animal breeding - animal genetics - rundvee - genomen - genetische variatie - inteelt - homozygotie - gebruiksduur - kwantitatieve kenmerken - dierveredeling - diergenetica
The availability of whole genome sequence data makes it possible to better explore the genetic mechanisms underlying the quantitative traits that are targeted in animal breeding. This thesis presents different strategies and perspectives on the utilization of whole genome sequence variants in cattle breeding. Using whole genome sequence variants, I show the genetic variation, recent and ancient inbreeding, and genome-wide pattern of introgression across the demographic and breeding history in different cattle populations. Using the latest genomic tools, I demonstrate that recent inbreeding can accurately be estimated by runs of homozygosity (ROH). This can further be utilized to control inbreeding in breeding programs. In chapters 2 and 4, by in-depth genomic analysis of whole genome sequence data, I demonstrate that the distribution of functional genetic variants in ROH regions and introgressed haplotypes was shaped by recent selective breeding in cattle populations. The contribution of whole genome sequence variants to the phenotypic variation partly depends on their allele frequencies. Common variants associated with different traits have been identified and explain a considerable proportion of the genetic variance. For example, common variants from whole genome sequence associated with longevity have been identified in chapter 5. However, the identified common variants cannot explain the full genetic variance, and rare variants might play an important role here. Rare variants may account for a large proportion of the whole genome sequence variants, but are often ignored in genomic evaluation, partly because it is difficult to identify associations between rare variants and phenotypes. Using a simulation study, I compared the power of different gene-based association mapping methods that combine the rare variants within a gene. 
Those gene-based methods had a higher power for mapping rare variants than the mixed linear models applying single marker tests that are commonly used for common variants. Moreover, I explored the role of rare and low-frequency variants in the variation of different complex traits and their impact on genomic prediction reliability. Rare and low-frequency variants contributed relatively more to variation for health-related traits than for production traits, reflecting the potential of improving prediction reliability using rare and low-frequency variants for health-related traits. However, in practice, only a marginal improvement in the reliability of genomic prediction for fertility, longevity and health traits was observed when selected rare and low-frequency variants were combined with 50k SNP genotype data. A simulation study did show that reliability of genomic prediction could be improved provided that causal rare and low-frequency variants affecting a trait are known.
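The ROH-based estimate of recent inbreeding mentioned in this abstract can be illustrated with a minimal sketch. It is not the method used in the thesis: the function names, the genotype coding (0 and 2 for homozygous, 1 for heterozygous) and the thresholds are assumptions chosen for illustration. The sketch relies only on the common definition of F_ROH as the total length of ROH divided by the length of the genome covered.

```python
from typing import List, Tuple


def find_roh(positions: List[int], genotypes: List[int],
             min_snps: int = 3, min_length_bp: int = 1000) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs.

    Assumed coding: 0 or 2 = homozygous, 1 = heterozygous.
    The thresholds are illustrative defaults, not values from the thesis.
    """
    runs: List[Tuple[int, int]] = []
    start = None   # position of the first SNP in the current run
    last = None    # position of the most recent homozygous SNP
    count = 0      # number of SNPs in the current run

    def close_run() -> None:
        # Keep the run only if it passes both thresholds.
        if start is not None and count >= min_snps and last - start >= min_length_bp:
            runs.append((start, last))

    for pos, g in zip(positions, genotypes):
        if g != 1:                 # homozygous: extend or open a run
            if start is None:
                start, count = pos, 0
            count += 1
            last = pos
        else:                      # heterozygous: the run ends here
            close_run()
            start = None
    close_run()                    # handle a run that reaches the last SNP
    return runs


def f_roh(runs: List[Tuple[int, int]], genome_length_bp: int) -> float:
    """F_ROH = total length of ROH / length of the genome covered."""
    return sum(end - start for start, end in runs) / genome_length_bp


# Toy example with hypothetical data: two runs separated by one heterozygote.
positions = [100, 1100, 2100, 3100, 4100, 5100, 6100, 7100]
genotypes = [0, 0, 2, 1, 2, 2, 2, 0]
runs = find_roh(positions, genotypes)
print(runs, f_roh(runs, genome_length_bp=10_000))
```

Real ROH callers additionally tolerate a small number of heterozygous or missing calls within a run and account for marker density; this sketch omits both for brevity.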
Comparative genomics and trait evolution in Cleomaceae, a model family for ancient polyploidy
Bergh, Erik van den - \ 2017
Wageningen University. Promotor(en): M.E. Schranz; Y. van de Peer. - Wageningen : Wageningen University - ISBN 9789463431705 - 106
capparaceae - genomics - polyploidy - evolution - genomes - reproductive traits - flowers - colour - glucosinolates - genetic variation - biosystematics - taxonomy - identification - capparaceae - genomica - polyploïdie - evolutie - genomen - voortplantingskenmerken - bloemen - kleur - glucosinolaten - genetische variatie - biosystematiek - taxonomie - identificatie
As more and more species have been sequenced, evidence has been piling up for a fascinating phenomenon that seems to occur in all plant lineages: paleopolyploidy. Polyploidy has historically been a much observed and studied trait, but until recently it was assumed that polyploids were evolutionary dead-ends due to their sterility. However, many studies since the 1990s have challenged this notion by finding evidence for ancient genome duplications in many genomes of current species. This led to the observation that all seed plants share at least one ancestral polyploidy event. Another polyploidy event has been proven to lie at the base of all angiosperms, further supporting the notion that ancient polyploidy is widespread and common. These findings have led to questions regarding the apparent disadvantages that can be observed in a first generation polyploid. If these disadvantages can be overcome, however, duplication of a genome also presents an enormous potential for evolutionary novelty. Duplicated copies of genes are able to acquire changes that can lead to specialization of the duplicated pair into two functions (subfunctionalization) or the development of one copy towards an entirely new function (neofunctionalization).
Currently, most research towards polyploidy has focused on the economically and scientifically important Brassicaceae family containing the model plant Arabidopsis thaliana and many crops such as cabbage, rapeseed, broccoli and turnip. In this thesis, I lay the foundations for the expansion of this scope to the Cleomaceae, a widespread cosmopolitan plant family and a sister family of Brassicaceae. The species within Cleomaceae are diverse and exhibit many scientifically interesting traits. They are also in a perfect position phylogenetically to draw comparisons with the much more studied Brassicaceae. I describe the Cleomaceae and their relevance to polyploid research in more detail in the Introduction. I then describe the important first step towards setting up the genetic framework of this family with the sequencing of Tarenaya hassleriana in Chapter 1.
In Chapter 2, I have studied the effects of polyploidy on the development of C4 photosynthesis by comparing the transcriptome of the C3 species Tarenaya hassleriana with that of the C4 species Gynandropsis gynandra. C4 photosynthesis is an elaboration of the more common C3 form of photosynthesis that concentrates CO2 in specific cells, leading to decreased photorespiration by RuBisCO and higher photosynthetic efficiency in low CO2 environments. I find that polyploidy has not led to sub- or neofunctionalization towards the development of this trait, but instead find evidence for another important phenomenon in postpolyploid evolution: the dosage balance hypothesis. This hypothesis states that genes which are dependent on specific dosage levels of their products will be maintained in duplicate; any change in their function would lead to dosage imbalance which would have deleterious effects on their pathway. We show that most genes involved in photosynthesis have returned to single copy in G. gynandra and that the changes leading to C4 have mostly taken place at the expression level, confirming current assumptions on the development of this trait.
In Chapter 3, I have studied the effects of polyploidy on an important class of plant defence compounds: glucosinolates. These compounds, sometimes referred to as ‘mustard oils’, play an important role in the defence against herbivores and have radiated widely in Brassicaceae to form many different ‘flavors’ to deter specific herbivores. I show that in Cleomaceae many genes responsible for these compounds have benefited from the three rounds of polyploidy that T. hassleriana has undergone and that many duplicated genes have been retained. We also show that more than 75% of them are actively expressed in the plant, indicating that the majority of these duplications have an active function in the plant.
Finally, in Chapter 4 I investigate a simple observation made during experiments with T. hassleriana in the greenhouse regarding the variation in flower colour between different individuals: some had pink flowers and some purple. Using LC-PDA mass spectrometry we find that the two colours are caused by different levels of two anthocyanin pigments, with cyanidin dominating in the purple flowers and pelargonidin being more abundant in pink flowers. Through sequence comparison and synteny analysis between A. thaliana and T. hassleriana we find the orthologs of the genes involved in this pathway. Using a Genotyping by Sequencing method on a cross between these two flower colours, we produce a collection of SNP markers on the reference genome. With these SNPs, we find two significant binary trait loci, one of which corresponds to the location of the F3’H ortholog which performs the conversion of a pelargonidin precursor to a cyanidin precursor.
In the General Conclusion, I combine all findings of the previous chapters and explain how they establish part of a larger species framework to study ancient polyploidy in angiosperms. I then put forth what these findings can mean for possible future research and the directions that are worth exploring further.
From species to trait evolution in Aethionema (Brassicaceae)
Mohammadin, Setareh - \ 2017
Wageningen University. Promotor(en): M.E. Schranz. - Wageningen : Wageningen University - ISBN 9789463431385 - 125
brassicaceae - evolution - rna - genomes - genetic diversity - phytogeography - glucosinolates - quantitative trait loci - next generation sequencing - brassicaceae - evolutie - rna - genomen - genetische diversiteit - plantengeografie - glucosinolaten - loci voor kwantitatief kenmerk - next generation sequencing
The plant family Brassicaceae (or crucifers) is an economically important group that includes many food crops (e.g. cabbages and radishes), horticultural species (e.g. Draba, Iberis, Lunaria), and model plant species (particularly Arabidopsis thaliana). The fundamental importance of A. thaliana to plant biology makes the Brassicaceae an ideal system for comparative genomics and for testing wider evolutionary, ecological and speciation hypotheses. One such hypothesis is the ‘Whole Genome Duplication Radiation Lag Time’ (WGD-RLT) model for the role of polyploidy in the evolution of important plant families such as the Brassicaceae. The WGD-RLT model indicates a higher rate of diversification of a core group compared to its sister group, due to a lag time after a whole genome duplication event that made it possible for novel traits or geo- or ecological events to increase the core group's diversification rate.
Aethionema is the species-poor sister genus of the core Brassicaceae and hence is at an important comparative position to analyse trait and genomic evolution of the species-rich core group. Aethionema species occur mainly in the western Irano-Turanian region, which is concordantly the biodiversity hotspot of the Brassicaceae family. Moreover, comparing Aethionema to the Brassicaceae core group can help us to understand and test the ‘WGD-RLT’ model. However, to be able to do so we first need to know more about Aethionema. In this thesis, I investigated various levels of evolutionary change (from macro, to micro, to trait evolution) within the genus Aethionema, with a major focus on the emerging model species Aethionema arabicum.
Next generation sequencing has made it possible to use the genomes of many species in a comparative framework. However, the formation of proteins and enzymes, and in the end the phenotype of the whole plant, relies on transcription from particular regions of the genome including genes. Hence, the transcriptome makes it possible to assess the functional parts of the genome. However, the functional part of the genome not only relies on the protein coding genes. Gene regulatory elements like promoters and long non-coding RNAs function as regulators of gene expression and hence are involved in increasing or decreasing transcription. In Chapter 2 I used the transcriptome of four different Aethionema species to understand the lineage specificity of these long non-coding RNAs. Moreover in a comparison with the Brassicaceae core group and Brassicaceae’s sister family the Cleomaceae I show that although the position of long non-coding RNAs can be conserved, their sequences do not have to be.
Most of the Aethionema species occur in the Irano-Turanian region, a politically unstable region, making it hard for scientists to collect from. However, the natural history collections made throughout the last centuries are a great resource. Combining these collections with the newest sequencing techniques, e.g. next generation sequencing, has allowed me to infer the phylogeny of ~75% of the known Aethionema species in a time-calibrated and historical biogeographical framework. Hence, I was able to establish that Aethionema species likely originated from the Anatolian Diagonal and that major geological events like the uplift of the Turkish and Iranian plateau have had a hand in their speciation (Chapter 3).
To examine species-level processes I sequenced and analysed transcriptomes of eight Ae. arabicum accessions coming from Cyprus, Iran and Turkey to investigate population structure, genetic diversity and local adaptation (Chapter 4). The most prominent finding was a ploidy difference between the Iranian and Turkish/Cypriotic lines, whereby the former were (allo)tetraploid and the latter diploid. The tetraploid Iranian lines seem to have one set of alleles from the Turkish/Cypriotic gene-pool. However we do not know where the other alleles come from. In addition to the differences in ploidy level there are also differences in glucosinolate defence compounds between these two populations (Iranian vs Turkish/Cypriotic), with the Iranian lines lacking the diversity and concentration of indolic glucosinolates that the Turkish/Cypriotic lines have. This chapter serves as a good resource and starting point for future research in the region, maybe by using the natural history collections that are at hand.
Glucosinolates (i.e. mustard oils) are mainly made by Brassicales species, with their highest structural diversity in the Brassicaceae. In Chapter 5, I examined two Ae. arabicum lines (CYP and TUR) and their recombinant inbred lines to assess glucosinolate composition in different tissues and throughout the plant's development. The levels of glucosinolates in the leaves changed when Ae. arabicum went from a vegetative to a reproductive state. Moreover, a major difference in glucosinolate content (up to 10-fold) between CYP and TUR indicates a likely regulatory pathway outside of the main glucosinolate biosynthesis pathway. Multi-trait and multi-environment QTL analyses based on leaves, reproductive tissues and seeds identified a single major QTL. Fine mapping this region reduced the interval to only fifteen protein coding genes, including the two most intriguing candidates: FLOWERING LOCUS C (FLC) and the sulphate transporter SULTR2;1. These findings show an interesting correlation between development and defence.
Finally, Chapter 6 gives a final discussion of this thesis and its results. It brings the different topics together, puts them in a bigger picture and looks forward to new research possibilities.
The transcriptome as early marker of diet-related health : evidence in energy restriction studies in humans
Bussel, Inge P.G. van - \ 2017
Wageningen University. Promotor(en): Sander Kersten, co-promotor(en): Lydia Afman. - Wageningen : Wageningen University - ISBN 9789463430678 - 194
energy restricted diets - energy intake - gene expression - genomes - proteins - endurance - food composition - human nutrition research - energiearme diëten - energieopname - genexpressie - genomen - eiwitten - uithoudingsvermogen - voedselsamenstelling - voedingsonderzoek bij de mens
Background: Nutrition research is facing several challenges with respect to finding diet related health effects. The effects of nutrition on health are subtle, show high interindividual variations in response, and can take a long time to become visible. Recently, health has been redefined as an organism’s ability to adapt to challenges, and this definition can be extended to metabolic health. In the metabolic context the ability to adapt has been named ‘phenotypic flexibility’. A potential new tool to magnify the effects of diet on health is the application of challenge tests. Combined with a comprehensive tool such as transcriptomics, the study of challenge tests before and after an intervention might be able to test a change in phenotypic flexibility. A dietary intervention well-known to improve health through weight loss is energy restriction (ER). ER can be used as a model to examine the potential of challenge tests in combination with transcriptomics to magnify diet-induced effects on health. As opposed to ER, caloric restriction (CR) is a reduction in energy intake aimed at improving health and life span in non-obese subjects and not directly aimed at weight loss. In this thesis, we aimed to investigate the use of the transcriptome as an early and sensitive marker of diet-related health.
Methods: First we studied the consequences of age on the effects of CR on the peripheral blood mononuclear cells (PBMCs) transcriptome. For that purpose, we compared the changes in gene expression in PBMCs from old men with the changes in gene expression in PBMCs from young men upon three weeks of 30% CR. To study the effect of a change in dietary composition during ER, we compared the changes in gene expression upon a 12 weeks high protein 25% ER diet with the changes in gene expression upon a 12 weeks normal protein 25% ER diet in white adipose tissue (WAT). Next, we investigated the added value of measuring the PBMC transcriptome during challenge tests compared to measuring the PBMC transcriptome in the fasted state to magnify the effects of ER on health. This was investigated by measuring the changes in gene expression upon an oral glucose tolerance test (OGTT) and upon a mixed meal test (MMT), both before and after 12 weeks of 20% ER. Finally, we determined the differences between a challenge test consisting of glucose alone, the OGTT, or consisting of glucose plus other macronutrients, the MMT, on the PBMC transcriptome in diet-related health.
Results: We observed that the transcriptome of PBMCs of healthy young men had a higher responsiveness in immune response pathways compared to the transcriptome of PBMCs of aged men upon CR (chapter 2). Also, we showed that upon a normal protein-ER diet the transcriptome of WAT showed a decrease in pathways involved in immune response and inflammasome, whereas no such effect was found upon a high protein-ER diet. These effects were observed while parameters such as weight loss, glucose, and waist circumference did not change due to the different protein quantities (chapter 3). 12 weeks of 20% ER was shown to increase phenotypic flexibility as reflected by a faster and more pronounced downregulation of OXPHOS, cell adhesion, and DNA replication during the OGTT compared to the control diet (chapter 4). Finally, two challenge tests consisting of either glucose (OGTT) or glucose plus fat and protein (MMT) were shown to result in a larger overlap than difference in the changes in gene expression of PBMCs (chapter 5).
Conclusions: Based on the differential changes in gene expression upon CR at different ages, we concluded that age is an important modulator in the response to CR. As the transcriptional changes induced by a high protein ER diet seemed to reflect less beneficial health effects than those induced by a normal protein ER diet, we concluded that the diet composition is important in the health effect of ER as measured by the transcriptome. Based on the faster changes in PBMC gene expression during an OGTT upon 12 weeks of 20% ER, we concluded that the PBMC transcriptome combined with a challenge test can reflect changes in phenotypic flexibility. This makes challenge tests a suitable tool to study diet-related health effects. Finally, based on the changes in gene expression of the MMT and OGTT, we conclude that glucose in a challenge test is the main determinant of the postprandial changes in gene expression in the first two hours. Overall, these results lead to the conclusion that the transcriptome, especially in combination with challenge tests, can be used as an early marker of diet-related health. The direct relation to health still needs to be investigated, but the possibility to use the transcriptome as an early marker of diet-related health gives rise to a better understanding of the effects of nutrition on health.
Selection for pure- and crossbred performance in Charolais
Vallée-Dassonneville, Amélie - \ 2017
Wageningen University. Promotor(en): Johan van Arendonk; Henk Bovenhuis. - Wageningen : Wageningen University - ISBN 9789463430180 - 151
charolais - cattle - animal breeding - crossbreeding - crossbreds - selection - beef cattle - genomes - genetic parameters - charolais - rundvee - dierveredeling - kruisingsfokkerij - kruising - selectie - vleesvee - genomen - genetische parameters
Two categories of beef production exist; i.e. (i) purebred animals from a beef sire and a beef dam and (ii) crossbred animals from a beef sire and a dairy dam.
For the purebred beef production, there is a growing interest to include behavior and type traits in the breeding goal. Heritabilities for behavior traits, estimated using subjective data scored by farmers, range from 0.02 to 0.19. Heritabilities for type traits range from 0.02 to 0.35. Results show that there are good opportunities to implement selection for behavior traits using a simple on-farm recording system to allow collection of large data set, and for type traits in Charolais. A genome-wide association study detected 16 genomic regions with small effect on behavior and type traits. This suggests that behavior and type traits are influenced by many genes each explaining a small part of the genetic variance.
The two main dairy breeds mated to Charolais sires for crossbred beef production in France are Montbéliard and Holstein. The genetic correlation between the same trait measured on Montbéliard x Charolais and on Holstein x Charolais was 0.99 for muscular development, 0.96 for birth weight, 0.91 for calving difficulty, 0.80 for height, and 0.70 for bone thinness. Thus, for these last three traits, results show evidence for re-ranking of Charolais sires depending on whether they are mated to Montbéliard or Holstein cows. When using genomic prediction, the Montbéliard x Charolais and Holstein x Charolais populations could be combined into a single reference population to increase its size and the accuracy of genomic prediction. Results indicate that the higher the genetic correlation between the two crossbred populations, the higher the gain in accuracy achieved when combining the two populations into a single reference.
The selection of Charolais sires to produce purebred or crossbred animals is made through distinct breeding programs. An alternative could be to combine selection into one breeding program. Decision for combining or keeping breeding programs separate is determined by the correlation between the breeding objectives, the selection intensity, the difference in level of genetic merit, the accuracy of selection, and the recent implementation of genomic evaluation. Considering all parameters and based on estimations for selection on birth weight, I recommend combining both breeding programs because this will lead to higher genetic gain, and might simplify operating organization and reduce associated costs.
Utilization of complete chloroplast genomes for phylogenetic studies
Ramlee, Shairul Izan Binti - \ 2016
Wageningen University. Promotor(en): Richard Visser, co-promotor(en): Rene Smulders; Theo Borm. - Wageningen : Wageningen University - ISBN 9789462579354 - 186
phylogenetics - genomes - chloroplasts - models - solanum - orchidaceae - phylogenomics - dna sequencing - fylogenetica - genomen - chloroplasten - modellen - solanum - orchidaceae - phylogenomica - dna-sequencing
Chloroplast DNA sequence polymorphisms are a primary source of data in many plant phylogenetic studies. The chloroplast genome is relatively conserved in its evolution making it an ideal molecule to retain phylogenetic signals. The chloroplast genome is also largely, but not completely, free from other evolutionary processes such as gene duplication, concerted evolution, pseudogene formation and genome rearrangements. The conservation of the chloroplast genome sequence allows designing primers targeting regions conserved well beyond species boundaries, and amplification of these targets.
The small size together with their high copy number in leaf cells greatly facilitates chloroplast genome sequencing. In this thesis, chloroplast phylogenomics was conducted using complete chloroplast DNA genomes obtained by a newly developed method of de novo assembly. The method is not only cost-effective but also has the potential to extract a wealth of useful information on thousands of chloroplast genomes from Whole Genome Shotgun (WGS) data. We used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. This pipeline includes steps aimed at optimizing assemblies and filling gaps that are left due to coverage variation in the WGS dataset. The pipeline enabled successful de novo assembly across a range of nuclear genome sizes, from Solanum lycopersicon (tomato, 0.9 Gb) to Paphiopedilum henryanum (slipper orchid, 35 Gb).
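The idea of using a k-mer frequency table to separate chloroplast reads from nuclear reads can be illustrated with a minimal sketch. It is not the pipeline from the thesis, whose details are not given here. It assumes that, because of the high copy number of chloroplasts, chloroplast-derived k-mers occur far more often in WGS data than nuclear k-mers. The function names and the parameters `min_count` and `min_fraction` are hypothetical.

```python
from collections import Counter
from typing import Iterator, List


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield all overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_kmer_table(reads: List[str], k: int) -> Counter:
    """Count how often each k-mer occurs across all reads."""
    table: Counter = Counter()
    for read in reads:
        table.update(kmers(read, k))
    return table


def select_high_copy_reads(reads: List[str], k: int,
                           min_count: int, min_fraction: float) -> List[str]:
    """Keep reads in which most k-mers are highly abundant.

    Assumption: abundant k-mers come from the high-copy chloroplast genome.
    Reverse complements and nuclear repeats are ignored in this sketch.
    """
    table = build_kmer_table(reads, k)
    selected = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:          # read shorter than k
            continue
        abundant = sum(1 for km in read_kmers if table[km] >= min_count)
        if abundant / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected


# Toy example with hypothetical reads: five copies of one "organellar" read
# and a single "nuclear" read that shares no k-mers with it.
reads = ["ACGTTGCA"] * 5 + ["GGGCCCAT"]
chloroplast_like = select_high_copy_reads(reads, k=4, min_count=3, min_fraction=0.8)
print(len(chloroplast_like))
```

In practice the abundance threshold would have to be chosen from the observed k-mer count distribution, and high-copy nuclear repeats or mitochondrial reads could pass the same filter, so a further assembly or similarity step would still be needed.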
The pipeline is suitable for studying structural variation in the chloroplast genome, as opposed to the common procedure of read mapping against a reference genome. To support the putative rearrangements, a flexible assembly quality comparison tool was created that combines and visualizes read mapping and alignment results in a two-dimensional plot. We have evaluated the ability of this tool using the de novo assemblies of S. lycopersicon and P. henryanum chloroplasts. The results show that we can not only immediately select the best of two options, but also determine the location of specific artefacts.
In order to explore and evaluate the utility of complete chloroplast phylogenomics, tomato and Paphiopedilum spp were used to conduct phylogenetic inferences based on the complete chloroplast genome. In total 84 tomato chloroplast genomes within the section Lycopersicon were assembled and phylogenetic trees produced. The analyses revealed that next to the chloroplast regions and spacers traditionally used for phylogenetics, additional regions of protein coding and non-coding DNA may be exploited for intraspecific phylogenetic studies. In particular, more than 50% of all phylogenetically relevant information could be included by just using four genes (ycf1, ndhF, ndhA, and ndhH), of which 34% in ycf1 alone. The topology of the phylogenetic tree inferred from ycf1 was the same as that of trees based on all other protein coding genes, although with lower bootstrap values. The phylogenetic analyses based on 32 complete Paphiopedilum spp. chloroplast genomes confirmed the division of the genus into three subgenera Parvisepalum, Brachypetalum and Paphiopedilum. The division of five sections of subgenus Paphiopedilum was also recovered. The de novo assemblies revealed several structural rearrangements including gene loss and inversion. In addition, the chloroplast genome of Paphiopedilum has experienced extreme IR expansion that has included part of or the entire SSC region, resulting in larger IR regions than commonly observed among monocots.
In conclusion, WGS data offer opportunities to generate partial or entire chloroplast genomes for phylogenetic studies. Species discrimination can be achieved already with partial data (subsets of genes), but evolutionarily young lineages may require more informative characters. Therefore, it is expected that many complete chloroplast genomes will be produced in the years to come. While generating these genomes, the need for de novo assembly of chloroplast genomes rather than mapping against reference genomes is evident, in order to also uncover structural rearrangements in the chloroplast genome.
Antibodies and longevity of dairy cattle : genetic analysis
Klerk, B. de - \ 2016
Wageningen University. Promotor(en): Johan van Arendonk, co-promotor(en): Jan van der Poel; Bart Ducro. - Wageningen : Wageningen University - ISBN 9789462577589 - 134
dairy cattle - dairy cows - antibodies - longevity - genetic analysis - breeding value - genomes - genetic improvement - animal genetics - melkvee - melkkoeien - antilichamen - gebruiksduur - genetische analyse - fokwaarde - genomen - genetische verbetering - diergenetica
The dairy sector has a big impact on food production for the growing world population and contributes substantially to the world economy. In order to produce food in a sustainable way, dairy cows need to be able to produce milk without problems and for as long as possible. Therefore, breeding programs focus on improvement of important traits for dairy cows. In order to improve desirable traits and obtain genetic gain there is a constant need for optimization of breeding programs and a search for useful parameters to include within breeding programs. Over the last several decades, breeding in dairy cattle mainly focused on production and fertility traits, with less emphasis on health traits. Health problems, however, can cause substantial economic losses to the dairy industry. The economic losses, together with the rising awareness of animal welfare, increased herd size, and less attention for individual animals, have led to an increased need to focus more on health traits. Longevity is strongly related to disease resistance, since a healthier cow will have a longer productive life. The identification of biomarkers and the detection of genes controlling health and longevity would not only greatly enhance the understanding of such traits but also offer the opportunity to improve breeding schemes. The objectives of this thesis therefore were 1) to find an easily measurable disease resistance related biomarker in dairy cows, 2) to identify the relation between antibodies and longevity, and 3) to identify genomic regions that are involved in antibody production/expression. In this thesis antibodies are investigated as a parameter for longevity. Antibodies might be a novel parameter that enables selection of cows with an improved ability to stay healthy and to remain productive over a longer period of time. In this thesis antibodies binding the naive antigen keyhole limpet hemocyanin (KLH) were assumed to be natural antibodies. 
Antibodies binding bacteria-derived antigens lipoteichoic acid (LTA), lipopolysaccharide (LPS) and peptidoglycan (PGN) were assumed to be specific antibodies. In chapter 2 it was shown that levels of antibodies are heritable (up to h² = 0.23). Additionally, antibody levels measured in milk and blood are genetically highly correlated (± 0.80) for the two studied isotypes (IgG and IgM). On the other hand, phenotypically, natural antibodies (from both IgG and IgM isotype) measured in milk cannot be interpreted as the same trait (phenotypic correlation = ± 0.40). In chapters 3 and 4 it was shown that levels of antibodies (both natural and specific antibodies) showed a negative relation with longevity: first lactation cows with low IgM or IgG levels were found to have a longer productive life. When using estimated breeding values for longevity, a significant relation was found only between natural antibody level (IgM binding KLH) and longevity. Lastly, chapter 5 reports on a genome-wide association study (GWAS) to detect genes contributing to genetic variation in natural antibody level. For natural antibody isotype IgG, genomic regions with a significant association were found on chromosome 21 (BTA). These regions included genes that have an impact on isotype class switching (from IgM to IgG). The gained knowledge on relations between antibodies and longevity and the gained insight on genes responsible for natural antibody level make antibodies potentially interesting biomarkers for longevity.
Conservation genetics of the frankincense tree
Bekele, A.A. - \ 2016
Wageningen University. Promotor(en): Frans Bongers, co-promotor(en): Rene Smulders; K. Tesfaye Geletu. - Wageningen : Wageningen University - ISBN 9789462576865 - 158
boswellia - genomes - dna sequencing - tropical forests - genetic diversity - genetic variation - genetics - forest management - plant breeding - boswellia - genomen - dna-sequencing - tropische bossen - genetische diversiteit - genetische variatie - genetica - bosbedrijfsvoering - plantenveredeling
Boswellia papyrifera is an important tree species of the extensive Combretum-Terminalia dry tropical forests and woodlands in Africa. The species produces a frankincense which is internationally traded because of its value as an ingredient in cosmetic, detergent, food flavor and perfume production, and because of its extensive use as incense during religious and cultural ceremonies in many parts of the world. The forests in which B. papyrifera grows are increasingly overexploited at the expense of the economic benefit and the wealth of ecological services they provide. Populations of B. papyrifera have declined in size and are increasingly fragmented. Regeneration has been blocked for the last 50 years in most areas and adult productive trees are dying. Projections showed a 90% loss of B. papyrifera trees in the coming 50 years and a 50% loss of frankincense production in 15 years' time.
This study addressed the conservation genetics of B. papyrifera. Forty-six microsatellite (SSR) markers were developed for this species, and these genetic markers were applied to characterize the genetic diversity pattern of 12 B. papyrifera populations in Ethiopia. In addition, the generational change in genetic diversity and the within-population genetic structure (FSGS) of two cohort groups (adults and seedlings) were studied in two populations from Western Ethiopia. In these populations seedlings and saplings were found and natural regeneration still takes place, a discovery that is important for the conservation of the species.
Despite the threats the populations are experiencing, ample genetic variation was present in the adult trees of the populations, including the most degraded populations. Low levels of population differentiation and isolation-by-distance patterns were detected. Populations could be grouped into four genetic clusters: the North eastern (NE), Western (W), North western (NW) and Northern (N) part of Ethiopia. The clusters corresponded to environmentally different conditions in terms of temperature, rainfall and soil conditions. We detected a low FSGS and found that individuals are significantly related up to a distance of 60-130 m.
Conservation of the B. papyrifera populations is urgently needed. The regeneration bottlenecks in most existing populations are an urgent prevailing problem that needs to be solved to ensure the continuity of the genetic diversity, species survival and sustainable production of frankincense. Local communities living in and around the forests should be involved in the use and management of the forests. In situ conservation activities will promote gene flow among fragmented populations and scattered remnant trees, so that the existing level of genetic diversity may be preserved. Geographical distance among populations is the main factor to be considered in sampling for ex situ conservation. A minimum of four conservation sites for B. papyrifera is recommended, representing each of the genetic clusters. Based on the findings of FSGS analyses, seed collection for ex situ conservation and plantation programmes should come from trees at least 100 m, but preferably 150 m apart.
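The spacing recommendation above (seed trees at least 100 m apart) can be illustrated with a minimal sketch. It is not the procedure used in the thesis: the function name, the greedy selection strategy and the planar tree coordinates are all assumptions made for illustration.

```python
import math


def select_seed_trees(trees, min_dist=100.0):
    """Greedily pick trees so that every selected pair is >= min_dist metres apart.

    trees: list of (tree_id, x, y); the planar coordinates in metres are hypothetical.
    """
    selected = []
    for tree_id, x, y in trees:
        # Keep the candidate only if it is far enough from every tree chosen so far.
        if all(math.hypot(x - sx, y - sy) >= min_dist for _, sx, sy in selected):
            selected.append((tree_id, x, y))
    return selected


# Hypothetical survey data: tree identifier and position in metres.
trees = [("T1", 0, 0), ("T2", 60, 0), ("T3", 120, 0), ("T4", 120, 90), ("T5", 260, 0)]
chosen = [t[0] for t in select_seed_trees(trees, min_dist=100.0)]
print(chosen)
```

A greedy pass depends on the order of the input and does not guarantee the largest possible set of trees; it only guarantees that the spacing rule holds for the trees it returns.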
Mining microbiota signatures in human intestinal tract metagenomes
Tims, S. - \ 2016
Wageningen University. Promotor(en): Michiel Kleerebezem; Willem de Vos, co-promotor(en): Erwin Zoetendal. - Wageningen : Wageningen University - ISBN 9789462576933 - 264
gastrointestinal microbiota - intestines - genomes - man - hosts - host guest relations - dna microarrays - gastrointestinal diseases - inflammatory bowel diseases - irritable colon - prebiotics - body mass index - oligosaccharides - microbiota van het spijsverteringskanaal - darmen - genomen - mens - gastheren (dieren, mensen, planten) - relaties tussen gastheer en gast - dna microarrays - maagdarmziekten - chronische darmontstekingen - prikkelbaar colon - prebiotica - quetelet index - oligosacchariden
Genetic diversity and evolution in Lactuca L. (Asteraceae) : from phylogeny to molecular breeding
Wei, Z. - \ 2016
Wageningen University. Promotor(en): Eric Schranz. - Wageningen : Wageningen University - ISBN 9789462576148 - 210
lactuca sativa - leafy vegetables - phylogeny - genetic diversity - domestication - molecular breeding - genomes - dna - quantitative trait loci - evolution - lactuca sativa - bladgroenten - fylogenie - genetische diversiteit - domesticatie - moleculaire veredeling - genomen - dna - loci voor kwantitatief kenmerk - evolutie
Cultivated lettuce (Lactuca sativa L.) is an important leafy vegetable worldwide. However, the phylogenetic relationships between domesticated lettuce and its wild relatives are still not clear. In this thesis, I focus on the phylogenetic relationships within Lactuca L., including an analysis of the wild Lactuca species that are endemic to Africa for the first time. The genetic variation of responses to salinity in a recombinant inbred line population, derived from a cross between the lettuce crop (L. sativa ‘Salinas’) and wild species (L. serriola), was investigated and the candidate gene in the identified QTL regions was further studied.
In Chapter 1, I introduce and discuss topics related to genetic diversity and evolution in Lactuca, including an overview of lettuce cultivars and uses, its hypothesized domestication history, the taxonomic position of Lactuca, current status of molecular breeding in lettuce and mechanisms of salinity tolerance in plants, especially the High-affinity K+ Transporter (HKT) gene family.
In Chapter 2, the most extensive molecular phylogenetic analysis of Lactuca was constructed based on two chloroplast genes (ndhF and trnL-F), including endemic African species for the first time. This taxon sampling covers nearly 40% of the total Lactuca species endemic to Africa and 34% of all Lactuca species. DNA sequences from all the subfamilies of Asteraceae in GenBank and those generated from Lactuca herbarium samples were used to elucidate the monophyly of Lactuca and the affiliation of Lactuca within Asteraceae. Based on the subfamily tree, 33 ndhF sequences from 30 species and 79 trnL-F sequences from 48 species were selected to infer phylogenetic relationships within Lactuca using Randomized Axelerated Maximum Likelihood (RAxML) and Bayesian Inference (BI) analyses. In addition, biogeographical, chromosomal and morphological character states were analysed based on the Bayesian tree topology. The results showed that Lactuca contains two distinct phylogenetic clades - the crop clade and the Pterocypsela clade. Other North American, Asian and widespread species either form smaller clades or mix with the Melanoseris species in an unresolved polytomy. The newly sampled African endemic species probably should be excluded from Lactuca and treated as a new genus.
In Chapter 3, twenty-seven wild Lactuca species and four outgroup species were sequenced using next generation sequencing (NGS) technology. The sampling covers 36% of total Lactuca species and all the important geographical groups in the genus. Thirty chloroplast genomes, including one complete (partial) large single copy region (LSC), one small single copy region (SSC), one inverted repeat (IR) region, and twenty-nine nuclear ribosomal DNA sequences (containing the internal transcribed spacer region) were successfully assembled and analysed. A methodology paper describing the sequencing pipeline, of which I am a co-author but which is not included in this thesis, was published: ‘Herbarium genomics: plastome sequence assembly from a range of herbarium specimens using an Iterative Organelle Genome Assembly (IOGA) pipeline’. These NGS data helped resolve deeper nodes in the phylogeny within Lactuca and resolved the polytomy from Chapter 2. The results showed that there are at least four main groups within Lactuca: the crop group, the Pterocypsela group, the North American group and the group containing widely-distributed species. I also confirmed that the endemic African species should be removed and treated as a new genus.
In Chapter 4, quantitative trait loci (QTLs) related to salt-induced changes in Root System Architecture (RSA) and ion accumulation were determined using a recombinant inbred line population derived from a cross between cultivated lettuce and wild lettuce. I measured the components of RSA by replicated lettuce seedlings grown on vertical agar plates with different NaCl concentrations in a controlled growth chamber environment. I also quantified the concentration of sodium and potassium in replicates of greenhouse-grown plants watered with 100 mM NaCl. The results identified a total of fourteen QTLs using multi-trait linkage analysis, including three major QTLs associated with general root development (qRC9.1), root growth in salt stress condition (qRS2.1), and ion accumulation (qLS7.2).
In Chapter 5, one of the identified QTL regions (qLS7.2) reported in Chapter 4 was found to contain a homolog of HKT1 from Arabidopsis thaliana. I performed a phylogenetic analysis of Lactuca HKT1-like protein sequences with other published HKT protein sequences and determined the transmembrane and pore segments of lettuce HKT1;1 alleles, according to the model proposed for AtHKT1;1. The gene expression pattern and level of LsaHKT1;1 (L. sativa ‘Salinas’) and LseHKT1;1 (L. serriola) in root and shoot were investigated in plants growing hydroponically over a time-course. Samples for the measurement of Na+ and K+ contents were taken at the same time as the samples used for the gene expression test. In addition, I examined the 5’ promoter regions of the two genotypes. The results showed low expression levels of both HKT1;1 alleles in Lactuca root and relatively higher expression in shoot, probably due to the negative cis-regulatory elements of HKT1 alleles found in Lactuca promoter regions. Significant allelic differences were found in HKT1;1 expression in early stage (0-24 hours) shoots and in late stage (2-6 days) roots. The ratio of shoot HKT1;1 expression to root HKT1;1 expression was generally consistent with the ratios of Na+/K+ balance in the relevant tissues (shoot Na+/K+ divided by root Na+/K+).
In Chapter 6, I briefly summarize and discuss the results from the previous chapters. The implications of Chapters 2 and 3 for Lactuca phylogenetics are discussed, including some key characters for the diagnosis of species within Lactuca, the use of herbarium DNA for NGS technology, and perspectives on Lactuca phylogeny. Future perspectives of genome-wide association mapping for lettuce breeding are also discussed. Lastly, I propose to integrate phylogenetic approaches into investigations of allelic differences in lettuce, not just associated with salinity stress but also with other stress-related and beneficial characters, both within and between species.
Natural genetic variation in Arabidopsis thaliana photosynthesis
Flood, P.J. - \ 2015
Wageningen University. Promotor(en): Maarten Koornneef, co-promotor(en): Mark Aarts; Jeremy Harbinson. - Wageningen : Wageningen University - ISBN 9789462575004 - 278
arabidopsis thaliana - genetische variatie - fotosynthese - genomen - chlorofyl - fenotypen - arabidopsis thaliana - genetic variation - photosynthesis - genomes - chlorophyll - phenotypes
Oxygenic photosynthesis is the gateway of the sun’s energy into the biosphere; it is where light becomes life. Genetic variation is the fuel of evolution; without it natural selection is powerless and adaptation impossible. In this thesis I have set out to study a relatively unexplored field which sits at the intersection of these two topics, namely natural genetic variation in plant photosynthesis. To begin, I reviewed the available literature (Chapter 2). From this it became clear that the main bottleneck restricting progress was the lack of high-throughput phenotyping platforms for photosynthesis. To address this, an automated high-throughput chlorophyll fluorescence phenotyping system was developed, which could measure 1440 plants in less than an hour for ΦPSII, a measure of photosynthetic efficiency (Chapter 3). Using this phenotyping platform I screened five populations of Arabidopsis thaliana. Three of these populations resulted from bi-parental crosses and segregated for only two genomes; using these I conducted family mapping (Chapter 4). The final two populations were composed of natural, field-collected accessions and were analysed using a genome-wide association approach (Chapter 5). The family mapping approach had greater statistical power due to within-population replication, and the genome-wide association approach had higher mapping resolution due to historical recombination. Both approaches were used to identify genomic regions (loci) which were responsible for some of the variation in photosynthesis observed. The number and average effect of these loci were used to infer the genetic architecture of photosynthesis as a highly complex polygenic trait for which there are many loci of very small effect. In addition to screening these large populations, a smaller subset of 18 lines was assayed for natural variation in phosphorylation of photosystem II (PSII) proteins in response to changing light (Chapter 6). 
This exploratory study indicated that this process shows considerable variation and may be important for adaptation of the photosynthetic apparatus to photosynthetic extremes. The genetic mapping studies just described focus exclusively on genetic variation in the nuclear genome. Whilst this contains the majority of the plant's genetic information, there is also a store of genetic information in the chloroplast and mitochondria. These genetic repositories contain genes which are essential for photosynthesis and energy metabolism. Any variation in these genes could have a large impact on photosynthesis. To study natural variation in these genomes I developed a new population of reciprocal nuclear-organellar hybrids (cybrids) which could be used to study the effect of genetic variation in organelles whilst controlling for nuclear genetic variation (Chapter 7). Preliminary results indicate that this resource will be of great use in disentangling natural genetic variation in nucleo-organelle interactions. Finally, I looked at one chloroplast-encoded photosynthetic mutation in more detail (Chapter 8). This mutation had evolved in response to herbicide application and had spread along British railways. When studying this population of resistant plants I found empirical evidence for organelle-mediated nuclear genetic hitchhiking. This is a previously undescribed evolutionary phenomenon and is likely to be quite common. In conclusion, there is an abundance of genetic variation in photosynthesis which can be used to improve the trait for agriculture and provide insights into novel evolutionary phenomena in the field.
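As an illustration of the ΦPSII trait screened in Chapter 3, the sketch below uses the standard chlorophyll fluorescence definition ΦPSII = (Fm' - Fs) / Fm', where Fm' is the maximum fluorescence in the light and Fs the steady-state fluorescence. The abstract does not state how the phenotyping system computes the value, so the formula choice, the function name and the readings are assumptions.

```python
def phi_psii(fm_prime, fs):
    """Operating efficiency of photosystem II: (Fm' - Fs) / Fm'."""
    if fm_prime <= 0:
        raise ValueError("Fm' must be positive")
    return (fm_prime - fs) / fm_prime


# Hypothetical fluorescence readings per plant: (Fm', Fs) in arbitrary units.
readings = {"plant_001": (1000.0, 400.0), "plant_002": (800.0, 400.0)}
efficiencies = {pid: phi_psii(fmp, fs) for pid, (fmp, fs) in readings.items()}
print(efficiencies)
```

Because the value is a ratio of two readings taken on the same plant, it needs no calibration to absolute units, which is one reason such a measure suits high-throughput imaging.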
Novel introner-like elements in fungi are involved in parallel gains of spliceosomal introns
Collemare, J. ; Beenen, H.G. ; Crous, P.W. ; Wit, P.J.G.M. de; Burgt, A. van der - \ 2015
PLoS ONE 10 (2015)6. - ISSN 1932-6203 - 12 p.
daphnia populations - maximum-likelihood - evolution - gene - positions - conservation - selection - sequence - genomes
Spliceosomal introns are key components of the eukaryotic gene structure. Although they contributed to the emergence of eukaryotes, their origin remains elusive. In fungi, they might originate from the multiplication of invasive introns named Introner-Like Elements (ILEs). However, so far ILEs have been observed in six fungal species only, including Fulvia fulva and Dothistroma septosporum (Dothideomycetes), arguing against ILE insertion as a general mechanism for intron gain. Here, we identified novel ILEs in eight additional fungal species that are phylogenetically related to F. fulva and D. septosporum using PCR amplification with primers derived from previously identified ILEs. The ILE content appeared unique to each species, suggesting independent multiplication events. Interestingly, we identified four genes each containing two gained ILEs. By analysing intron positions in orthologues of these four genes in Ascomycota, we found that three ILEs had inserted within a 15 bp window that contains regular spliceosomal introns in other fungal species. These three positions are not the result of intron sliding because ILEs are newly gained introns. Furthermore, the alternative hypothesis of an inferred ancestral gain followed by independent losses contradicts the observed degeneration of ILEs. These observations clearly indicate three parallel intron gains in four genes that were randomly identified. Our findings suggest that parallel intron gain is a phenomenon that has been highly underestimated in ILE-containing fungi, and likely in the whole fungal kingdom.
Using natural variation to unravel the dynamic regulation of plant performance in diverse environments
Molenaar, J.A. - \ 2015
Wageningen University. Promotor(en): Harro Bouwmeester; Joost Keurentjes, co-promotor(en): Dick Vreugdenhil. - Wageningen : Wageningen University - ISBN 9789462573444 - 186
planten - genomen - loci voor kwantitatief kenmerk - warmtestress - genetische kartering - groei - droogte - plantengenetica - plantenfysiologie - plants - genomes - quantitative trait loci - heat stress - genetic mapping - growth - drought - plant genetics - plant physiology
All plants are able to respond to changes in their environment by adjusting their morphology and metabolism, but large differences are observed in the effectiveness of these responses in the light of plant fitness. Between and within species large differences are observed in plant responses to drought, heat and other abiotic stresses. This natural variation is partly due to variation in the genetic composition of individuals. Within-species variation can be used to identify and study genes involved in the genetic regulation of plant performance.
Growth of the world population will, in the coming years, lead to an increased demand for food, feed and other natural products. In addition, extreme weather conditions with, amongst others, more and prolonged periods of drought and heat are expected to occur due to climate change. Therefore breeders are challenged to produce stress tolerant cultivars with improved yield under sub-optimal conditions. Knowledge about the mechanisms and genes that underlie tolerance to drought, heat and other abiotic stresses will ease this challenge.
The aim of this thesis was to identify and study the role of genes that are underlying natural variation in plant performance under drought, salt and heat stress. To reach this goal a genome wide association (GWA) mapping approach was taken in the model species Arabidopsis thaliana. A population of 350 natural accessions of Arabidopsis, genotyped with 215k SNPs, was grown under control and several stress conditions and plant performance was evaluated by phenotyping one or several plant traits per environment. Genes located in the genomic regions that were significantly associated with plant performance, were studied in more detail.
Plant performance was first evaluated upon osmotic stress (Chapter 2). This treatment resulted not only in a reduced plant size, but also caused the colour of the rosette leaves to change from green to purple-red due to anthocyanin accumulation. The latter was visually quantified and subsequent GWA mapping revealed that a large part of the variation in anthocyanin accumulation could be explained by a small genomic region on chromosome 1. The analysis of re-sequence data allowed us to associate the second most frequent allele of MYB90 with higher anthocyanin accumulation and to identify the causal SNP. Interestingly MYB75, a close relative of MYB90, was not identified by GWA mapping, although causal sequence variation of this gene for anthocyanin accumulation was identified in the Cvi x Ler and Ler x Eri-1 RIL populations. Re-sequence data revealed that one allele of MYB75 was dominating the population and that the MYB75 alleles of Cvi and Ler were both rare, explaining the lack of association at this locus in GWA mapping. For MYB90, two alleles were present in a substantial part of the population, suggesting balancing selection between them.
Next, the natural population was exposed to short-term heat stress during flowering (Chapter 3). This short-term stress has a large impact on seed set, while it hardly affects the vegetative tissues. Natural variation for tolerance against the effect of heat on seed set was evaluated by measuring the length of all siliques along the inflorescence in both heat-treated and control plants. Because the flower that opened during the treatment was tagged, we could analyse the heat response for several developmental stages separately. GWA mapping revealed that the heat response before and after anthesis involved different genes. For the heat response before anthesis strong evidence was gained that FLC, a flowering time regulator and QUL2, a gene suggested to play a role in vascular tissue development, were causal for two strong associations.
Furthermore, the impact of moderate drought on plant performance was evaluated in the plant phenotyping platform PHENOPSIS. Homogeneous drought was assured by tight regulation of climate cell conditions and the robotic weighing and watering of the pots twice a day. Because plant growth is a dynamic trait it was monitored over time by top-view imaging under both moderate drought and control conditions (Chapter 4 and 5). To characterise growth it was modelled with an exponential function. GWA mapping of temporal growth data resulted in the detection of time-dependent QTLs whereas mapping of model parameters resulted in another set of QTLs related to the entire growth period. Most of these QTLs would not have been identified if plant size had only been determined on a single day. For the QTLs detected under control conditions eight candidate genes with a growth-related mutant or overexpression phenotype were identified (Chapter 4). Genes in the support window of the drought-QTLs were prioritized based on previously reported gene expression data (Chapter 5). Additional validation experiments are needed to confirm causality of the candidate genes.
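The growth modelling step can be sketched as follows, assuming the simplest exponential form area(t) = a0 * exp(r * t), fitted by least squares on the log-transformed areas. The abstract does not give the exact parameterisation or fitting method, so both are assumptions, and the data here are synthetic.

```python
import math


def fit_exponential(days, areas):
    """Fit area(t) = a0 * exp(r * t) by ordinary least squares on log(area)."""
    n = len(days)
    logs = [math.log(a) for a in areas]
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    r = sxy / sxx                        # relative growth rate per day
    a0 = math.exp(mean_y - r * mean_t)   # modelled size at t = 0
    return a0, r


# Synthetic, noise-free rosette areas generated with a0 = 2.0 and r = 0.25.
days = [0, 2, 4, 6, 8]
areas = [2.0 * math.exp(0.25 * t) for t in days]
a0, r = fit_exponential(days, areas)
print(a0, r)
```

Parameters such as a0 and r summarise the whole growth period for one accession, which is the kind of model parameter that can be mapped alongside the per-day sizes.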
Next, to search for genes that determine plant size across many environments, biomass accumulation in the natural population was determined in 25 different environments (Chapter 6). Joint analysis of these data by multi-environment GWA mapping resulted in the detection of 106 strongly associated SNPs with significant effects in 7 to 16 environments. Several genes involved in starch metabolism, leaf size control and flowering time determination were located in close proximity of the associated SNPs. Two genes, RPM1 and ACD6, were located in close proximity of SNPs with significant GxE effects. For both genes, alleles have been identified that increase resistance to bacterial infection, but that reduce biomass accumulation. The sign of the allelic effect is therefore dependent on the environmental conditions. Whole genome predictions revealed that most of the GxE interactions observed at the phenotypic level were not the consequence of strong associations with strong QxE effects, but of moderate and weak associations with weak QxE effects.
Finally, in Chapter 7 I discuss the usefulness of GWA mapping in the identification of genes underlying natural variation in plant performance under drought, heat stress and a number of other environments. Strong associations were observed for both environment-specific as well as common plant performance regulators. Some choices in phenotyping and experimental design were crucial for our success, like evaluation of plant performance over time and simplification of the quantification of the phenotype. It is suggested that follow-up work should focus on the functional characterization of the causal genes, because such analyses would be helpful to identify pathways in which the causal genes are involved and to understand why sequence variation results in changes at the phenotype level. Although translation of the findings to applications in crops is challenging, this thesis contributes to the understanding of the genetic regulation of stress response and therefore will likely contribute to the development of stress tolerant and stable yielding crops.
Breeding program for indigenous chicken in Kenya
Ngeno, K. - \ 2015
Wageningen University. Promotor(en): Johan van Arendonk, co-promotor(en): Liesbeth van der Waaij; A.K. Kahi. - Wageningen : Wageningen University - ISBN 9789462572775 - 154
kippen - pluimvee - inheems vee - dierveredeling - veredelingsprogramma's - genetische diversiteit - ecotypen - genomen - genetische verbetering - kenya - fowls - poultry - native livestock - animal breeding - breeding programmes - genetic diversity - ecotypes - genomes - genetic improvement - kenya
Ngeno, K. (2015). Breeding program for indigenous chicken in Kenya. Analysis of diversity in indigenous chicken populations. PhD thesis, Wageningen University, the Netherlands
The objective of this research was to generate knowledge required for the development of an indigenous chicken (IC) breeding program for enhanced productivity and improved human livelihood in Kenya. The initial step was to review five questions: what, why and how should we conserve IC in an effective and sustainable way, who are the stakeholders and what are their roles in the IC breeding program. The next step of the research focused on detecting distinctive IC ecotypes through morphological and genomic characterization. Indigenous chicken ecotypes were found to be populations with huge variability in morphological features. Molecular characterization was carried out using microsatellite markers and whole genome re-sequenced data. The studied IC ecotypes are genetically distinct groups. The MHC-linked microsatellite markers divided the eight IC ecotypes studied into three mixed clusters, composed of individuals from the different ecotypes, whereas non-MHC markers grouped ICs into two groups. Analysis revealed high genetic variation within the ecotypes, with highly diverse MHC-linked alleles which are known to be involved in disease resistance. Whole genome re-sequencing revealed genomic variability, regions affected by selection, candidate genes and mutations that can partially explain the phenotypic divergence between IC and commercial layers. Unlike commercial chickens, IC preserved a high genomic variability that may be important in addressing present and future challenges associated with environmental adaptation and farmers’ breeding goals. Lastly, this study showed that there is an opportunity to improve IC through selection within the population. Genetic improvement utilizing within-IC selection requires setting up a breeding program. The study described the systematic and logical steps in designing a breeding program by focusing on farmers’ needs, how to improve IC to fit the farming conditions, and management regimes.
The hybrid nature of pig genomes : unraveling the mosaic haplotype structure in wild and commercial Sus scrofa populations
Bosse, M. - \ 2015
Wageningen University. Promotor(en): Martien Groenen, co-promotor(en): Hendrik-Jan Megens; Ole Madsen. - Wageningen : Wageningen University - ISBN 9789462573000 - 253
dieren - varkens - dierveredeling - genomen - hybridisatie - sus scrofa - haplotypen - genomica - populaties - genetische variatie - animals - pigs - animal breeding - genomes - hybridization - sus scrofa - haplotypes - genomics - populations - genetic variation - cum laude
cum laude graduation
Genomics 4.0 : syntenic gene and genome duplication drives diversification of plant secondary metabolism and innate immunity in flowering plants : advanced pattern analytics in duplicate genomes
Hofberger, J.A. - \ 2015
Wageningen University. Promotor(en): Eric Schranz. - Wageningen : Wageningen University - ISBN 9789462573147 - 142
genomica - planten - metabolisme - bloeiende planten - genomen - genen - next generation sequencing - genomics - plants - metabolism - flowering plants - genomes - genes - next generation sequencing
Genomics 4.0 - Syntenic Gene and Genome Duplication Drives Diversification of Plant Secondary Metabolism and Innate Immunity in Flowering Plants
Johannes A. Hofberger1, 2, 3
1 Biosystematics Group, Wageningen University & Research Center, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands (August 2012 – December 2013)
2 Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands (December 2010 – July 2012)
3 Chinese Academy of Sciences/Max Planck Partner Institute for Computational Biology, 320 Yueyang Road, Shanghai 200031, PR China (January 2014 – December 2014)
TWO-SENTENCE SUMMARY
Large-scale comparative analysis of Big Data from next generation sequencing provides a powerful means to exploit the potential of nature in the context of plant breeding and biotechnology. In this thesis, we combine various computational methods for genome-wide identification of gene families involved in (a) plant innate immunity and (b) biosynthesis of defense-related plant secondary metabolites across 21 species, assess dynamics that affected evolution of underlying traits during 250 million years of flowering plant radiation and provide data on more than 4500 loci that can underpin crop improvement for future food and life quality.
As sessile organisms, plants are permanently exposed to a plethora of potentially harmful microbes and other pests. The surprising resilience to infections observed in successful lineages is due to a complex defense network fighting off invading pathogens. Within this network, a sophisticated plant innate immune system is accompanied by a multitude of specialized biosynthetic pathways that generate more than 200,000 secondary metabolites with ecological, agricultural, energy and medicinal importance. The rapid diversification of associated genes was accompanied by a series of duplication events in virtually all plant species, including local duplication of short sequences as well as multiplication of all chromosomes due to meiotic errors (plant polyploidy). In a comparative genomics approach, we combined several bioinformatics techniques for large-scale identification of multi-domain and multi-gene families that are involved in plant innate immunity or defense-related secondary metabolite pathways across 21 representative flowering plant genomes. We introduced a framework to trace back duplicate gene copies to distinct ancient duplication events, thereby unravelling a differential impact of gene and genome duplication on the molecular evolution of target genes. Comparing the genomic context among homologs within and between species from a phylogenomics perspective, we discovered orthologs conserved within genomic regions that remained structurally immobile during flowering plant radiation. In summary, we described a complex interplay of gene and genome duplication that increased genetic versatility of disease resistance and secondary metabolite pathways, thereby expanding the playground for functional diversification and thus plant trait innovation and success. Our findings give fascinating insights into evolution across lineages and can underpin crop improvement for food, fiber and biofuel production.
Structural variations in pig genomes
Paudel, Y. - \ 2015
Wageningen University. Promotor(en): Martien Groenen, co-promotor(en): Ole Madsen; Hendrik-Jan Megens. - Wageningen : Wageningen University - ISBN 9789462572171 - 204
varkens - dierveredeling - genomen - genomica - single nucleotide polymorphism - dna-sequencing - fenotypische variatie - chromosoomafwijkingen - evolutie - soortvorming - pigs - animal breeding - genomes - genomics - single nucleotide polymorphism - dna sequencing - phenotypic variation - chromosome aberrations - evolution - speciation
Paudel, Y. (2015). Structural variations in pig genomes. PhD thesis, Wageningen University, the Netherlands
Structural variations are chromosomal rearrangements such as insertions-deletions (INDELs), duplications, inversions, translocations, and copy number variations (CNVs). It has been shown that structural variations are as important as single nucleotide polymorphisms (SNPs) with regard to phenotypic variation. The general aim of this thesis was to use next generation sequencing data to improve our understanding of the evolution of structural variations such as CNVs and INDELs in pigs. We found that: 1) the frequency of copy number variable regions did not change during pig domestication but rather reflected the demographic history of pigs; 2) CNV of olfactory receptor genes seems to play a role in the on-going speciation of the genus Sus; 3) variation in copy number of olfactory receptor (OR) genes in pigs (Sus scrofa) seems to be shaped by a combination of selection and genetic drift, where the clustering of ORs in the genome is the major source of variation in copy number; 4) analysis of short INDELs in the pig genome shows that the level of purifying selection on INDELs positively correlates with the functional importance of a genomic region, i.e. the strongest purifying selection was observed in gene coding regions. This thesis provides a highly valuable resource of copy number variable regions, INDELs, and SNPs for future pig genetics and breeding research. Furthermore, this thesis discusses the limitations and improvements of the available tools to conduct structural variation analysis and offers insights into future trends in the detection of structural variations.
Binning metagenomic contigs by coverage and composition
Alneberg, J. ; Bjarnason, B.S. ; Bruijn, I. de; Schirmer, M. ; Quick, J. ; Ijaz, U.Z. ; Lahti, L.M. ; Loman, N.J. ; Andersson, A.F. ; Quince, C. - \ 2014
Nature Methods : techniques for life scientists and chemists 11 (2014). - ISSN 1548-7091 - p. 1144 - 1146.
sequences - genomes
Shotgun sequencing enables the reconstruction of genomes from complex microbial communities, but because assembly does not reconstruct entire genomes, it is necessary to bin genome fragments. Here we present CONCOCT, a new algorithm that combines sequence composition and coverage across multiple samples, to automatically cluster contigs into genomes. We demonstrate high recall and precision on artificial as well as real human gut metagenome data sets.
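The idea of binning on composition plus multi-sample coverage can be illustrated with a small sketch. This is not the CONCOCT implementation: the function names (`composition`, `features`, `kmeans`), the dinucleotide composition, the log transform of coverage, and the use of plain k-means are all assumptions chosen to keep the example short and dependency-free.

```python
from collections import Counter
from itertools import product
import math


def composition(seq, k=2):
    """Normalized k-mer frequency vector (with a pseudocount of 1 per k-mer)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) + len(kmers)
    return [(counts[m] + 1) / total for m in kmers]


def features(seq, coverages, k=2):
    """Join composition with log-transformed coverage, one value per sample."""
    return composition(seq, k) + [math.log(c + 1.0) for c in coverages]


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each contig to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: (contig sequence, mean coverage in two samples).
contigs = [
    ("ATATTTAATTATAAAT", [50, 5]),
    ("TTATAATATTTAAATA", [48, 6]),
    ("GCGGCCGCGGGCCGCG", [5, 40]),
    ("CCGCGGCGCCGGGCGC", [6, 42]),
]
points = [features(seq, cov) for seq, cov in contigs]
labels = kmeans(points, k=2)
print(labels)
```

In this toy input the two AT-rich contigs share one coverage profile and the two GC-rich contigs share another, so the two signals agree. Real data would need the number of bins to be chosen or inferred, and the two feature groups to be scaled relative to each other.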
An integrated catalog of reference genes in the human gut microbiome
Li, J. ; Jia, H. ; Cai, X. ; Zhong, H. ; Feng, Q. ; Sunagawa, S. ; Arumugam, M. ; Kultima, J.R. ; Prifti, E. ; Nielsen, T. ; Juncker, A.S. ; Manichanh, C. ; Chen, B. ; Zhang, W. ; Levenez, F. ; Xu, X. ; Xiao, L. ; Liang, S. ; Zhang, D. ; Zhang, Z. ; Chen, W. ; Zhao, H. ; Al-Aama, J.Y. ; Edris, S. ; Yang, H. ; Hansen, H. ; Nielsen, H.B. ; Brunak, S. ; Kristiansen, K. ; Guarner, F. ; Pedersen, O. ; Doré, J. ; Ehrlich, S.D. ; Bork, P. ; Wang, J. ; Vos, W.M. de; Tims, S. ; Zoetendal, E.G. ; Kleerebezem, M. - \ 2014
Nature Biotechnology 32 (2014)8. - ISSN 1087-0156 - p. 834 - 841.
eukaryotic diversity - fecal microbiota - population-size - metagenome - sequences - genomes - tool - alignment - impact - twins
Many analyses of the human gut microbiome depend on a catalog of reference genes. Existing catalogs for the human gut microbiome are based on samples from single cohorts or on reference genomes or protein sequences, which limits coverage of global microbiome diversity. Here we combined 249 newly sequenced samples of the Metagenomics of the Human Intestinal Tract (MetaHIT) project with 1,018 previously sequenced samples to create a cohort from three continents that is at least threefold larger than cohorts used for previous gene catalogs. From this we established the integrated gene catalog (IGC) comprising 9,879,896 genes. The catalog includes close-to-complete sets of genes for most gut microbes, which are also of considerably higher quality than in previous catalogs. Analyses of a group of samples from Chinese and Danish individuals using the catalog revealed country-specific gut microbial signatures. This expanded catalog should facilitate quantitative characterization of metagenomic, metatranscriptomic and metaproteomic data from the gut microbiome to understand its variation across populations in human health and disease.
Birth, death, and diversification of mobile promoters in prokaryotes
Passel, M.W.J. van; Nijveen, H. ; Wahl, L.M. - \ 2014
Genetics 197 (2014)1. - ISSN 0016-6731 - p. 291 - 299.
branching-process model - transposable elements - population-genetics - sequence evolution - escherichia-coli - dna - genomes - innovation - conversion - selection
A previous study of prokaryotic genomes identified large reservoirs of putative mobile promoters (PMPs), that is, homologous promoter sequences associated with nonhomologous coding sequences. Here we extend this data set to identify the full complement of mobile promoters in sequenced prokaryotic genomes. The expanded search identifies nearly 40,000 PMP sequences, 90% of which occur in noncoding regions of the genome. To gain further insight from this data set, we develop a birth–death–diversification model for mobile genetic elements subject to sequence diversification; applying the model to PMPs we are able to quantify the relative importance of duplication, loss, horizontal gene transfer (HGT), and diversification to the maintenance of the PMP reservoir. The model predicts low rates of HGT relative to the duplication and loss of PMP copies, rapid dynamics of PMP families, and a pool of PMPs that exist as a single copy in a genome at any given time, despite their mobility. We report evidence of these “singletons” at high frequencies in prokaryotic genomes. We also demonstrate that including selection, either for or against PMPs, was not necessary to describe the observed data.
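To make the notion of a birth-death-diversification process concrete, the following is a minimal stochastic simulation sketch of the copy number of one promoter family within one genome. It is not the authors' model: the function `simulate_family`, the rate parameters `dup`, `loss`, `div` and `hgt`, and the choice to treat diversification as a copy leaving its family are all illustrative assumptions.

```python
import random


def simulate_family(n0, dup, loss, div, hgt, t_max, rng):
    """Gillespie-style simulation of the copy number of one family in one genome.

    Assumptions (hypothetical, for illustration only):
      - each copy duplicates at rate `dup` and is lost at rate `loss`;
      - each copy diversifies at rate `div`, which removes it from this family;
      - new copies arrive by horizontal transfer at rate `hgt`, independent of n.
    """
    n, t = n0, 0.0
    while True:
        gain_dup = dup * n
        removal = (loss + div) * n
        total = gain_dup + removal + hgt
        if total == 0:
            return n  # no event can happen any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            return n
        r = rng.random() * total
        if r < gain_dup:
            n += 1  # duplication
        elif r < gain_dup + removal:
            n -= 1  # loss or diversification
        else:
            n += 1  # arrival by horizontal transfer


# Usage: fraction of surviving families that end as a single copy.
rng = random.Random(1)
finals = [simulate_family(1, 0.5, 1.0, 0.2, 0.01, 5.0, rng) for _ in range(1000)]
surviving = [n for n in finals if n > 0]
singletons = sum(1 for n in surviving if n == 1)
if surviving:
    print(singletons / len(surviving))
```

With rates chosen so that removal outpaces duplication, most surviving families in such a simulation tend to sit at low copy number, which is the kind of behaviour the abstract summarizes as a pool of "singletons". The rate values used here are arbitrary.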