Staff Publications

    'Staff publications' is the digital repository of Wageningen University & Research.

    'Staff publications' contains references to publications authored by Wageningen University staff from 1976 onward.

    Publications authored by the staff of the Research Institutes are available from 1995 onwards.

    Full text documents are added when available. The database is updated daily and currently holds about 240,000 items, of which 72,000 are open access.


Records 1 - 7 / 7

Finished Genome of the Fungal Wheat Pathogen Mycosphaerella graminicola Reveals Dispensome Structure, Chromosome Plasticity, and Stealth Pathogenesis
Goodwin, S.B. ; M'Barek, S. Ben; Dhillon, B. ; Wittenberg, A.H.J. ; Crane, C.F. ; Hane, J.K. ; Foster, A.J. ; Lee, T.A.J. van der; Grimwood, J. ; Aerts, A. ; Antoniw, J. ; Bailey, A. ; Bluhm, B. ; Bowler, J. ; Bristow, J. ; Burgt, A. van der; Canto-Canché, B. ; Churchill, A.C.L. ; Conde-Ferràez, L. ; Cools, H.J. ; Coutinho, P.M. ; Csukai, M. ; Dehal, P. ; Wit, P.J.G.M. de; Donzelli, B. ; Geest, H.C. van de; Ham, R.C.H.J. van; Hammond-Kosack, K.E. ; Henrissat, B. ; Kilian, A. ; Kobayashi, A.K. ; Koopmann, E. ; Kourmpetis, Y. ; Kuzniar, A. ; Lindquist, E. ; Lombard, V. ; Maliepaard, C.A. ; Martins, N. ; Mehrabi, A. ; Nap, J.P.H. ; Ponomarenko, A. ; Rudd, J.J. ; Salamov, A. ; Schmutz, J. ; Schouten, H.J. ; Shapiro, H. ; Stergiopoulos, I. ; Torriani, S.F.F. ; Tu, H. ; Vries, R.P. de; Waalwijk, C. ; Ware, S.B. ; Wiebenga, A. ; Zwiers, L.H. ; Oliver, R.P. ; Grigoriev, I.V. ; Kema, G.H.J. - 2011
PLoS Genetics 7 (2011)6. - ISSN 1553-7404 - 17 p.
magnaporthe-grisea - b-chromosomes - gene - host - organization - annotation - resistance - neurospora - expression - symbiosis
The plant-pathogenic fungus Mycosphaerella graminicola (asexual stage: Septoria tritici) causes septoria tritici blotch, a disease that greatly reduces the yield and quality of wheat. This disease is economically important in most wheat-growing areas worldwide and threatens global food production. Control of the disease has been hampered by a limited understanding of the genetic and biochemical bases of pathogenicity, including mechanisms of infection and of resistance in the host. Unlike most other plant pathogens, M. graminicola has a long latent period during which it evades host defenses. Although this type of stealth pathogenicity occurs commonly in Mycosphaerella and other Dothideomycetes, the largest class of plant-pathogenic fungi, its genetic basis is not known. To address this problem, the genome of M. graminicola was sequenced completely. The finished genome contains 21 chromosomes, eight of which could be lost with no visible effect on the fungus and thus are dispensable. This eight-chromosome dispensome is dynamic in field and progeny isolates, is different from the core genome in gene and repeat content, and appears to have originated by ancient horizontal transfer from an unknown donor. Synteny plots of the M. graminicola chromosomes versus those of the only other sequenced Dothideomycete, Stagonospora nodorum, revealed conservation of gene content but not order or orientation, suggesting a high rate of intra-chromosomal rearrangement in one or both species. This observed “mesosynteny” is very different from synteny seen between other organisms. A surprising feature of the M. graminicola genome compared to other sequenced plant pathogens was that it contained very few genes for enzymes that break down plant cell walls, which was more similar to endophytes than to pathogens. The stealth pathogenesis of M. graminicola probably involves degradation of proteins rather than carbohydrates to evade host defenses during the biotrophic stage of infection and may have evolved from endophytic ancestors.
Author Summary
The plant-pathogenic fungus Mycosphaerella graminicola causes septoria tritici blotch, one of the most economically important diseases of wheat worldwide and a potential threat to global food production. Unlike most other plant pathogens, M. graminicola has a long latent period during which it seems able to evade host defenses, and its genome appears to be unstable with many chromosomes that can change size or be lost during sexual reproduction. To understand its unusual mechanism of pathogenicity and high genomic plasticity, the genome of M. graminicola was sequenced more completely than that of any other filamentous fungus. The finished sequence contains 21 chromosomes, eight of which were different from those in the core genome and appear to have originated by ancient horizontal transfer from an unknown donor. The dispensable chromosomes collectively comprise the dispensome and showed extreme plasticity during sexual reproduction. A surprising feature of the M. graminicola genome was a low number of genes for enzymes that break down plant cell walls; this may represent an evolutionary response to evade detection by plant defense mechanisms. The stealth pathogenicity of M. graminicola may involve degradation of proteins rather than carbohydrates and could have evolved from an endophytic ancestor.
Multi-netclust: an efficient tool for finding connected clusters in multi-parametric networks
Kuzniar, A. ; Dhir, S. ; Nijveen, H. ; Pongor, S. ; Leunissen, J.A.M. - 2010
Bioinformatics 26 (2010)19. - ISSN 1367-4803 - p. 2482 - 2483.
protein - algorithm
Multi-netclust is a simple tool that allows users to extract connected clusters of data represented by different networks given in the form of matrices. The tool uses user-defined threshold values to combine the matrices, and uses a straightforward, memory-efficient graph algorithm to find clusters that are connected in all or in either of the networks. The tool is written in C/C++ and is available either as a form-based or as a command-line-based program running on Linux platforms. The algorithm is fast: processing a network of > 10^6 nodes and 10^8 edges takes only a few minutes on an ordinary computer.
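The idea described in this abstract can be illustrated with a minimal Python sketch. It is not the Multi-netclust implementation (which is written in C/C++): the function names, the edge-dictionary input format (the abstract mentions matrices), and the reading of "connected in all" as an intersection of thresholded edges and "in either" as their union are all assumptions made for illustration. Clusters are found with a standard union-find (disjoint-set) structure.

```python
from collections import defaultdict


def find(parent, x):
    """Return the root of x, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def connected_clusters(n_nodes, edges):
    """Group nodes 0..n_nodes-1 into connected components (union-find)."""
    parent = list(range(n_nodes))
    for u, v in edges:
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u != root_v:
            parent[root_v] = root_u
    groups = defaultdict(list)
    for node in range(n_nodes):
        groups[find(parent, node)].append(node)
    return sorted(groups.values())


def combine_networks(net_a, net_b, thr_a, thr_b, mode="all"):
    """Threshold two weighted networks and combine their edge sets.

    net_a, net_b: dicts mapping (u, v) pairs to weights (hypothetical format).
    mode="all" keeps edges passing the threshold in both networks;
    mode="either" keeps edges passing in at least one (assumed semantics).
    """
    keep_a = {tuple(sorted(e)) for e, w in net_a.items() if w >= thr_a}
    keep_b = {tuple(sorted(e)) for e, w in net_b.items() if w >= thr_b}
    return keep_a & keep_b if mode == "all" else keep_a | keep_b


# Toy data: two networks over the same five nodes.
net_a = {(0, 1): 0.9, (1, 2): 0.8, (3, 4): 0.2}
net_b = {(0, 1): 0.7, (1, 2): 0.1, (3, 4): 0.9}

for mode in ("all", "either"):
    edges = combine_networks(net_a, net_b, 0.5, 0.5, mode=mode)
    print(mode, connected_clusters(5, edges))
```

Union-find keeps only one integer per node in memory, which is one plausible way to obtain the memory efficiency the abstract mentions; the actual algorithm used by the tool may differ.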
Evidence for RNA recombination between distinct isolates of Pepino mosaic virus
Hasiów-Jaroszewska, B. ; Kuzniar, A. ; Peters, S.A. ; Leunissen, J.A.M. ; Pospieszny, H. - 2010
Acta Biochimica Polonica 57 (2010)3. - ISSN 0001-527X - p. 385 - 388.
plant-virus - tomato - evolution
Genetic recombination plays an important role in the evolution of virus genomes. In this study we analyzed publicly available genomic sequences of Pepino mosaic virus (PepMV) for recombination events using several bioinformatics tools. The genome-wide analyses not only confirm the presence of previously found recombination events in PepMV but also provide the first evidence for double recombinant origin of the US2 isolate.
ProGMap: an integrated annotation resource for protein orthology
Kuzniar, A. ; Lin, K. ; He, Y. ; Nijveen, H. ; Pongor, S. ; Leunissen, J.A.M. - 2009
Nucleic Acids Research 37 (2009). - ISSN 0305-1048 - p. W428 - W434.
database - gene - information - families - genomes - mbl2 - tool
Current protein sequence databases employ different classification schemes that often provide conflicting annotations, especially for poorly characterized proteins. ProGMap (Protein Group Mappings, http://www.bioinformatics.nl/progmap) is a web-tool designed to help researchers and database annotators to assess the coherence of protein groups defined in various databases and thereby facilitate the annotation of newly sequenced proteins. ProGMap is based on a non-redundant dataset of over 6.6 million protein sequences which is mapped to 240 000 protein group descriptions collected from UniProt, RefSeq, Ensembl, COG, KOG, OrthoMCL-DB, HomoloGene, TRIBES and PIRSF. ProGMap combines the underlying classification schemes via a network of links constructed by a fast and fully automated mapping approach originally developed for document classification. The web interface enables queries to be made using sequence identifiers, gene symbols, protein functions or amino acid and nucleotide sequences. For the latter query type BLAST similarity search and QuickMatch identity search services have been incorporated, for finding sequences similar (or identical) to a query sequence. ProGMap is meant to help users of high throughput methodologies who deal with partially annotated genomic data.
Graph-based methods for large-scale protein classification and orthology inference
Kuzniar, A. - 2009
Wageningen University. Promotor(s): Jack Leunissen, co-promotor(s): Roeland van Ham; S. Pongor. - [S.l. : S.n.] - ISBN 9789085855019 - 139 p.
bioinformatics - proteins - classification - algorithms - graphs - evolution
The quest for understanding how proteins evolve and function has been a prominent and costly human endeavor. With advances in genomics and use of bioinformatics tools, the diversity of proteins in present day genomes can now be studied more efficiently than ever before. This thesis describes computational methods suitable for large-scale protein classification of many proteomes of diverse species. Specifically, we focus on methods that combine unsupervised learning (clustering) techniques with the knowledge of molecular phylogenetics, particularly that of orthology. In chapter 1 we introduce the biological context of protein structure, function and evolution, review the state-of-the-art sequence-based protein classification methods, and then describe methods used to validate the predictions. Finally, we present the outline and objectives of this thesis. Evolutionary (phylogenetic) concepts are instrumental in studying subjects as diverse as the diversity of genomes, cellular networks, protein structures and functions, and functional genome annotation. In particular, the detection of orthologous proteins (genes) across genomes provides reliable means to infer biological functions and processes from one organism to another. Chapter 2 evaluates the available computational tools, such as algorithms and databases, used to infer orthologous relationships between genes from fully sequenced genomes. We discuss the main caveats of large-scale orthology detection in general as well as the merits and pitfalls of each method in particular. We argue that establishing true orthologous relationships requires a phylogenetic approach which combines both trees and graphs (networks), reliable species phylogeny, genomic data for more than two species, and an insight into the processes of molecular evolution. Also proposed is a set of guidelines to aid researchers in selecting the correct tool. 
Moreover, this review motivates further research in developing reliable and scalable methods for functional and phylogenetic classification of large protein collections. Chapter 3 proposes a framework in which various protein knowledge-bases are combined into a unique network of mappings (links), and hence allows comparisons to be made between expert curated and fully-automated protein classifications from a single entry point. We developed an integrated annotation resource for protein orthology, ProGMap (Protein Group Mappings, http://www.bioinformatics.nl/progmap), to help researchers and database annotators who often need to assess the coherence of proposed annotations and/or group assignments, as well as users of high throughput methodologies (e.g., microarrays or proteomics) who deal with partially annotated genomic data. ProGMap is based on a non-redundant dataset of over 6.6 million protein sequences which is mapped to 240,000 protein group descriptions collected from UniProt, RefSeq, Ensembl, COG, KOG, OrthoMCL-DB, HomoloGene, TRIBES and PIRSF using a fast and fully automated sequence-based mapping approach. The ProGMap database is equipped with a web interface that enables queries to be made using synonymous sequence identifiers, gene symbols, protein functions, and amino acid or nucleotide sequences. It also incorporates services, namely BLAST similarity search and QuickMatch identity search, for finding sequences similar (or identical) to a query sequence, and tools for presenting the results in graphic form. Graphs (networks) have gained increasing attention in contemporary biology because they have enabled complex biological systems and processes to be modeled and better understood. For example, protein similarity networks constructed of all-versus-all sequence comparisons are frequently used to delineate similarity groups, such as protein families or orthologous groups in comparative genomics studies. Chapter 4.1 presents a benchmark study of freely available graph software used for this purpose. Specifically, the computational complexity of the programs is investigated using both simulated and biological networks. We show that most available software is not suitable for large networks, such as those encountered in large-scale proteome analyses, because of the high demands on computational resources. 
To address this, we developed fast and memory-efficient graph software, netclust (http://www.bioinformatics.nl/netclust/), which can scale to large protein networks, such as those constructed of millions of proteins and sequence similarities, on a standard computer. An extended version of this program called Multi-netclust is presented in chapter 4.2. This tool can find connected clusters of data presented by different network data sets. It uses user-defined threshold values to combine the data sets in such a way that clusters connected in all or in either of the networks can be retrieved efficiently. Automated protein sequence clustering is an important task in genome annotation projects and phylogenomic studies. During the past years, several protein clustering programs have been developed for delineating protein families or orthologous groups from large sequence collections. However, most of these programs have not been benchmarked systematically, in particular with respect to the trade-off between computational complexity and biological soundness. In chapter 5 we evaluate three best-known algorithms on different protein similarity networks and validation (or 'gold' standard) data sets to find out which one can scale to hundreds of proteomes and still delineate high quality similarity groups at the minimum computational cost. For this, a reliable partition-based approach was used to assess the biological soundness of predicted groups using known protein functions, manually curated protein/domain families and orthologous groups available in expert-curated databases. Our benchmark results support the view that a simple and computationally cheap method such as netclust can perform similarly to, and in some cases even better than, more sophisticated, yet much more costly methods. Moreover, we introduce an efficient graph-based method that can delineate protein orthologs of hundreds of proteomes into hierarchical similarity groups de novo. 
The validity of this method is demonstrated on data obtained from 347 prokaryotic proteomes. The resulting hierarchical protein classification is not only in agreement with manually curated classifications but also provides an enriched framework in which the functional and evolutionary relationships between proteins can be studied at various levels of specificity. Finally, in chapter 6 we summarize the main findings and discuss the merits and shortcomings of the methods developed herein. We also propose directions for future research. The ever increasing flood of new sequence data makes it clear that we need improved tools to be able to handle and extract relevant (orthological) information from these protein data. This thesis summarizes these needs and how they can be addressed by the available tools, or be improved by the new tools that were developed in the course of this research.
The quest for orthologs: finding the corresponding gene across genomes
Kuzniar, A. ; Ham, R.C.H.J. van; Pongor, S. ; Leunissen, J.A.M. - 2008
Trends in Genetics 24 (2008)11. - ISSN 0168-9525 - p. 539 - 551.
phylogenetic trees - network propagation - eukaryotic genomes - protein families - database - evolutionary - classification - life - phylogenomics - inference
Orthology is a key evolutionary concept in many areas of genomic research. It provides a framework for subjects as diverse as the evolution of genomes, gene functions, cellular networks and functional genome annotation. Although orthologous proteins usually perform equivalent functions in different species, establishing true orthologous relationships requires a phylogenetic approach, which combines both trees and graphs (networks) using reliable species phylogeny and available genomic data from more than two species, and an insight into the processes of molecular evolution. Here, we evaluate the available bioinformatics tools and provide a set of guidelines to aid researchers in choosing the most appropriate tool for any situation.
Benchmarking protein classification algorithms via supervised cross-validation
Kertész-Farkas, A. ; Dhir, S. ; Sonego, P. ; Pacurar, M. ; Netoteia, S. ; Nijveen, H. ; Kuzniar, A. ; Leunissen, J.A.M. ; Kocsor, A. ; Pongor, S. - 2008
Journal of Biochemical and Biophysical Methods 70 (2008)6. - ISSN 0165-022X - p. 1215 - 1223.
sequence classification - homology detection - database - family - information - search
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
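To make the contrast with random cross-validation concrete, the sketch below shows one simple way to build supervised splits: each known subtype of a target class is held out in turn as the positive test set, so the classifier is always tested on a subtype it has not seen. This is an illustration under stated assumptions, not the benchmark's actual procedure: the record format, the function name `supervised_splits`, and the rule that sends every second negative subtype to the test set are all hypothetical.

```python
def supervised_splits(records, target_class):
    """Yield (held_out_subtype, train, test) splits for one target class.

    records: list of (protein_id, class_label, subtype_label) tuples
             (hypothetical format).
    Positives: the held-out subtype goes to test, the remaining subtypes
               of the target class go to train.
    Negatives: split at subtype level as well; every second negative
               subtype goes to test (an arbitrary rule for this sketch).
    """
    pos_subtypes = sorted({sub for _, cls, sub in records if cls == target_class})
    neg_subtypes = sorted({(cls, sub) for _, cls, sub in records if cls != target_class})
    neg_test = set(neg_subtypes[1::2])

    for held_out in pos_subtypes:
        train, test = [], []
        for pid, cls, sub in records:
            if cls == target_class:
                (test if sub == held_out else train).append((pid, 1))
            else:
                (test if (cls, sub) in neg_test else train).append((pid, 0))
        yield held_out, train, test


# Toy data: one target class with two subtypes, one other class.
records = [
    ("p1", "kinase", "k1"),
    ("p2", "kinase", "k1"),
    ("p3", "kinase", "k2"),
    ("p4", "protease", "s1"),
    ("p5", "protease", "s2"),
]

for held_out, train, test in supervised_splits(records, "kinase"):
    print(held_out, "train:", train, "test:", test)
```

Because no subtype contributes members to both sides of a split, performance estimates from such splits would be expected to be lower than those from random k-fold splits, which is the effect the abstract reports.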