Record number | 507416 |
---|---|
Title | PanTools: representation, storage and exploration of pan-genomic data |
Author(s) | Sheikhizadeh Anari, S.; Schranz, M.E.; Akdel, Mehmet; Ridder, D. de; Smit, S. |
Source | Bioinformatics 32 (2016)17. - ISSN 1367-4803 - p. i487 - i493. |
Event | ECCB 2016, The Hague, 2016-09-03/2016-09-07 |
DOI | http://dx.doi.org/10.1093/bioinformatics/btw455 |
Department(s) | Bioinformatics Biosystematics EPS |
Publication type | Refereed Article in a scientific journal |
Publication year | 2016 |
Abstract | Motivation: Next-generation sequencing technology is generating a wealth of highly similar genome sequences for many species, paving the way for a transition from single-genome to pan-genome analyses. Accordingly, genomics research is going to switch from reference-centric to pan-genomic approaches. We define the pan-genome as a comprehensive representation of multiple annotated genomes, facilitating analyses on the similarity and divergence of the constituent genomes at the nucleotide, gene and genome structure level. Current pan-genomic approaches do not thoroughly address scalability, functionality and usability. Results: We introduce a generalized De Bruijn graph as a pan-genome representation, as well as an online algorithm to construct it. This representation is stored in a Neo4j graph database, which makes our approach scalable to large eukaryotic genomes. Besides the construction algorithm, our software package, called PanTools, currently provides functionality for annotating pan-genomes, adding sequences, grouping genes, retrieving gene sequences or genomic regions, reconstructing genomes and comparing and querying pan-genomes. We demonstrate the performance of the tool using datasets of 62 E. coli genomes, 93 yeast genomes and 19 Arabidopsis thaliana genomes. Availability and Implementation: The Java implementation of PanTools is publicly available at http://www.bif.wur.nl. Contact: sandra.smit@wur.nl |
Comments |