SV-AUTOPILOT : Optimized, automated construction of structural variation discovery and benchmarking pipelines
Leung, Wai Yi ; Marschall, Tobias ; Paudel, Yogesh ; Falquet, Laurent ; Mei, Hailiang ; Schönhuth, Alexander ; Maoz Moss, Tiffanie Yael - \ 2015
BMC Genomics 16 (2015)1. - ISSN 1471-2164
Benchmarking - Meta-tool - Next-Generation Sequencing Analysis - Non-human genome - Standardized pipeline - Structural Variation - SV prediction - SV tool - SV tool development
Background: Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans also for other species. Objectives of this work included: a) Creating an automated, standardized pipeline for SV prediction. Results: The SV-AUTOPILOT meta-tool platform is an automated pipeline for standardization of SV prediction and SV tool development in paired-end next-generation sequencing (NGS) analysis. SV-AUTOPILOT comes in the form of a virtual machine, which includes all datasets, tools and algorithms presented here. The virtual machine easily allows one to add, replace and update genomes, SV callers and post-processing routines and therefore provides an easy, out-of-the-box environment for complex SV discovery tasks. SV-AUTOPILOT was used to make a direct comparison between 7 popular SV tools on the Arabidopsis thaliana genome using the Landsberg (Ler) ecotype as a standardized dataset. Recall and precision measurements suggest that Pindel and Clever were the most adaptable to this dataset across all size ranges while Delly performed well for SVs larger than 250 nucleotides. A novel, statistically-sound merging process, which can control the false discovery rate, reduced the false positive rate on the Arabidopsis benchmark dataset used here by >60%. Conclusion: SV-AUTOPILOT provides a meta-tool platform for future SV tool development and the benchmarking of tools on other genomes using a standardized pipeline. It optimizes detection of SVs in non-human genomes using statistically robust merging. The benchmarking in this study has demonstrated the power of 7 different SV tools for analyzing different size classes and types of structural variants. The optional merge feature enriches the call set and reduces false positives providing added benefit to researchers planning to validate SVs. SV-AUTOPILOT is a powerful, new meta-tool for biologists as well as SV tool developers.