Staff Publications

Record number 532125
Title Experimental results of "Managing variant calling datasets the big data way"
Author(s) Boufea, Katerina; Athanasiadis, Ioannis
DOI http://dx.doi.org/10.5281/zenodo.582145
Department(s) Information Technology
WASS
Publication type Dataset
Publication year 2017
Abstract Tomatula was demonstrated by retrieving the allele frequencies for a given region in the data from Aflitos et al. (2014). We developed scripts that retrieve allele frequencies from either VCF file storage or Apache Parquet. We executed a series of experiments, each querying a region of 2000 bases in the chromosome 6 file, which corresponds to the approximate length of a gene. The experiments examine four main factors that can affect the performance of a Big Data cluster: (a) the storage format (VCF files or Parquet), (b) the size of the input files (104 or 1144 individuals), (c) the number of computing nodes in the cluster (between 2 and 150 executor nodes), and (d) the HDFS replication factor (3, 5, 7, or 9). The HDFS block size was kept at its default value of 128 MB. All experiments were executed five times; the detailed results are provided here, along with a script that produces the corresponding figures.
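The kind of query described in the abstract, retrieving allele frequencies for a 2000-base region, can be illustrated with a minimal single-machine sketch in plain Python. This is not the Tomatula code and does not use Spark, HDFS, or Parquet; the function name allele_frequencies, the sample records, the chromosome label "6", and the coordinates are all hypothetical and chosen only for illustration. It assumes standard tab-separated VCF records that carry a GT field.

```python
from collections import Counter


def allele_frequencies(vcf_lines, chrom, start, end):
    """Return {(pos, allele): frequency} for records with start <= pos <= end.

    Hypothetical helper for illustration; assumes tab-separated VCF records
    with a FORMAT column that contains GT.
    """
    result = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype columns on this record
        pos = int(fields[1])
        if fields[0] != chrom or not (start <= pos <= end):
            continue  # outside the requested region
        alleles = [fields[3]] + fields[4].split(",")  # REF followed by ALT alleles
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        counts = Counter()
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            for call in genotype.replace("|", "/").split("/"):
                if call.isdigit():  # missing calls (".") are ignored
                    counts[alleles[int(call)]] += 1
        total = sum(counts.values())
        if total:
            for allele in alleles:
                result[(pos, allele)] = counts[allele] / total
    return result


# Made-up records: two samples, one variant inside the region and one outside.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "6\t1500\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
    "6\t5000\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/1",
]

# Query a 2000-base window (positions 1000 to 2999, inclusive).
freqs = allele_frequencies(EXAMPLE_VCF, "6", 1000, 2999)
```

In this sketch the whole file is scanned line by line for every query. A columnar format such as Parquet lets a query engine read only the columns and row groups it needs, which is one reason the storage format is among the factors the experiments compare.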