Staff Publications

  • About

    'Staff publications' is the digital repository of Wageningen University & Research

    'Staff publications' contains references to publications authored by Wageningen University staff from 1976 onward.

    Publications authored by the staff of the Research Institutes are available from 1995 onwards.

    Full text documents are added when available. The database is updated daily and currently holds about 240,000 items, of which 72,000 are open access.


Records 1 - 20 / 39

Fourier transform assisted deconvolution of skewed peaks in complex multi-dimensional chromatograms
Hanke, A.T. ; Verhaert, P.D.E.M. ; Wielen, L.A.M. van der; Eppink, M.H.M. ; Sandt, E.J.A.X. ; Ottens, M. - \ 2015
Journal of Chromatography. A, Including electrophoresis and other separation methods 1394 (2015). - ISSN 0021-9673 - p. 54 - 61.
multivariate curve resolution - purification process-development - ion-exchange chromatography - multicomponent chromatograms - liquid-chromatography - automatic program - parameters - optimization - algorithms - equations
Lower order peak moments of individual peaks in heavily fused peak clusters can be determined by fitting peak models to the experimental data. The success of such an approach depends on two main aspects: the generation of meaningful initial estimates on the number and position of the peaks, and the choice of a suitable peak model. For the detection of meaningful peaks in multi-dimensional chromatograms, a fast data scanning algorithm was combined with prior resolution enhancement through the reduction of column and system broadening effects with the help of two-dimensional fast Fourier transforms. To capture the shape of skewed peaks in multi-dimensional chromatograms, a formalism for the accurate calculation of exponentially modified Gaussian peaks, one of the most popular models for skewed peaks, was extended for direct fitting of two-dimensional data. The method is demonstrated to successfully identify and deconvolute peaks hidden in strongly fused peak clusters. Incorporation of automatic analysis and reporting of the statistics of the fitted peak parameters and calculated properties makes it easy to identify the regions of the chromatograms in which additional resolution is required for robust quantification.
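As a rough illustration of the peak model named in the abstract above, the one-dimensional exponentially modified Gaussian (EMG) density can be written in a few lines of Python. This is a minimal sketch of the textbook 1-D form, not the authors' two-dimensional formalism; the names `emg_pdf`, `trapezoid`, `mu`, `sigma` and `tau` are illustrative assumptions, and this naive expression may overflow when `tau` is very small relative to `sigma`.

```python
import math


def emg_pdf(x, mu, sigma, tau):
    """Textbook 1-D exponentially modified Gaussian density (illustrative).

    mu, sigma: mean and standard deviation of the Gaussian component.
    tau: time constant of the exponential tail (must be > 0).
    """
    lam = 1.0 / tau
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)
    tail = math.erfc((mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma))
    return 0.5 * lam * math.exp(exponent) * tail


def trapezoid(f, a, b, n):
    """Composite trapezoid rule on [a, b] with n equal intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Sanity checks for a skewed peak: unit area and mean equal to mu + tau.
area = trapezoid(lambda x: emg_pdf(x, 5.0, 1.0, 2.0), -5.0, 85.0, 9000)
mean = trapezoid(lambda x: x * emg_pdf(x, 5.0, 1.0, 2.0), -5.0, 85.0, 9000)
```

The mean of an EMG peak is `mu + tau`, so the first moment shifts to the right of the Gaussian centre as the tail grows; this is one of the lower order peak moments that the fit is meant to recover.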
Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes
Nielsen, H.B. ; Almeida, M. ; Sierakowska Juncker, A. ; Rasmussen, S. ; Li, J. ; Sunagawa, S. ; Plichta, D.R. ; Gautier, L. ; Pedersen, A.G. ; Chatelier, E. Le; Pelletier, E. ; Bonde, I. ; Nielsen, T. ; Manichanh, C. ; Arumugam, M. ; Batto, J.M. ; Quintanilha dos Santos, M.B. ; Blom, N. ; Borruel, N. ; Burgdorf, K.S. ; Boumezbeur, F. ; Casellas, F. ; Doré, J. ; Dworzynski, P. ; Guarner, F. ; Hansen, T. ; Hildebrand, F. ; Kaas, R.S. ; Kennedy, S. ; Kristiansen, K. ; Kultima, J.R. ; Leonard, P. ; Levenez, F. ; Lund, O. ; Moumen, B. ; Paslier, D. Le; Pons, N. ; Pedersen, O. ; Prifti, E. ; Qin, J. ; Raes, J. ; Sørensen, S. ; Tap, J. ; Tims, S. ; Ussery, D.W. ; Yamada, T. ; Jamet, A. ; Mérieux, A. ; Cultrone, A. ; Torrejon, A. ; Quinquis, B. ; Brechot, C. ; Delorme, C. ; M'Rini, C. ; Vos, W.M. de; Maguin, E. ; Varela, E. ; Guedon, E. ; Gwen, F. ; Haimet, F. ; Artiguenave, F. ; Vandemeulebrouck, G. ; Denariaz, G. ; Khaci, G. ; Blottière, H. ; Knol, J. ; Weissenbach, J. ; Hylckama Vlieg, J.E. van; Torben, J. ; Parkhil, J. ; Turner, K. ; Guchte, M. van de; Antolin, M. ; Rescigno, M. ; Kleerebezem, M. ; Derrien, M. ; Galleron, N. ; Sanchez, N. ; Grarup, N. ; Veiga, P. ; Oozeer, R. ; Dervyn, R. ; Layec, S. ; Bruls, T. ; Winogradski, Y. ; Zoetendal, E.G. ; Renault, D. ; Sicheritz-Ponten, ; Bork, P. ; Wang, J. ; Brunak, S. ; Ehrlich, S.D. - \ 2014
Nature Biotechnology 32 (2014). - ISSN 1087-0156 - p. 822 - 828.
short read alignment - sequences - systems - algorithms - microbiota - protein - life - sets - tree - tool
Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
The impact of climate and price risks on agricultural land use and crop management decisions
Lehmann, N. ; Finger, R. - \ 2013
Land Use Policy 35 (2013). - ISSN 0264-8377 - p. 119 - 130.
productivity - farmers - systems - policy - corn - profitability - uncertainty - algorithms - diversity - scenarios
This article aims to investigate the impacts of climate change and of lower and more volatile crop price levels as currently observed in the European Union (EU) on optimal management decisions, average income and income risks in crop production in Western Switzerland. To this end, a bioeconomic whole-farm model has been developed that non-parametrically combines the crop growth model CropSyst with an economic decision model using a genetic algorithm. The analysis focuses on the farm level, which enables us to integrate a wide set of potential adaptation responses, comprising changes in agricultural land use as well as crop-specific fertilization and irrigation strategies. Furthermore, the farmer's certainty equivalent is employed as objective function, which enables the consideration of not only impacts on average income but also impacts on income variability. The study shows that the effects of EU crop prices on the optimal management decisions as well as on the farmer's certainty equivalent are much stronger than the effects of climate change. Furthermore, our results indicate that the impacts of income risks on the crop farm's optimal management schemes are of rather low importance. This is due to two major reasons: first, direct payments make up a large percentage of the agricultural income in Switzerland, which makes Swiss farmers less vulnerable to market and climate volatility. Second, arable crop farms in Switzerland are required by law to cultivate at least four different crops. Owing to these diverse cropping systems and high government direct payments, risk does not play a very decisive role in arable farming in Switzerland under climate change, market liberalization, or combinations thereof.
Extension of a GIS procedure for calculating the RUSLE equation LS factor
Zhang, H. ; Yang, Q. ; Li, R. ; Liu, Q. ; Moore, D. ; He, P. ; Ritsema, C.J. ; Geissen, V. - \ 2013
Computers and Geosciences 52 (2013). - ISSN 0098-3004 - p. 177 - 188.
soil loss equation - digital elevation models - length-slope factor - topographic parameters - drainage networks - terrain models - prediction - erosion - algorithms - extraction
The Universal Soil Loss Equation (USLE) and revised USLE (RUSLE) are often used to estimate soil erosion at regional landscape scales; however, a major limitation is the difficulty in extracting the LS factor. The geographic information system-based (GIS-based) methods which have been developed for estimating the LS factor for USLE and RUSLE also have limitations. The unit contributing area-based estimation method (UCA) converts slope length to unit contributing area for considering two-dimensional topography, but is not able to predict the different zones of soil erosion and deposition. The flowpath and cumulative cell length-based method (FCL) overcomes this disadvantage but does not consider channel networks and flow convergence in two-dimensional topography. The purpose of this research was to overcome these limitations and extend the FCL method through inclusion of channel networks and convergence flow. We developed LS-TOOL in Microsoft's .NET environment using C# with a user-friendly interface. Comparing the LS factor calculated with the three methodologies (UCA, FCL and LS-TOOL), LS-TOOL delivers encouraging results. In particular, LS-TOOL uses breaks in slope identified from the DEM to locate soil erosion and deposition zones, channel networks and convergence flow areas. Comparing slope length and LS factor values generated using LS-TOOL with manual methods, LS-TOOL corresponds more closely with the reality of the Xiannangou catchment than results using UCA or FCL. The LS-TOOL algorithm can automatically calculate slope length, slope steepness, L factor, S factor, and LS factors, providing the results as ASCII files which can be easily used in some GIS software. This study is an important step forward in conducting more accurate large area erosion evaluation.
Maximizing genetic differentiation in core collections by PCA-based clustering of molecular marker data
Heerwaarden, J. van; Odong, T.L. ; Eeuwijk, F.A. van - \ 2013
Theoretical and Applied Genetics 126 (2013)3. - ISSN 0040-5752 - p. 763 - 772.
population-structure - germplasm collections - model - distance - conservation - accessions - algorithms - management - diversity - richness
Developing genetically diverse core sets is key to the effective management and use of crop genetic resources. Core selection increasingly uses molecular marker-based dissimilarity and clustering methods, under the implicit assumption that markers and genes of interest are genetically correlated. In practice, low marker densities mean that genome-wide correlations are mainly caused by genetic differentiation, rather than by physical linkage. Although of central concern, genetic differentiation per se is not specifically targeted by most commonly employed dissimilarity and clustering methods. Principal component analysis (PCA) on genotypic data is known to effectively describe the inter-locus correlations caused by differentiation, but to date there has been no evaluation of its application to core selection. Here, we explore PCA-based clustering of marker data as a basis for core selection, with the aim of demonstrating its use in capturing genetic differentiation in the data. Using simulated datasets, we show that replacing full-rank genotypic data by the subset of genetically significant PCs leads to better description of differentiation and improves assignment of genotypes to their population of origin. We test the effectiveness of differentiation as a criterion for the formation of core sets by applying a simple new PCA-based core selection method to simulated and actual data and comparing its performance to one of the best existing selection algorithms. We find that although gains in genetic diversity are generally modest, PCA-based core selection is equally effective at maximizing diversity at non-marker loci, while providing better representation of genetically differentiated groups.
Value of information and mobility constraints for sampling with mobile sensors
Ballari, D.E. ; Bruin, S. de; Bregt, A.K. - \ 2012
Computers and Geosciences 49 (2012). - ISSN 0098-3004 - p. 102 - 111.
networks - management - optimization - algorithms - strategies - tracking - fitness - design
Wireless sensor networks (WSNs) play a vital role in environmental monitoring. Advances in mobile sensors offer new opportunities to improve phenomenon predictions by adapting spatial sampling to local variability. Two issues are relevant: which location should be sampled and which mobile sensor should move to do it? This paper proposes a form of adaptive sampling by mobile sensors according to the expected value of information (EVoI) and mobility constraints. EVoI allows decisions to be made about the location to observe. It minimises the expected costs of wrong predictions about a phenomenon using a spatially aggregated EVoI criterion. Mobility constraints allow decisions to be made about which sensor to move. A cost-distance criterion is used to minimise unwanted effects of sensor mobility on the WSN itself, such as energy depletion. We implemented our approach using a synthetic data set, representing a typical monitoring scenario with heterogeneous mobile sensors. To assess the method, it was compared with a random selection of sample locations. The results demonstrate that EVoI enables selecting the most informative locations, while mobility constraints provide the needed context for sensor selection. This paper therefore provides insights about how sensor mobility can be efficiently managed to improve knowledge about a monitored phenomenon.
Metabolite Identification Using Automated Comparison of High-Resolution Multistage Mass Spectral Trees
Rojas-Cherto, M. ; Peironcely, J.E. ; Kasper, P.T. ; Hooft, J.J.J. van der; Vos, R.C.H. de; Vreeken, R. ; Hankemeier, T. ; Reijmers, T. - \ 2012
Analytical Chemistry 84 (2012)13. - ISSN 0003-2700 - p. 5524 - 5534.
development kit cdk - source java library - spectrometry data - chemical markup - fragmentation - web - optimization - instruments - algorithms - ms/ms
Multistage mass spectrometry (MSn) generating so-called spectral trees is a powerful tool in the annotation and structural elucidation of metabolites and is increasingly used in the area of accurate mass LC/MS-based metabolomics to identify unknown, but biologically relevant, compounds. As a consequence, there is a growing need for computational tools specifically designed for the processing and interpretation of MSn data. Here, we present a novel approach to represent and calculate the similarity between high-resolution mass spectral fragmentation trees. This approach can be used to query multiple-stage mass spectra in MS spectral libraries. Additionally the method can be used to calculate structure-spectrum correlations and potentially deduce substructures from spectra of unknown compounds. The approach was tested using two different spectral libraries composed of either human or plant metabolites which currently contain 872 MSn spectra acquired from 549 metabolites using Orbitrap FTMSn. For validation purposes, for 282 of these 549 metabolites, 765 additional replicate MSn spectra acquired with the same instrument were used. Both the dereplication and de novo identification functionalities of the comparison approach are discussed. This novel MSn spectral processing and comparison approach increases the probability to assign the correct identity to an experimentally obtained fragmentation tree. Ultimately, this tool may pave the way for constructing and populating large MSn spectral libraries that can be used for searching and matching experimental MSn spectra for annotation and structural elucidation of unknown metabolites detected in untargeted metabolomics studies.
Two-mode clustering of genotype by trait and genotype by environment data
Hageman, J.A. ; Malosetti, M. ; Eeuwijk, F.A. van - \ 2012
Euphytica 183 (2012)3. - ISSN 0014-2336 - p. 349 - 359.
models - qtl - covariables - algorithms - selection - set
In this paper, we demonstrate the use of two-mode clustering for genotype by trait and genotype by environment data. In contrast to two separate (one mode) clusterings on genotypes or traits/environments, two-mode clustering simultaneously produces homogeneous groups of genotypes and traits/environments. For two-mode clustering, we first scan all two-mode cluster solutions with all possible numbers of clusters using k-means. After deciding on the final numbers of clusters, we continue with a two-mode clustering algorithm based on a genetic algorithm. This ensures optimal solutions even for large data sets. We discuss the application of two-mode clustering to multiple trait data stemming from genomic research on tomatoes as well as an application to multi-environment data on barley.
Test of Scintillometer Saturation Correction Methods Using Field Experimental Data
Kleissl, J. ; Hartogensis, O.K. ; Gomez, J.D. - \ 2010
Boundary-Layer Meteorology 137 (2010)3. - ISSN 0006-8314 - p. 493 - 507.
large-aperture scintillometer - atmospheric surface-layer - optical scintillation - sonic anemometer - strong turbulence - inner-scale - fluxes - temperature - algorithms - calibration
Saturation of large aperture scintillometer (LAS) signals can result in sensible heat flux measurements that are biased low. A field study with LASs of different aperture sizes and path lengths was performed to investigate the onset of, and corrections for, signal saturation. Saturation already occurs at $C_n^2 \approx 0.074\,D^{5/3}\lambda^{1/3}L^{-8/3}$, where $C_n^2$ is the structure parameter of the refractive index, $D$ is the aperture size, $\lambda$ is the wavelength, and $L$ is the transect length; this value is smaller than theoretically derived saturation limits. At a transect length of 1 km, a height of 2.5 m, and an aperture of $\approx 0.15$ m, the correction factor already exceeds 5% at $C_n^2 = 2 \times 10^{-12}\,\mathrm{m}^{-2/3}$, which will affect many practical applications of scintillometry. The Clifford correction method, which only depends on $C_n^2$ and the transect geometry, provides good saturation corrections over the range of conditions observed in our study. The saturation correction proposed by Ochs and Hill results in correction factors that are too small in large saturation regimes. An inner length scale dependence of the saturation correction factor was not observed. Thus, for practical applications the Clifford correction method should be applied.
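The empirical onset relation quoted in the abstract above, in which the structure parameter at saturation is roughly 0.074 times D^(5/3) times wavelength^(1/3) times L^(-8/3), is easy to evaluate numerically. The sketch below is only an illustration: the function name `saturation_onset_cn2` is hypothetical, and the wavelength of 880 nm is an assumed value for a near-infrared LAS that the abstract does not state.

```python
def saturation_onset_cn2(aperture_m, wavelength_m, path_length_m):
    """Empirical structure-parameter level (m^-2/3) at which saturation sets in.

    Implements the relation 0.074 * D^(5/3) * wavelength^(1/3) * L^(-8/3)
    with all lengths in metres.
    """
    return (0.074
            * aperture_m ** (5.0 / 3.0)
            * wavelength_m ** (1.0 / 3.0)
            * path_length_m ** (-8.0 / 3.0))


# Geometry from the abstract: 0.15 m aperture, 1 km transect.
# ASSUMPTION: 880 nm wavelength (not given in the abstract).
onset_1km = saturation_onset_cn2(0.15, 880e-9, 1000.0)
onset_2km = saturation_onset_cn2(0.15, 880e-9, 2000.0)
print(f"onset at 1 km: {onset_1km:.2e} m^-2/3")
print(f"onset at 2 km: {onset_2km:.2e} m^-2/3")
```

Because of the L^(-8/3) dependence, doubling the transect length lowers the onset level by a factor of about 6.3, which is why longer paths saturate under much weaker turbulence.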
Optimization of mobile radioactivity monitoring networks
Heuvelink, G.B.M. ; Jiang, Z. ; Bruin, S. de; Twenhöfel, C.J.W. - \ 2010
International Journal of Geographical Information Science 24 (2010)3. - ISSN 1365-8816 - p. 365 - 382.
spatial sampling design - uncertain environmental variables - constrained optimization - regionalized variables - local estimation - contamination - algorithms - prediction - patterns - schemes
In case of a nuclear accident, decision makers rely on high-resolution and accurate information about the spatial distribution of radioactive contamination surrounding the accident site. However, the static nuclear monitoring networks of many European countries are generally too coarse to provide the desired level of spatial accuracy. In the Netherlands, authorities are considering a strategy in which measurement density is increased during an emergency using complementary mobile measuring devices. This raises the question, where should these mobile devices be placed? This article proposes a geostatistical methodology to optimize the allocation of mobile measurement devices, such that the expected weighted sum of false-positive and false-negative areas (i.e. false classification into safe and unsafe zones) is minimized. Radioactivity concentration is modelled as the sum of a deterministic trend and a zero-mean spatially correlated stochastic residual. The trend is defined as the outcome of a physical atmospheric dispersion model, NPK-PUFF. The residual is characterized by a semivariogram of differences between the outputs of various NPK-PUFF model runs, designed to reflect the effect of uncertainty in NPK-PUFF meteorological inputs (e.g. wind speed, wind direction). Spatial simulated annealing is used to obtain the optimal monitoring design, in which accessibility of sampling sites (e.g. distance to roads) is also considered. Although the methodology is computationally demanding, results are promising and the computational load may be considerably reduced to compute optimal mobile monitoring designs in nearly real time.
Scheduling Internal Audit Activities: A Stochastic Combinatorial Optimization Problem
Rossi, R. ; Tarim, S.A. ; Hnich, B. ; Prestwich, S. ; Karacaer, S. - \ 2010
Journal of Combinatorial Optimization 19 (2010)3. - ISSN 1382-6905 - p. 325 - 346.
The problem of finding the optimal timing of audit activities within an organisation has been addressed by many researchers. We propose a stochastic programming formulation with Mixed Integer Linear Programming (MILP) and Constraint Programming (CP) certainty-equivalent models. In experiments neither approach dominates the other. However, the CP approach is orders of magnitude faster for large audit times, and almost as fast as the MILP approach for small audit times. This work generalises a previous approach by relaxing the assumption of instantaneous audits, and by prohibiting concurrent auditing.
Ontwikkelingen plant- en gewasherkenning [Developments in plant and crop recognition]
Nieuwenhuizen, A.T. ; Evert, F.K. van; Hemming, J. ; Bleeker, P.O. ; Weide, R.Y. van der; Kempenaar, C. - \ 2010
Gewasbescherming 41 (2010)1. - ISSN 0166-6495 - p. 14 - 14.
weed control - volunteer plants - potatoes - sugarbeet - hoeing - rumex obtusifolius - weeds - algorithms - detection
Both techniques were applied in three case studies: 1) controlling volunteer potatoes among sugar beets; 2) intra-row hoeing of sown or planted crops; 3) controlling broad-leaved dock in grassland. For the control of volunteer potatoes, the particle filter was used to better distinguish the colours of potato and sugar beet. This did not yet yield an improvement over the current adaptive algorithm. For intra-row hoeing, the Kalman filter was used to estimate the distance at which the plants were sown. It was established visually that this improves the detection result, because the algorithm keeps learning and thus adapts to the conditions. For the control of broad-leaved dock in grassland, an inventory was made of which algorithms can best be used. In summary: adaptive algorithms contribute to better detection, but they must be deployed sensibly.
Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering
Braak, C.J.F. ter; Kourmpetis, Y.I.A. ; Kiers, H.A.L. ; Bink, M.C.A.M. - \ 2009
Computational Statistics & Data Analysis 53 (2009)8. - ISSN 0167-9473 - p. 3183 - 3193.
differential evolution - factorization - algorithms - proximity - spaces
Let Q be a given n×n square symmetric matrix of nonnegative elements between 0 and 1, similarities. Fuzzy clustering results in fuzzy assignment of individuals to K clusters. In additive fuzzy clustering, the n×K fuzzy memberships matrix P is found by least-squares approximation of the off-diagonal elements of Q by inner products of rows of P. By contrast, kernelized fuzzy c-means is not least-squares and requires an additional fuzziness parameter. The aim is to popularize additive fuzzy clustering by interpreting it as a latent class model, whereby the elements of Q are modeled as the probability that two individuals share the same class on the basis of the assignment probability matrix P. Two new algorithms are provided, a brute force genetic algorithm (differential evolution) and an iterative row-wise quadratic programming algorithm of which the latter is the more effective. Simulations showed that (1) the method usually has a unique solution, except in special cases, (2) both algorithms reached this solution from random restarts and (3) the number of clusters can be well estimated by AIC. Additive fuzzy clustering is computationally efficient and combines attractive features of both the vector model and the cluster model
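The least-squares criterion described in the abstract above can be sketched directly: each off-diagonal similarity in Q is approximated by the inner product of the two corresponding rows of the n×K membership matrix P. The code below only evaluates that loss for a given P; it is a minimal illustration under these assumptions and implements neither of the two fitting algorithms from the paper. The function name `off_diagonal_loss` is hypothetical.

```python
def off_diagonal_loss(Q, P):
    """Sum of squared errors between Q[i][j] and <P[i], P[j]> for i != j.

    Q: n x n symmetric similarity matrix (list of lists, entries in [0, 1]).
    P: n x K membership matrix (list of lists, each row sums to 1).
    """
    n = len(Q)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal elements are excluded from the fit
            model = sum(p_ik * p_jk for p_ik, p_jk in zip(P[i], P[j]))
            loss += (Q[i][j] - model) ** 2
    return loss


# Two crisp clusters: individuals 0 and 1 together, individual 2 alone.
Q = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

loss_crisp = off_diagonal_loss(Q, crisp)
loss_uniform = off_diagonal_loss(Q, uniform)
```

Under the latent class reading, the inner product of two membership rows is the probability that the two individuals fall in the same class, so the crisp assignment reproduces this Q exactly while the uniform one does not.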
Finding Reliable Solutions: Event-Driven Probabilistic Constraint Programming
Tarim, S.A. ; Hnich, B. ; Prestwich, S. ; Rossi, R. - \ 2009
Annals of Operations Research 171 (2009)1. - ISSN 0254-5330 - p. 77 - 99.
optimization - algorithms
Real-life management decisions are usually made in uncertain environments, and decision support systems that ignore this uncertainty are unlikely to provide realistic guidance. We show that previous approaches fail to provide appropriate support for reasoning about reliability under uncertainty. We propose a new framework that addresses this issue by allowing logical dependencies between constraints. Reliability is then defined in terms of key constraints called "events", which are related to other constraints via these dependencies. We illustrate our approach on three problems, contrast it with existing frameworks, and discuss future developments.
Graph-based methods for large-scale protein classification and orthology inference
Kuzniar, A. - \ 2009
University. Promotor(s): Jack Leunissen, co-promotor(s): Roeland van Ham; S. Pongor. - [S.l. : S.n.] - ISBN 9789085855019 - 139
bioinformatics - proteins - classification - algorithms - graphs - evolution
The quest for understanding how proteins evolve and function has been a prominent and costly human endeavor. With advances in genomics and use of bioinformatics tools, the diversity of proteins in present day genomes can now be studied more efficiently than ever before. This thesis describes computational methods suitable for large-scale protein classification of many proteomes of diverse species. Specifically, we focus on methods that combine unsupervised learning (clustering) techniques with the knowledge of molecular phylogenetics, particularly that of orthology. In chapter 1 we introduce the biological context of protein structure, function and evolution, review the state-of-the-art sequence-based protein classification methods, and then describe methods used to validate the predictions. Finally, we present the outline and objectives of this thesis. Evolutionary (phylogenetic) concepts are instrumental in studying subjects as diverse as the diversity of genomes, cellular networks, protein structures and functions, and functional genome annotation. In particular, the detection of orthologous proteins (genes) across genomes provides reliable means to infer biological functions and processes from one organism to another. Chapter 2 evaluates the available computational tools, such as algorithms and databases, used to infer orthologous relationships between genes from fully sequenced genomes. We discuss the main caveats of large-scale orthology detection in general as well as the merits and pitfalls of each method in particular. We argue that establishing true orthologous relationships requires a phylogenetic approach which combines both trees and graphs (networks), reliable species phylogeny, genomic data for more than two species, and an insight into the processes of molecular evolution. Also proposed is a set of guidelines to aid researchers in selecting the correct tool. 
Moreover, this review motivates further research in developing reliable and scalable methods for functional and phylogenetic classification of large protein collections. Chapter 3 proposes a framework in which various protein knowledge-bases are combined into a unique network of mappings (links), and hence allows comparisons to be made between expert curated and fully-automated protein classifications from a single entry point. We developed an integrated annotation resource for protein orthology, ProGMap (Protein Group Mappings), to help researchers and database annotators who often need to assess the coherence of proposed annotations and/or group assignments, as well as users of high throughput methodologies (e.g., microarrays or proteomics) who deal with partially annotated genomic data. ProGMap is based on a non-redundant dataset of over 6.6 million protein sequences which is mapped to 240,000 protein group descriptions collected from UniProt, RefSeq, Ensembl, COG, KOG, OrthoMCL-DB, HomoloGene, TRIBES and PIRSF using a fast and fully automated sequence-based mapping approach. The ProGMap database is equipped with a web interface that enables queries to be made using synonymous sequence identifiers, gene symbols, protein functions, and amino acid or nucleotide sequences. It also incorporates services, namely BLAST similarity search and QuickMatch identity search, for finding sequences similar (or identical) to a query sequence, and tools for presenting the results in graphic form. Graphs (networks) have gained increasing attention in contemporary biology because they have enabled complex biological systems and processes to be modeled and better understood. For example, protein similarity networks constructed from all-versus-all sequence comparisons are frequently used to delineate similarity groups, such as protein families or orthologous groups in comparative genomics studies. Chapter 4.1 presents a benchmark study of freely available graph software used for this purpose. Specifically, the computational complexity of the programs is investigated using both simulated and biological networks. We show that most available software is not suitable for large networks, such as those encountered in large-scale proteome analyses, because of the high demands on computational resources. 
To address this, we developed a fast and memory-efficient graph software, netclust, which can scale to large protein networks, such as those constructed of millions of proteins and sequence similarities, on a standard computer. An extended version of this program called Multi-netclust is presented in chapter 4.2. This tool can find connected clusters of data presented by different network data sets. It uses user-defined threshold values to combine the data sets in such a way that clusters connected in all or in either of the networks can be retrieved efficiently. Automated protein sequence clustering is an important task in genome annotation projects and phylogenomic studies. During the past years, several protein clustering programs have been developed for delineating protein families or orthologous groups from large sequence collections. However, most of these programs have not been benchmarked systematically, in particular with respect to the trade-off between computational complexity and biological soundness. In chapter 5 we evaluate three of the best-known algorithms on different protein similarity networks and validation (or 'gold' standard) data sets to find out which one can scale to hundreds of proteomes and still delineate high quality similarity groups at the minimum computational cost. For this, a reliable partition-based approach was used to assess the biological soundness of predicted groups using known protein functions, manually curated protein/domain families and orthologous groups available in expert-curated databases. Our benchmark results support the view that a simple and computationally cheap method such as netclust can perform similarly to, and in some cases even better than, more sophisticated, yet much more costly methods. Moreover, we introduce an efficient graph-based method that can delineate protein orthologs of hundreds of proteomes into hierarchical similarity groups de novo. 
The validity of this method is demonstrated on data obtained from 347 prokaryotic proteomes. The resulting hierarchical protein classification is not only in agreement with manually curated classifications but also provides an enriched framework in which the functional and evolutionary relationships between proteins can be studied at various levels of specificity. Finally, in chapter 6 we summarize the main findings and discuss the merits and shortcomings of the methods developed herein. We also propose directions for future research. The ever increasing flood of new sequence data makes it clear that we need improved tools to be able to handle and extract relevant (orthological) information from these protein data. This thesis summarizes these needs and how they can be addressed by the available tools, or be improved by the new tools that were developed in the course of this research.
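The idea of retrieving connected clusters from a thresholded similarity network, as described in the thesis abstract above, can be illustrated with a small union-find routine. This is a generic sketch under stated assumptions and not the netclust implementation, whose internals the abstract does not describe; the function name `connected_clusters` and the edge format `(node_a, node_b, similarity)` are hypothetical.

```python
def connected_clusters(n_nodes, weighted_edges, threshold):
    """Group nodes into connected components, using only edges whose
    similarity is at least `threshold` (union-find with path halving)."""
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, similarity in weighted_edges:
        if similarity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for node in range(n_nodes):
        clusters.setdefault(find(node), []).append(node)
    return sorted(clusters.values())


# Toy similarity network of five proteins (indices 0..4).
edges = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.95), (2, 3, 0.2)]
strict = connected_clusters(5, edges, 0.5)
loose = connected_clusters(5, edges, 0.1)
```

Union-find keeps only one integer per node and processes each edge once, so memory stays proportional to the number of proteins rather than the number of similarities; this is one plausible reason why a simple method of this kind can scale to large networks on a standard computer.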
Estimation of prediction error variances via Monte Carlo sampling methods using different formulations of the prediction error variance
Hickey, J.M. ; Veerkamp, R.F. ; Calus, M.P.L. ; Mulder, H.A. ; Thompson, R. - \ 2009
Genetics, Selection, Evolution 41 (2009). - ISSN 0999-193X - p. 23 - 23.
genetic evaluations - breeding values - model - algorithms - selection - accuracy - trait - bias
Calculation of the exact prediction error variance covariance matrix is often computationally too demanding, which limits its application in REML algorithms, the calculation of accuracies of estimated breeding values and the control of variance of response to selection. Alternatively, Monte Carlo sampling can be used to calculate approximations of the prediction error variance, which converge to the true values if enough samples are used. However, in practical situations the number of samples that is computationally feasible is limited. The objective of this study was to compare the convergence rate of different formulations of the prediction error variance calculated using Monte Carlo sampling. Four of these formulations were published, four were corresponding alternative versions, and two were derived as part of this study. The different formulations had different convergence rates and these were shown to depend on the number of samples and on the level of prediction error variance. Four formulations were competitive and these made use of information on either the variance of the estimated breeding value and the variance of the true breeding value minus the estimated breeding value, or the covariance between the true and estimated breeding values.
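As a rough illustration of how Monte Carlo sampling can approximate a prediction error variance (PEV), the sketch below simulates the simplest possible case: one animal with one record, y = u + e, and the estimate u_hat = b * y with b = var_u / (var_u + var_e). This toy model, the function name and the parameter values are assumptions made for the example. The three estimators are standard identities that hold for this kind of predictor and correspond to the three kinds of information named in the abstract; they are not claimed to be the exact formulations compared in the paper.

```python
import random


def mc_pev(var_u=1.0, var_e=1.0, n_samples=20000, seed=1):
    """Monte Carlo PEV estimates for a single-record toy model (hypothetical)."""
    rng = random.Random(seed)
    shrink = var_u / (var_u + var_e)  # regression of true value on the record

    # Simulate true breeding values and their estimates from simulated records.
    u = [rng.gauss(0.0, var_u ** 0.5) for _ in range(n_samples)]
    u_hat = [shrink * (ui + rng.gauss(0.0, var_e ** 0.5)) for ui in u]

    def mean(values):
        return sum(values) / len(values)

    def cov(xs, ys):
        mx, my = mean(xs), mean(ys)
        return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

    errors = [ui - hi for ui, hi in zip(u, u_hat)]
    return {
        "exact": var_u * var_e / (var_u + var_e),
        "var_u - var(u_hat)": var_u - cov(u_hat, u_hat),
        "var(u - u_hat)": cov(errors, errors),
        "var_u - cov(u, u_hat)": var_u - cov(u, u_hat),
    }


pev_estimates = mc_pev()
for name, value in pev_estimates.items():
    print(f"{name}: {value:.4f}")
```

All three estimates approach the exact value as the number of samples grows, but their sampling errors differ, which is the kind of difference in convergence rate the study compares.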
Cost-based Filtering Techniques for Stochastic Inventory Control under Service Level Constraints
Tarim, S.A. ; Hnich, B. ; Rossi, R. ; Prestwich, S. - \ 2009
Constraints 14 (2009)2. - ISSN 1383-7133 - p. 137 - 176.
lot-sizing problem - demand - algorithms - systems
This paper considers a single-product, single-stocking-location production/inventory control problem with non-stationary stochastic demand. Under a widely-used control policy for this type of inventory system, the objective is to find the optimal number of replenishments, their timings and their respective order-up-to-levels that meet customer demands to a required service level. We extend a known CP approach for this problem using three cost-based filtering methods. Our approach can solve to optimality instances of realistic size much more efficiently than previous approaches, often with no search effort at all.
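To make the notion of an order-up-to-level that meets a required service level concrete, the sketch below computes the smallest level for a single replenishment cycle. It assumes independent, normally distributed period demands and a service level defined as the probability of no stockout at the end of the cycle; both are assumptions made for this example, and the function name and demand figures are hypothetical. It does not reproduce the paper's constraint programming model or its cost-based filtering methods.

```python
from statistics import NormalDist


def order_up_to_level(means, sds, alpha):
    """Smallest stock level covering the cycle's total demand with probability alpha.

    Assumes independent normal demands, so the cycle demand is normal with
    the summed mean and the summed variance (an assumption of this sketch).
    """
    cycle_mean = sum(means)
    cycle_sd = sum(sd * sd for sd in sds) ** 0.5
    return NormalDist(cycle_mean, cycle_sd).inv_cdf(alpha)


# Hypothetical two-period cycle: mean demand 100 and standard deviation 10 per period.
level = order_up_to_level([100, 100], [10, 10], alpha=0.95)
print(round(level, 2))  # 223.26
```

Because the demand is non-stationary, the level has to be recomputed for every candidate cycle, which is why the choice of replenishment timings and the order-up-to-levels are coupled in the optimisation problem.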
A Global Chance-Constraint for Stochastic Inventory Systems under Service Level Constraints
Rossi, R. ; Tarim, S.A. ; Hnich, B. ; Prestwich, S. - \ 2008
Constraints 13 (2008)4. - ISSN 1383-7133 - p. 490 - 517.
lot-sizing problem - algorithms - model - management - demand
We consider a class of production/inventory control problems that has a single product and a single stocking location, for which a stochastic demand with a known non-stationary probability distribution is given. Under the widely-known replenishment cycle policy the problem of computing policy parameters under service level constraints has been modeled using various techniques. Tarim and Kingsman introduced a modeling strategy that constitutes the state-of-the-art approach for solving this problem. In this paper we identify two sources of approximation in Tarim and Kingsman's model and we propose an exact stochastic constraint programming approach. We build our approach on a novel concept, global chance-constraints, which we introduce in this paper. Solutions provided by our exact approach are employed to analyze the accuracy of the model developed by Tarim and Kingsman.
Differential Evolution Markov Chain with snooker updater and fewer chains
Braak, C.J.F. ter; Vrugt, J.A. - \ 2008
Statistics and Computing 18 (2008)4. - ISSN 0960-3174 - p. 435 - 446.
monte-carlo - algorithms - optimization - convergence - spaces - mcmc
Differential Evolution Markov Chain (DE-MC) is an adaptive MCMC algorithm, in which multiple chains are run in parallel. Standard DE-MC requires at least N=2d chains to be run in parallel, where d is the dimensionality of the posterior. This paper extends DE-MC with a snooker updater and shows by simulation and real examples that DE-MC can work for d up to 50–100 with fewer parallel chains (e.g. N=3) by exploiting information from their past, generating jumps from differences of pairs of past states. This approach extends the practical applicability of DE-MC and is shown to be about 5–26 times more efficient than the optimal Normal random walk Metropolis sampler for the 97.5% point of a variable from a 25–50 dimensional Student t distribution with 3 degrees of freedom. In a nonlinear mixed effects model example the approach outperformed a block-updater geared to the specific features of the model.
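The idea of generating jumps from differences of pairs of past states can be sketched as follows. This is a simplified illustration only: it omits the snooker updater, uses a hypothetical standard normal target, and the function names, archive initialisation, thinning and burn-in choices are assumptions of this example. The scale factor 2.38 / sqrt(2d) is the usual default from the DE-MC literature.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of a standard normal; hypothetical target."""
    return -0.5 * sum(v * v for v in x)


def demc_past(log_p, d, n_chains=3, n_iter=4000, seed=0):
    """Run a few chains whose proposals use differences of archived past states."""
    rng = random.Random(seed)
    gamma = 2.38 / math.sqrt(2 * d)

    # Overdispersed initial archive of past states (an assumption of this sketch).
    archive = [[rng.gauss(0.0, 3.0) for _ in range(d)] for _ in range(10 * d)]
    chains = [list(rng.choice(archive)) for _ in range(n_chains)]
    samples = []

    for it in range(n_iter):
        for i, x in enumerate(chains):
            z1, z2 = rng.sample(archive, 2)  # two distinct past states
            proposal = [
                xi + gamma * (a - b) + rng.gauss(0.0, 1e-4)
                for xi, a, b in zip(x, z1, z2)
            ]
            # Metropolis acceptance; 1 - random() lies in (0, 1], so log is safe.
            if math.log(1.0 - rng.random()) < log_p(proposal) - log_p(x):
                chains[i] = proposal
        archive.extend(list(c) for c in chains)  # grow the archive of past states
        if it >= n_iter // 2:                    # discard the first half as burn-in
            samples.extend(list(c) for c in chains)
    return samples


draws = demc_past(log_target, d=2)
first = [s[0] for s in draws]
mean_first = sum(first) / len(first)
var_first = sum((v - mean_first) ** 2 for v in first) / (len(first) - 1)
print(len(draws), mean_first, var_first)
```

Because the jump vectors come from the archive rather than from the current positions of the other chains, the number of parallel chains no longer has to grow with the dimension d.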
How non-zero initial conditions affect the minimality of linear discrete-time systems
Willigenburg, L.G. van; Koning, W.L. de - \ 2008
International Journal of Systems Science 39 (2008)10. - ISSN 0020-7721 - p. 969 - 983.
order compensators - realization - algorithms
From the state-space approach to linear systems, promoted by Kalman, we learned that minimality is equivalent to reachability together with observability. Our past research on optimal reduced-order LQG controller synthesis revealed that if the initial conditions are non-zero, minimality is no longer equivalent to reachability together with observability. In the behavioural approach to linear systems promoted by Willems, which considers systems as exclusion laws, minimality is equivalent to observability. This article describes and explains in detail these apparently fundamental differences. Out of the discussion, the system properties weak reachability or excitability, and the dual property weak observability emerge. Weak reachability is weaker than reachability and becomes identical only if the initial conditions are empty or zero. Weak reachability together with observability is equivalent to minimality. Taking the behavioural systems point of view, minimality becomes equivalent to observability when the linear system is time invariant. This article also reveals the precise influence of a possibly stochastic initial state on the dimension of a minimal realisation. The issues raised in this article become especially apparent if linear time-varying systems (controllers) with time-varying dimensions are considered. Systems with time-varying dimensions play a major role in the realisation theory of computer algorithms. Moreover, they provide minimal realisations with smaller dimensions. Therefore, the results of this article are of practical importance for the minimal realisation of discrete-time (digital) controllers and computer algorithms with non-zero initial conditions. Theoretically, the results of this article generalise the minimality property to linear systems with time-varying dimensions and non-zero initial conditions.
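For reference, the classical result that this abstract takes as its starting point can be written out for the time-invariant case with zero initial state. The notation (A, B, C, n) is introduced here for illustration and is not taken from the article; the article's own definitions of weak reachability and weak observability are not reproduced.

```latex
% Discrete-time, time-invariant system with zero initial state
x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k, \qquad x_k \in \mathbb{R}^n, \quad x_0 = 0

% Reachability and observability matrices
\mathcal{R} = \begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix}, \qquad
\mathcal{O} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}

% Kalman's criterion: the realisation (A, B, C) is minimal if and only if
\operatorname{rank} \mathcal{R} = n \quad \text{and} \quad \operatorname{rank} \mathcal{O} = n
```

The article's point is that the first rank condition is too strong once the initial state is non-zero, since the initial state itself can excite parts of the state space that the input cannot reach.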