Staff Publications

    'Staff publications' is the digital repository of Wageningen University & Research

    'Staff publications' contains references to publications authored by Wageningen University staff from 1976 onward.

    Publications authored by the staff of the Research Institutes are available from 1995 onward.

    Full text documents are added when available. The database is updated daily and currently holds about 240,000 items, of which 72,000 are open access.

    A manual explaining all the features is available.

    Records 1 - 20 / 195

    Estimating quality of life dimensions from urban spatial pattern metrics
    Sapena, Marta ; Wurm, Michael ; Taubenböck, Hannes ; Tuia, Devis ; Ruiz, Luis A. - \ 2021
    Computers, Environment and Urban Systems 85 (2021). - ISSN 0198-9715
    The spatial structure of urban areas plays a major role in the daily life of dwellers. The current policy framework to ensure the quality of life of inhabitants, leaving no one behind, leads decision-makers to seek better-informed choices for the sustainable planning of urban areas. Thus, a better understanding of the relationship between the spatial structure of cities and their socio-economic level is of crucial relevance. Accordingly, the purpose of this paper is to quantify this two-way relationship. We measured spatial patterns of 31 cities in North Rhine-Westphalia, Germany, relying on spatial pattern metrics derived from a Local Climate Zone classification obtained by fusing remote sensing and open GIS data with a machine learning approach. Based upon these data, we quantified the relationship between spatial pattern metrics and socio-economic variables related to ‘education’, ‘health’, ‘living conditions’, ‘labor’, and ‘transport’ by means of multiple linear regression models, explaining the variability of the socio-economic variables from 43% up to 82%. Additionally, we grouped cities according to their level of ‘quality of life’ using the socio-economic variables, and found that the spatial pattern of low-dense built-up types differed among socio-economic groups. The proposed methodology is transferable to other datasets, levels, and regions, which holds great potential given the growing availability of open statistical and satellite data and derived products. Moreover, we discuss the limitations and considerations needed when conducting such studies.
    RSVQA: Visual Question Answering for Remote Sensing Data
    Lobry, Sylvain ; Marcos, Diego ; Murray, Jesse ; Tuia, Devis - \ 2020
    IEEE Transactions on Geoscience and Remote Sensing 58 (2020)12. - ISSN 0196-2892 - p. 8555 - 8566.
    This article introduces the task of visual question answering for remote sensing data (RSVQA). Remote sensing images contain a wealth of information, which can be useful for a wide range of tasks, including land cover classification, object counting, or detection. However, most of the available methodologies are task-specific, thus inhibiting generic and easy access to the information contained in remote sensing data. As a consequence, accurate remote sensing product generation still requires expert knowledge. With RSVQA, we propose a system to extract information from remote sensing data that is accessible to every user: we use questions formulated in natural language and use them to interact with the images. With the system, images can be queried to obtain high-level information specific to the image content or relational dependencies between objects visible in the images. Using an automatic method introduced in this article, we built two data sets (using low- and high-resolution data) of image/question/answer triplets. The information required to build the questions and answers is queried from OpenStreetMap (OSM). The data sets can be used to train (when using supervised methods) and evaluate models to solve the RSVQA task. We report the results obtained by applying a model based on convolutional neural networks (CNNs) for the visual part and a recurrent neural network (RNN) for the natural language part of this task. The model is trained on the two data sets, yielding promising results in both cases.
    AIDE: Accelerating image-based ecological surveys with interactive machine learning
    Kellenberger, Benjamin ; Tuia, Devis ; Morris, Dan - \ 2020
    Methods in Ecology and Evolution 11 (2020)12. - ISSN 2041-210X - p. 1716 - 1727.
    applied ecology - conservation - monitoring (population ecology) - population ecology - statistics - surveys

    Ecological surveys increasingly rely on large-scale image datasets, typically terabytes of imagery for a single survey. The ability to collect this volume of data allows surveys of unprecedented scale, at the cost of expansive volumes of photo-interpretation labour. We present Annotation Interface for Data-driven Ecology (AIDE), an open-source web framework designed to alleviate the task of image annotation for ecological surveys. AIDE employs an easy-to-use and customisable labelling interface that supports multiple users, database storage and scalability to the cloud and/or multiple machines. Moreover, AIDE closely integrates users and machine learning models into a feedback loop, where user-provided annotations are employed to re-train the model, and the latter is applied over unlabelled images to e.g. identify wildlife. These predictions are then presented to the users in optimised order, according to a customisable active learning criterion. AIDE has a number of deep learning models built-in, but also accepts custom model implementations. AIDE has the potential to greatly accelerate annotation tasks for a wide range of research projects employing image data. AIDE is open-source and can be downloaded for free at

    Better Generic Objects Counting When Asking Questions to Images : A Multitask Approach for Remote Sensing Visual Question Answering
    Lobry, Sylvain ; Marcos, Diego ; Kellenberger, Benjamin ; Tuia, Devis - \ 2020
    In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. - p. 1021 - 1027.
    Convolution Neural Networks - Deep learning - Natural language - Recurrent Neural Networks - Regression - Remote sensing - Visual Question Answering

    Visual Question Answering for Remote Sensing (RSVQA) aims at extracting information from remote sensing images through queries formulated in natural language. Since the answer to the query is also provided in natural language, the system is accessible to non-experts, and therefore dramatically increases the value of remote sensing images as a source of information, for example for journalism purposes or interactive land planning. Ideally, an RSVQA system should be able to provide an answer to questions that vary both in terms of topic (presence, localization, counting) and image content. However, aiming at such flexibility generates problems related to the variability of the possible answers. A striking example is counting, where the number of objects present in a remote sensing image can vary by multiple orders of magnitude, depending on both the scene and type of objects. This represents a challenge for traditional Visual Question Answering (VQA) methods, which either become intractable or result in an accuracy loss, as the number of possible answers has to be limited. To address this, we introduce a new model that jointly solves a classification problem (which is the most common approach in VQA) and a regression problem (to answer numerical questions more precisely). An evaluation of this method on the RSVQA dataset shows that this finer numerical output comes at the cost of a small loss of performance on non-numerical questions.
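The joint classification/regression idea described in the abstract can be sketched as a combined training loss. The following is a minimal, hypothetical NumPy illustration — the function names and the weighting term `lam` are my assumptions, not the paper's actual formulation:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multitask_vqa_loss(class_logits, true_class, count_pred, true_count, lam=0.5):
    """Joint loss: cross-entropy over categorical answer classes plus a
    regression (MSE) penalty on a separate numerical/counting head,
    weighted by lam."""
    probs = softmax(class_logits)
    ce = -np.log(probs[np.arange(len(true_class)), true_class] + 1e-12).mean()
    mse = ((count_pred - true_count) ** 2).mean()
    return ce + lam * mse
```

The regression head lets the model emit arbitrary counts instead of forcing every numerical answer into a fixed set of classes, which is the intractability the abstract points out.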

    Concept Discovery for The Interpretation of Landscape Scenicness
    Arendsen, Pim ; Marcos, Diego ; Tuia, Devis - \ 2020
    Machine Learning and Knowledge Extraction 2 (2020)4. - ISSN 2504-4990 - p. 397 - 413.
    In this paper, we study how to extract visual concepts to understand landscape scenicness. Using visual feature representations from a Convolutional Neural Network (CNN), we learn a number of Concept Activation Vectors (CAV) aligned with semantic concepts from ancillary datasets. These concepts represent objects, attributes or scene categories that describe outdoor images. We then use these CAVs to study their impact on the (crowdsourced) perception of beauty of landscapes in the United Kingdom. Finally, we deploy a technique to explore new concepts beyond those initially available in the ancillary dataset: using a semi-supervised manifold alignment technique, we align the CNN image representation to a large set of word embeddings, thereby giving access to entire dictionaries of concepts. This allows us to obtain a list of new concept candidates to improve our understanding of the elements that contribute the most to the perception of scenicness. We do this without the need for any additional data by leveraging the commonalities in the visual and word vector spaces. Our results suggest that new and potentially useful concepts can be discovered by leveraging neighbourhood structures in the word vector spaces.
    Data and code of the paper "Deploying machine learning to assist digital humanitarians: making image annotation in OpenStreetMap more efficient" submitted to IJGIS
    Vargas Munoz, John ; Tuia, Devis ; Falcão, Alexandre X. - \ 2020
    University of Campinas
    image annotation algorithms - OpenStreetMap data
    A deep learning framework for matching of SAR and optical imagery
    Hughes, Lloyd Haydn ; Marcos, Diego ; Lobry, Sylvain ; Tuia, Devis ; Schmitt, Michael - \ 2020
    ISPRS Journal of Photogrammetry and Remote Sensing 169 (2020). - ISSN 0924-2716 - p. 166 - 179.
    SAR and optical imagery provide highly complementary information about observed scenes. A combined use of these two modalities is thus desirable in many data fusion scenarios. However, any data fusion task requires measurements to be accurately aligned. While for both data sources images are usually provided in a georeferenced manner, the geo-localization of optical images is often inaccurate due to propagation of angular measurement errors. Many methods for the matching of homologous image regions exist for both SAR and optical imagery; however, these methods are unsuitable for SAR-optical image matching due to significant geometric and radiometric differences between the two modalities. In this paper, we present a three-step framework for sparse image matching of SAR and optical imagery, whereby each step is encoded by a deep neural network. We first predict regions in each image which are deemed most suitable for matching. A correspondence heatmap is then generated through a multi-scale, feature-space cross-correlation operator. Finally, outliers are removed by classifying the correspondence surface as a positive or negative match. Our experiments show that the proposed approach provides a substantial improvement over previous methods for SAR-optical image matching and can be used to register even large-scale scenes. This opens up the possibility of using both types of data jointly, for example for the improvement of the geo-localization of optical satellite imagery or multi-sensor stereogrammetry.
    Deploying machine learning to assist digital humanitarians: making image annotation in OpenStreetMap more efficient
    Vargas Muñoz, John E. ; Tuia, Devis ; Falcão, Alexandre X. - \ 2020
    International Journal of Geographical Information Science (2020). - ISSN 1365-8816 - 21 p.
    Locating populations in rural areas of developing countries has attracted the attention of humanitarian mapping projects since it is important to plan actions that affect vulnerable areas. Recent efforts have tackled this problem as the detection of buildings in aerial images. However, the quality and the amount of rural building annotated data in open mapping services like OpenStreetMap (OSM) is not sufficient for training accurate models for such detection. Although these methods have the potential of aiding in the update of rural building information, they are not accurate enough to automatically update the rural building maps. In this paper, we explore a human-computer interaction approach and propose an interactive method to support and optimize the work of volunteers in OSM. The user is asked to verify/correct the annotation of selected tiles during several iterations, thereby improving the model with the new annotated data. The experimental results, with simulated and real user annotation corrections, show that the proposed method greatly reduces the amount of data that the volunteers of OSM need to verify/correct. The proposed methodology could benefit humanitarian mapping projects, not only by making the annotation process more efficient but also by improving the engagement of volunteers.
    OpenStreetMap: Challenges and Opportunities in Machine Learning and Remote Sensing
    Vargas Munoz, John E. ; Srivastava, Shivangi ; Tuia, Devis ; Falcao, Alexandre X. - \ 2020
    IEEE Geoscience and Remote Sensing Magazine (2020). - ISSN 2473-2397
    Optimal transport for multi-source domain adaptation under target shift
    Redko, Ievgen ; Courty, Nicolas ; Flamary, Rémi ; Tuia, Devis - \ 2020

    In this paper, we tackle the problem of reducing discrepancies between multiple domains, i.e. multi-source domain adaptation, and consider it under the target shift assumption: in all domains we aim to solve a classification problem with the same output classes, but with different label proportions. This problem, generally ignored in the vast majority of domain adaptation papers, is nevertheless critical in real-world applications, and we theoretically show its impact on the success of the adaptation. Our proposed method is based on optimal transport, a theory that has been successfully used to tackle adaptation problems in machine learning. The introduced approach, Joint Class Proportion and Optimal Transport (JCPOT), performs multi-source adaptation and target shift correction simultaneously by learning the class probabilities of the unlabeled target sample and the coupling that aligns two (or more) probability distributions. Experiments on both synthetic and real-world data (a satellite image pixel classification task) show the superiority of the proposed method over the state-of-the-art.
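The optimal-transport machinery that JCPOT builds on can be illustrated with a minimal entropy-regularised Sinkhorn solver for coupling two discrete distributions. This is a generic textbook sketch, not the JCPOT algorithm itself, and the parameter values are illustrative:

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.1, n_iter=200):
    """Entropy-regularised optimal transport between discrete
    distributions a and b under a pairwise cost matrix.
    Returns the transport plan whose marginals approximate a and b."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):          # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```

In JCPOT-style adaptation, a coupling of this kind is what aligns source and target distributions; the target class proportions are estimated jointly rather than assumed equal.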

    Detecting Unsigned Physical Road Incidents from Driver-view Images
    Levering, Alex ; Tomko, Martin ; Tuia, Devis ; Khoshelham, Kourosh - \ 2020
    IEEE Transactions on Intelligent Vehicles (2020). - ISSN 2379-8858

    Safety on roads is of utmost importance, especially in the context of autonomous vehicles. A critical need is to detect and communicate disruptive incidents early and effectively. In this paper we propose a system based on an off-the-shelf deep neural network architecture that is able to detect and recognize types of unsigned (not marked by traffic signs or placards), physical (visible in images) road incidents. We develop a taxonomy for unsigned physical incidents to provide a means of organizing and grouping related incidents. After selecting eight target types of incidents, we collect a dataset of twelve thousand images gathered from publicly-available web sources. We subsequently fine-tune a convolutional neural network to recognize the eight types of road incidents. The proposed model is able to recognize incidents with a high level of accuracy (higher than 90%). We further show that while our system generalizes well across spatial contexts when trained on geostratified data in the United Kingdom (with an accuracy of over 90%), the translation to visually less similar environments requires spatially distributed data collection.

    Defining and spatially modelling cultural ecosystem services using crowdsourced data
    Havinga, Ilan ; Bogaart, Patrick W. ; Hein, Lars ; Tuia, Devis - \ 2020
    Ecosystem Services 43 (2020). - ISSN 2212-0416
    Cultural ecosystem services (CES) are some of the most valuable contributions of ecosystems to human well-being. Nevertheless, these services are often underrepresented in ecosystem service assessments. Defining CES for the purposes of spatial quantification has been challenging because it has been difficult to spatially model CES. However, rapid increases in mobile network connectivity and the use of social media have generated huge amounts of crowdsourced data. This offers an opportunity to define and spatially quantify CES. We inventoried established CES conceptualisations and sources of crowdsourced data to propose a CES definition and typology for spatial quantification. Furthermore, we present the results of three spatial models employing crowdsourced data to measure CES on Texel, a coastal island in the Netherlands. Defining CES as information-flows best enables service quantification. A general typology of eight services is proposed. The spatial models produced distributions consistent with known areas of cultural importance on Texel. However, user representativeness and measurement uncertainties affect our results. Ethical considerations must also be taken into account. Still, crowdsourced data is a valuable source of information to define and model CES due to the level of detail available. This can encourage the representation of CES in ecosystem service assessments.
    Fine-grained landuse characterization using ground-based pictures: a deep learning solution based on globally available data
    Srivastava, Shivangi ; Vargas Muñoz, John E. ; Lobry, Sylvain ; Tuia, Devis - \ 2020
    International Journal of Geographical Information Science 34 (2020)6. - ISSN 1365-8816 - p. 1117 - 1136.
    We study the problem of landuse characterization at the urban-object level using deep learning algorithms. Traditionally, this task is performed by surveys or manual photo interpretation, which are expensive and difficult to update regularly. We seek to characterize usages at the single object level and to differentiate classes such as educational institutes, hospitals and religious places by visual cues contained in side-view pictures from Google Street View (GSV). These pictures provide geo-referenced information not only about the material composition of the objects but also about their actual usage, which otherwise is difficult to capture using other classical sources of data such as aerial imagery. Since the GSV database is regularly updated, the landuse maps can be updated accordingly, at lower costs than those of authoritative surveys. Because every urban-object is imaged from a number of viewpoints with street-level pictures, we propose a deep-learning based architecture that accepts an arbitrary number of GSV pictures to predict the fine-grained landuse classes at the object level. These classes are taken from OpenStreetMap. A quantitative evaluation of the area of Île-de-France, France, shows that our model outperforms other deep learning-based methods, making it a suitable alternative to manual landuse characterization.
    Zoom In, Zoom Out: Injecting Scale Invariance into Landuse Classification CNNs
    Murray, Jesse ; Marcos, Diego ; Tuia, Devis - \ 2019
    In: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. - IEEE - ISBN 9781538691557 - p. 5240 - 5243.
    We propose a Convolutional Neural Network (CNN), which encodes local scale invariance and equivariance in a multiresolution, multi-sensor image classification task. We show that the locally scale invariant model achieves results that are in line with state-of-the-art. The scale invariant and equivariant models also prove to be more robust to reductions in training data and number of filters used in each convolutional layer. These results demonstrate the benefit of disentangling scale within the learned features of CNNs, in particular when processing multi-resolution imagery. This is beneficial in the two studied cases: when training data is limited, or when the number of model parameters must be kept to a minimum.
    Semantically Interpretable Activation Maps: what-where-how explanations within CNNs
    Marcos Gonzalez, D. ; Lobry, Sylvain ; Tuia, D. - \ 2019
    arXiv - 9 p.
    A main issue preventing the use of Convolutional Neural Networks (CNN) in end user applications is the low level of transparency in the decision process. Previous work on CNN interpretability has mostly focused either on localizing the regions of the image that contribute to the result or on building an external model that generates plausible explanations. However, the former does not provide any semantic information and the latter does not guarantee the faithfulness of the explanation. We propose an intermediate representation composed of multiple Semantically Interpretable Activation Maps (SIAM) indicating the presence of predefined attributes at different locations of the image. These attribute maps are then linearly combined to produce the final output. This gives the user insight into what the model has seen, where, and a final output directly linked to this information in a comprehensive and interpretable way. We test the method on the task of landscape scenicness (aesthetic value) estimation, using an intermediate representation of 33 attributes from the SUN Attributes database. The results confirm that SIAM makes it possible to understand what attributes in the image are contributing to the final score and where they are located. Since it is based on learning from multiple tasks and datasets, SIAM improves the explainability of the prediction without additional annotation efforts or computational overhead at inference time, while maintaining good performance on both the final and intermediate tasks.
    Interactive Coconut Tree Annotation Using Feature Space Projections
    Vargas-Munoz, John E. ; Zhou, Ping ; Falcao, Alexandre X. ; Tuia, Devis - \ 2019
    In: IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. - IEEE - ISBN 9781538691557 - p. 5718 - 5721.
    The detection and counting of coconut trees in aerial images are important tasks for environment monitoring and post-disaster assessment. Recent deep-learning-based methods can attain accurate results, but they require a reasonably high number of annotated training samples. In order to obtain such large training sets with considerably reduced human effort, we present a semi-automatic sample annotation method based on the 2D t-SNE projection of the sample feature space. The proposed approach can facilitate the construction of effective training sets more efficiently than using the traditional manual annotation, as shown in our experimental results with VHR images from the Kingdom of Tonga.
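The workflow in this abstract — projecting sample features to 2D so a user can label whole clusters at once — can be sketched in a few lines. The snippet below is a simplified illustration: it uses a PCA projection (via SVD) as a lightweight stand-in for the paper's t-SNE projection, and the function names are mine, not the authors':

```python
import numpy as np

def project_2d(features):
    """Project high-dimensional sample features to 2D using PCA
    (a stand-in here for the t-SNE projection used in the paper)."""
    X = features - features.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T

def batch_label(points_2d, centre, radius, label, labels):
    """Assign one label to every sample whose 2D projection falls inside
    a user-drawn circle -- the core gesture of semi-automatic annotation."""
    mask = np.linalg.norm(points_2d - centre, axis=1) <= radius
    labels[mask] = label
    return labels
```

Because visually similar samples land close together in the projection, one circle gesture can annotate many training samples at once, which is where the reduction in human effort comes from.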
    Visual Question Answering From Remote Sensing Images
    Lobry, Sylvain ; Murray, Jesse ; Marcos, Diego ; Tuia, Devis - \ 2019
    In: 2019 IEEE International Geoscience & Remote Sensing Symposium. - IEEE - ISBN 9781538691557 - p. 4951 - 4954.
    Remote sensing images carry vast amounts of information beyond land cover or land use. Images contain visual and structural information that can be queried to obtain high level information about specific image content or relational dependencies between the objects sensed. This paper explores the possibility to use questions formulated in natural language as a generic and accessible way to extract this type of information from remote sensing images, i.e. visual question answering. We introduce an automatic way to create a dataset using OpenStreetMap data and present some preliminary results. Our proposed approach is based on deep learning, and is trained using our new dataset.
    Adaptive Compression-based Lifelong Learning
    Srivastava, S. ; Berman, M. ; Blaschko, M.B. ; Tuia, D. - \ 2019
    In: Proceedings of the British Machine Vision Conference (BMVC). - BMVA Press - 13 p.
    The problem of a deep learning model losing performance on a previously learned task when fine-tuned to a new one is a phenomenon known as catastrophic forgetting. There are two major ways to mitigate this problem: either preserving activations of the initial network during training with a new task; or restricting the new network activations to remain close to the initial ones. The latter approach falls under the denomination of lifelong learning, where the model is updated in a way that it performs well on both old and new tasks, without having access to the old task’s training samples anymore. Recently, approaches like pruning networks for freeing network capacity during sequential learning of tasks have been gaining in popularity. Such approaches allow learning small networks while making redundant parameters available for the next tasks. The common problem encountered with these approaches is that the pruning percentage is hard-coded, irrespective of the number of samples, of the complexity of the learning task and of the number of classes in the dataset. We propose a method based on Bayesian optimization to perform adaptive compression/pruning of the network and show its effectiveness in lifelong learning. Our method learns to perform heavy pruning for small and/or simple datasets while using milder compression rates for large and/or complex data. Experiments on classification and semantic segmentation demonstrate the applicability of learning network compression, where we are able to effectively preserve performances along sequences of tasks of varying complexity.
    Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery Using Deep CNNs and Active Learning
    Kellenberger, Benjamin ; Marcos, Diego ; Lobry, Sylvain ; Tuia, Devis - \ 2019
    IEEE Transactions on Geoscience and Remote Sensing 57 (2019)12. - ISSN 0196-2892 - p. 9524 - 9533.
    We present an Active Learning (AL) strategy for reusing a deep Convolutional Neural Network (CNN)-based object detector on a new data set. This is of particular interest for wildlife conservation: given a set of images acquired with an Unmanned Aerial Vehicle (UAV) and manually labeled ground truth, our goal is to train an animal detector that can be reused for repeated acquisitions, e.g., in follow-up years. Domain shifts between data sets typically prevent such a direct model application. We thus propose to bridge this gap using AL and introduce a new criterion called Transfer Sampling (TS). TS uses Optimal Transport (OT) to find corresponding regions between the source and the target data sets in the space of CNN activations. The CNN scores in the source data set are used to rank the samples according to their likelihood of being animals, and this ranking is transferred to the target data set. Unlike conventional AL criteria that exploit model uncertainty, TS focuses on very confident samples, thus allowing quick retrieval of true positives in the target data set, where positives are typically extremely rare and difficult to find by visual inspection. We extend TS with a new window cropping strategy that further accelerates sample retrieval. Our experiments show that with both strategies combined, less than half a percent of oracle-provided labels are enough to find almost 80% of the animals in challenging sets of UAV images, beating all baselines by a margin.
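The core of Transfer Sampling — ranking unlabelled target samples by confidence scores transferred from a labelled source set — can be sketched as follows. This is a deliberately simplified stand-in: it transfers scores via nearest neighbours in feature space rather than via the optimal-transport coupling the paper actually uses, and the names are illustrative:

```python
import numpy as np

def transfer_ranking(source_feats, source_scores, target_feats):
    """Rank target samples by inherited source confidence: each target
    sample takes the detector score of its nearest source sample in
    feature space (a simplified proxy for the OT coupling in Transfer
    Sampling). Returns target indices, most confident first."""
    # pairwise squared Euclidean distances, shape (n_target, n_source)
    d = ((target_feats[:, None, :] - source_feats[None, :, :]) ** 2).sum(-1)
    inherited = source_scores[d.argmin(axis=1)]
    return np.argsort(-inherited)
```

Presenting the most confidently positive targets first is what lets annotators find the rare animals quickly, instead of wading through the overwhelmingly empty images that dominate UAV surveys.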
    When a Few Clicks Make All the Difference: Improving Weakly-Supervised Wildlife Detection in UAV Images
    Kellenberger, B.A. ; Marcos Gonzalez, D. ; Tuia, D. - \ 2019
    In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) Long Beach, CA, USA : Institute of Electrical and Electronics Engineers Inc. - p. 1414 - 1422.
    Automated object detectors on Unmanned Aerial Vehicles (UAVs) are increasingly employed for a wide range of tasks. However, to be accurate in their specific task they need expensive ground truth in the form of bounding boxes or positional information. Weakly-Supervised Object Detection (WSOD) overcomes this hindrance by localizing objects with only image-level labels that are faster and cheaper to obtain, but is not on par with fully-supervised models in terms of performance. In this study we propose to combine both approaches in a model that is principally apt for WSOD, but receives full position ground truth for a small number of images. Experiments show that with just 1% of densely annotated images, but simple image-level counts as remaining ground truth, we effectively match the performance of fully-supervised models on a challenging dataset with scarcely occurring wildlife on UAV images from the African savanna. As a result, with a very limited amount of precise annotations our model can be trained with ground truth that is orders of magnitude cheaper and faster to obtain while still providing the same detection performance.