Staff Publications

Record number 559483
Title A note on knowledge discovery and machine learning in digital soil mapping
Author(s) Wadoux, Alexandre M.J.C.; Samuel-Rosa, Alessandro; Poggio, Laura; Mulder, Vera Leatitia
Source European Journal of Soil Science 71 (2020)2. - ISSN 1351-0754 - p. 133 - 136.
DOI https://doi.org/10.1111/ejss.12909
Department(s) Soil Geography and Landscape
ISRIC - World Soil Information
PE&RC
Publication type Refereed Article in a scientific journal
Publication year 2020
Keyword(s) mapping - pedometrics - random forest - soil science - variable selection
Abstract

In digital soil mapping, machine learning (ML) techniques are being used to infer a relationship between a soil property and the covariates. The information derived from this process is often translated into pedological knowledge. This mechanism is referred to as knowledge discovery. This study shows that knowledge discovery based on ML must be treated with caution. We show how pseudo-covariates can be used to accurately predict soil organic carbon in a hypothetical case study. We demonstrate that ML methods can find relevant patterns even when the covariates are meaningless and not related to soil-forming factors and processes. We argue that pattern recognition for prediction should not be equated with knowledge discovery. Knowledge discovery requires more than the recognition of patterns and successful prediction. It requires the pre-selection and preprocessing of pedologically relevant environmental covariates and the posterior interpretation and evaluation of the recognized patterns. We argue that important ML covariates could serve the purpose of providing elements to postulate hypotheses about soil processes that, once validated through experiments, could result in new pedological knowledge.

Highlights:

  • We discuss the rationale of knowledge discovery based on the most important machine learning covariates.
  • We use pseudo-covariates to predict topsoil organic carbon with random forest.
  • Soil organic carbon was accurately predicted in a hypothetical case study.
  • Pattern recognition by random forest should not be equated to knowledge discovery.
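
The mechanism described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' data or method: the synthetic "soil organic carbon" surface, the two pseudo-covariates (arbitrary smooth spatial patterns), and the learner, which is a k-nearest-neighbour regressor swapped in for random forest so that the example needs only the Python standard library. The point it tries to show is that covariates with no pedological meaning can still support accurate prediction when they happen to carry spatial structure, while the same covariates shuffled among sites cannot.

```python
import math
import random


def make_data(n, seed=0):
    """Synthetic sites as (pseudo-covariates, target) pairs. All values are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()  # site coordinates in the unit square
        # Hypothetical target: varies smoothly in space, plus small noise.
        soc = 2.0 + math.sin(3.0 * x) + math.cos(2.0 * y) + rng.gauss(0.0, 0.05)
        # Pseudo-covariates: arbitrary smooth patterns, unrelated to how soc was generated.
        p1 = math.sin(x + 0.4 * y)
        p2 = math.cos(0.3 * x + 1.2 * y)
        rows.append(((p1, p2), soc))
    return rows


def knn_predict(train, query, k=5):
    """Mean target of the k training sites closest to `query` in covariate space."""
    def sq_dist(cov):
        return sum((a - b) ** 2 for a, b in zip(cov, query))
    nearest = sorted(train, key=lambda row: sq_dist(row[0]))[:k]
    return sum(soc for _, soc in nearest) / k


def r_squared(train, test, k=5):
    """Coefficient of determination of k-NN predictions on held-out sites."""
    observed = [soc for _, soc in test]
    predicted = [knn_predict(train, cov, k) for cov, _ in test]
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


data = make_data(400)
train, test = data[:300], data[300:]

# Prediction from the meaningless but spatially structured covariates.
r2_pseudo = r_squared(train, test)

# Control: shuffle the covariates among training sites, which destroys the
# link between covariate values and location.
shuffled_cov = [cov for cov, _ in train]
random.Random(1).shuffle(shuffled_cov)
train_shuffled = [(cov, soc) for cov, (_, soc) in zip(shuffled_cov, train)]
r2_shuffled = r_squared(train_shuffled, test)

print(f"R2 with pseudo-covariates:   {r2_pseudo:.2f}")
print(f"R2 with shuffled covariates: {r2_shuffled:.2f}")
```

A high score in the first case says only that the learner recognized a spatial pattern. It gives no evidence that either pseudo-covariate relates to a soil-forming factor, which is the distinction the abstract draws between pattern recognition and knowledge discovery.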
