|Title||Incorporating limited field operability and legacy soil samples in a hypercube sampling design for digital soil mapping|
|Author(s)||Stumpf, Felix; Schmidt, Karsten; Behrens, Thorsten; Schönbrodt-Stitt, Sarah; Buzzo, Giovanni; Dumperth, Christian; Wadoux, Alexandre; Xiang, Wei; Scholten, Thomas|
|Source||Journal of Plant Nutrition and Soil Science 179 (2016)4. - ISSN 1436-8730 - p. 499 - 509.|
|Department(s)||Soil Geography and Landscape|
|Publication type||Refereed Article in a scientific journal|
|Keyword(s)||Conditioned Latin Hypercube Sampling - Digital soil mapping - Field accessibility - Legacy soil samples - Random forest - Sample set size - Three Gorges Reservoir Area|
Most calibration sampling designs for Digital Soil Mapping (DSM) demarcate spatially distinct sample sites. In practical applications, major challenges are often limited field accessibility and the question of how to integrate legacy soil samples to cope with usually scarce resources for field sampling and laboratory analysis. The study focuses on the development and application of an efficiency-improved DSM sampling design that (1) applies an optimized sample set size, (2) compensates for limited field accessibility, and (3) enables the integration of legacy soil samples. The proposed sampling design is a modification of conditioned Latin Hypercube Sampling (cLHS), which originally returns distinct sample sites to optimally cover a soil-related covariate space and to preserve the correlation of the covariates in the sample set. The sample set size was determined by comparing multiple sample set sizes of original cLHS sets according to their representation of the covariate space. Limited field accessibility and the integration of legacy samples were incorporated by providing alternative sample sites to replace the original cLHS sites. We applied the modified cLHS design (cLHSadapt) in a small catchment (4.2 km²) in Central China to model topsoil sand fractions using Random Forest regression (RF). To evaluate the proposed approach, we compared cLHSadapt with the original cLHS design (cLHSorig). With an optimized sample set size of n = 30, the results show a similar representation of the cLHS covariate space between cLHSadapt and cLHSorig, while the correlation between the covariates is preserved (r = 0.40 vs. r = 0.39). Furthermore, we doubled the sample set size of cLHSadapt by adding available legacy samples (cLHSadapt+) and compared the prediction accuracies. Based on an external validation set cLHSval (n = 20), the coefficient of determination (R²) of the cLHSadapt predictions ranges between 0.59 and 0.71 for topsoil sand fractions.
The R² values of the RF predictions based on cLHSadapt+, which uses additional legacy samples, are marginally higher, by 5% on average.
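
The two criteria the abstract names for judging a sample set (how well it covers the covariate space, and whether it preserves the correlation between covariates) can be illustrated with a small sketch. This is not the authors' implementation: the function names `stratum_counts`, `coverage_deviation`, and `correlation_deviation` are hypothetical, the stratification into equal-probability quantile intervals is an assumption about how coverage might be scored, and the search that actually selects or replaces sites is left out.

```python
import numpy as np


def stratum_counts(covariates, sample_idx):
    """Count sampled sites per quantile stratum, for each covariate.

    covariates: (N, k) array with covariate values at all candidate sites.
    sample_idx: indices of the n sites in the sample set.
    Returns an (n, k) integer array. In an ideal Latin hypercube sample
    every entry is 1 (each stratum of each covariate is hit once).
    """
    covariates = np.asarray(covariates, dtype=float)
    sample_idx = np.asarray(sample_idx)
    n = len(sample_idx)
    k = covariates.shape[1]
    counts = np.zeros((n, k), dtype=int)
    for j in range(k):
        # n equal-probability strata from the quantiles of all candidate sites
        edges = np.quantile(covariates[:, j], np.linspace(0.0, 1.0, n + 1))
        # use interior edges only, so every value maps to a stratum in 0..n-1
        strata = np.searchsorted(edges[1:-1], covariates[sample_idx, j], side="right")
        counts[:, j] = np.bincount(strata, minlength=n)
    return counts


def coverage_deviation(covariates, sample_idx):
    """Total absolute deviation from one sample per stratum (0 is ideal)."""
    return int(np.abs(stratum_counts(covariates, sample_idx) - 1).sum())


def correlation_deviation(covariates, sample_idx):
    """Sum of absolute differences between the covariate correlation
    matrix of all candidate sites and that of the sample set."""
    covariates = np.asarray(covariates, dtype=float)
    full = np.corrcoef(covariates, rowvar=False)
    sub = np.corrcoef(covariates[np.asarray(sample_idx)], rowvar=False)
    return float(np.abs(full - sub).sum())
```

Under these assumptions, candidate sample sets of different sizes, or a set in which inaccessible sites have been swapped for alternatives, could be compared by computing both deviations for each set and preferring the set with the lower values.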