Feature filtering and selection for dry matter estimation on perennial ryegrass: A case study of vegetation indices
Alckmin, G.T. ; Kooistra, L. ; Lucieer, A. ; Rawnsley, R. - \ 2019
In: ISPRS Geospatial Week 2019, 10–14 June 2019, Enschede, The Netherlands. - ISPRS (International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives) - p. 1827 - 1831.
Biomass - Collinearity - Dry Matter - Feature Selection - Machine Learning - Pasture - Perennial Ryegrass - Vegetation Indices
Vegetation indices (VIs) have been extensively employed as features for dry matter (DM) estimation. During the past five decades, more than a hundred vegetation indices have been proposed. Inevitably, the selection of the optimal index or subset of indices is neither trivial nor obvious. This study, performed on a year-round observation of perennial ryegrass (n = 900), indicates that for this response variable (i.e. kg.DM.ha−1), more than 80% of indices present a high degree of collinearity (|correlation| > 0.8). Additionally, the absence of an established workflow for feature selection and modelling is a handicap when trying to establish meaningful relations between spectral data and biophysical/biochemical features. Within this case study, an unsupervised and supervised filtering process is applied to an initial dataset of 97 VIs. This research analyses the effects of the proposed filtering and feature selection process on the overall stability of the final models. Consequently, this analysis provides a straightforward framework to filter and select VIs. This approach was able to provide a reduced feature set for a robust model and to quantify trade-offs between optimal models (i.e. lowest root mean square error, RMSE = 412.27 kg.DM.ha−1) and tolerable models (with a smaller number of features, 4 VIs, and within 10% of the lowest RMSE).
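As an illustration of the unsupervised filtering step mentioned in the abstract, the following is a minimal sketch of a greedy collinearity filter using the 0.8 absolute-correlation threshold. The abstract does not describe the authors' actual procedure, so the function name `filter_collinear`, the greedy keep-first strategy, the use of Pearson correlation, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def filter_collinear(X, names, threshold=0.8):
    """Greedy filter (hypothetical helper): keep a feature only if its
    absolute Pearson correlation with every already-kept feature is
    at or below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # columns are features
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Synthetic stand-in for a table of vegetation indices (not real data).
rng = np.random.default_rng(0)
base = rng.normal(size=(900, 2))
X = np.column_stack([
    base[:, 0],                                      # vi_a
    2.0 * base[:, 0] + 0.01 * rng.normal(size=900),  # vi_b, nearly a copy of vi_a
    base[:, 1],                                      # vi_c, independent of vi_a
])
names = ["vi_a", "vi_b", "vi_c"]

selected = filter_collinear(X, names)
print(selected)
```

In this sketch the order of the columns decides which member of a collinear group survives; a supervised step (for example, ranking indices by their relation to the DM response before filtering) would be one way to make that choice less arbitrary.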
Species categorization via MicroRNAs based on 3’UTR target sites using sequence features
Yousef, Malik ; Levy, Dalit ; Allmer, Jens - \ 2018
In: Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies. - SciTePress - ISBN 9789897582806 - p. 112 - 118.
Categorization - Machine Learning - MicroRNA - MicroRNA Target - Sequence Features
Proteins define phenotypes and their dysregulation leads to diseases. Post-transcriptional regulation of protein abundance can be achieved by microRNAs (miRNAs). Therefore, studying this method of gene regulation is of high importance. MicroRNAs interact with their target messenger RNA via hybridization within a specialized molecular framework. Many miRNAs and their targets have been identified, and they are listed in various databases such as miRTarBase. The experimental identification of functional miRNA-mRNA pairs is difficult and, therefore, they are detected computationally, which is complicated by missing negative data. Machine learning has been used for miRNA and target detection, and many features have been described for miRNAs and miRNA:mRNA target duplexes, generally on a per-species basis. However, many claims of cross-kingdom regulation via miRNAs have been made and, therefore, we were interested in whether it is possible to differentiate among species based on the target sequence in the mRNA alone. Thus, we investigated whether miRNA target sites within the 3’UTR can be differentiated between species based on k-mer features only. Target information of one species was used as positive examples and that of the others as negative ones to establish machine learning models. It was observed that few features were sufficient for successful categorization of microRNA targets to species. For example, mouse versus Caenorhabditis elegans reached up to 97% average accuracy over 100-fold cross-validation. The simplicity of the approach, based on just k-mers, is promising for automatic categorization systems. In the future, this approach will help scrutinize alleged cross-kingdom regulation via miRNAs with respect to miRNAs from one species targeting mRNAs in another.
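To make the notion of k-mer features concrete, the following is a minimal sketch of how a target-site sequence could be turned into a fixed-length vector of k-mer frequencies. The abstract does not state which values of k, which normalization, or which tooling the authors used, so the function name `kmer_features`, the choice k = 2, the RNA alphabet, and the example sequence are assumptions for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=2, alphabet="ACGU"):
    """Hypothetical helper: relative frequency of every possible k-mer
    over the alphabet, counted with a sliding window of width k."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for short input
    return {
        "".join(p): counts.get("".join(p), 0) / total
        for p in product(alphabet, repeat=k)
    }


# Made-up sequence, not taken from any database.
features = kmer_features("ACGUACGU", k=2)
print(len(features), features["AC"])
```

Feature dictionaries like this one, computed for the target sites of one species (positive class) and of other species (negative class), could then be passed to any standard classifier; which classifier the authors used is not stated in the abstract.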