|Title||Species categorization via MicroRNAs based on 3’UTR target sites using sequence features|
|Author(s)||Yousef, Malik; Levy, Dalit; Allmer, Jens|
|Source||In: Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies. - SciTePress - ISBN 9789897582806 - p. 112 - 118.|
|Event||11th International Joint Conference on Biomedical Engineering Systems and Technologies, Madeira, 2018-01-19/2018-01-21|
|Department(s)||PRI BIOS Applied Bioinformatics|
|Publication type||Contribution in proceedings|
|Keyword(s)||Categorization - Machine Learning - MicroRNA - MicroRNA Target - Sequence Features|
Proteins define phenotypes, and their dysregulation leads to disease. Protein abundance can be regulated post-transcriptionally by microRNAs (miRNAs), so studying this mode of gene regulation is of high importance. MicroRNAs interact with their target messenger RNA (mRNA) via hybridization within a specialized molecular framework. Many miRNAs and their targets have been identified and are listed in databases such as miRTarBase. Experimental identification of functional miRNA-mRNA pairs is difficult, so such pairs are often detected computationally, which is in turn complicated by the lack of negative data. Machine learning has been used for miRNA and target detection, and many features have been described for miRNAs and miRNA:mRNA target duplexes, generally on a per-species basis. However, many claims of cross-kingdom regulation via miRNAs have been made, and we were therefore interested in whether species can be differentiated based on the target sequence in the mRNA alone. Thus, we investigated whether miRNA target sites within the 3’UTR can be differentiated between species based on k-mer features only. To establish machine learning models, target information from one species was used as positive examples and target information from the other species as negative examples. A few features were sufficient to categorize microRNA targets by species successfully. For example, mouse versus Caenorhabditis elegans reached up to 97% average accuracy over 100-fold cross-validation. The simplicity of the approach, based on k-mers alone, is promising for automatic categorization systems. In the future, this approach will help scrutinize alleged cross-kingdom regulation via miRNAs, in which a miRNA from one species is said to target mRNAs in another.
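
As an illustration of the k-mer idea described in the abstract, the following is a minimal Python sketch and not the authors' implementation. The function names (`kmer_features`, `label_examples`), the choice of k = 1 to 3, the ACGU alphabet, and the normalization by the number of k-mer windows are all assumptions made for this example; the abstract does not specify them, nor does it name the classifier that would consume these features.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k_values=(1, 2, 3), alphabet="ACGU"):
    """Return normalized k-mer frequencies for one target-site sequence.

    Hypothetical helper: k_values and the normalization are assumptions.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    features = {}
    for k in k_values:
        windows = max(len(seq) - k + 1, 0)  # number of k-mer positions
        counts = Counter(seq[i:i + k] for i in range(windows))
        # Enumerate every possible k-mer so all sequences share one feature space.
        for kmer in map("".join, product(alphabet, repeat=k)):
            features[kmer] = counts[kmer] / windows if windows else 0.0
    return features


def label_examples(targets_by_species, positive_species):
    """Build feature dicts X and labels y: 1 for the chosen species, 0 for all others."""
    X, y = [], []
    for species, sequences in targets_by_species.items():
        for seq in sequences:
            X.append(kmer_features(seq))
            y.append(1 if species == positive_species else 0)
    return X, y
```

The resulting `X` and `y` could then be passed to any standard classifier. Emitting every possible k-mer, including those with zero count, keeps the feature vectors aligned across sequences.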