Human Disease


Small RNA





MicroRNAs are short RNAs (∼20–25 nt) that act by guiding mRNA degradation or translational suppression (Carthew and Sontheimer 2009; Wu, Tao et al. 2010), and they have various functions in organ development. For example, they mediate switching of chromatin remodeling complexes in neural development and participate in transcriptional circuits that control skeletal muscle gene expression and embryonic development (Chen, Mandel et al. 2006; Yoo, Staahl et al. 2009). Increasing evidence demonstrates that they can also function as either tumor suppressors or oncogenes (He, Thomson et al. 2005; Bonci, Coppola et al. 2008). Although more microRNA functions are being discovered, the functions of many novel microRNAs remain to be elucidated.

To predict novel pre-microRNAs in specific animals and plants, comparative genomics-based methods have been developed, including MiRscan, MIRcheck, miRAlign and MIRFINDER (Lim, Lau et al. 2003; Laufs, Peaucelle et al. 2004; Wang, Wang et al. 2005; Huang, Fan et al. 2007). Although these tools can identify phylogenetically conserved stem–loop precursor RNAs, they do not work well on genomes that lack close homologs. Recently, several machine learning-based algorithms have been introduced to predict microRNAs (Jiang, Wu et al. 2007; Xu, Zhou et al. 2008; Hsieh, Chang et al. 2010). In addition, some modified non-learning methods, based on simple and widely accepted principles, have been used; in these, pre-microRNAs are detected by manually choosing the optimal filter (Quail, Kozarewa et al. 2008). Although these methods are structurally simple and flexible, their performance can still be improved by combining them with machine-learning methods.
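To make the idea of a manually chosen filter concrete, the following is a minimal sketch of a rule-based hairpin check. It is not taken from any of the cited tools: the function name `is_hairpin_candidate`, the dot-bracket input format, and the thresholds `min_length` and `min_pairs` are all assumptions made for illustration.

```python
def is_hairpin_candidate(structure, min_length=60, min_pairs=18):
    """Rule-based check on a dot-bracket secondary-structure string.

    Hypothetical filter: the thresholds are illustrative defaults,
    not values taken from any published method.
    """
    # Rule 1: the candidate precursor must be long enough.
    if len(structure) < min_length:
        return False

    # Rule 2: a single stem-loop has every '(' before every ')'.
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    if last_open == -1 or first_close == -1 or last_open > first_close:
        return False

    # Rule 3: the brackets must balance.
    if structure.count("(") != structure.count(")"):
        return False

    # Rule 4: the stem must contain enough base pairs.
    return structure.count("(") >= min_pairs


# Usage: a 62-nt single hairpin passes; a two-hairpin structure does not.
single = "." * 5 + "(" * 22 + "." * 8 + ")" * 22 + "." * 5
double = "(((...)))(((...)))"
print(is_hairpin_candidate(single))
print(is_hairpin_candidate(double, min_length=0, min_pairs=3))
```

A filter of this kind is easy to read and adjust, but every threshold has to be picked by hand, which is the limitation that motivates combining such rules with learned models.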

In this study, we developed a novel machine-learning tool, named miRD (microRNA Detection), for accurate and efficient detection of novel pre-microRNAs. Two sets of features were defined, and each was used to build a support vector machine (SVM) model (Vapnik 2000). A boosting method was then applied to combine the two independent SVM models (Freund and Schapire 1996). We tested the performance of miRD on a small RNA deep-sequencing dataset from human fetal ovary. Altogether, 92 novel candidate pre-microRNAs were predicted by miRD and sorted in descending order of predicted probability (Supplementary Table S8). To confirm the expression of the predicted pre-microRNAs, the top 16 candidates were selected for experimental validation. All 16 selected pre-microRNAs from human fetal ovary were verified by real-time PCR (Supplementary Fig. S5). miRD outperformed the published algorithms it was compared with (tripleSVM, MIReNA), with its accuracy (AC) and Matthews correlation coefficient (MCC) reaching 94.0% and 0.872, respectively (Supplementary Table S6).
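For readers unfamiliar with the two reported metrics, the sketch below shows how they are commonly computed from a binary confusion matrix, assuming AC denotes accuracy and MCC the Matthews correlation coefficient. The function name and the counts in the usage example are invented for illustration and are not the data behind the reported values.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Usage with made-up counts (illustrative only).
acc, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)
print(f"AC = {acc:.3f}, MCC = {mcc:.3f}")
```

Accuracy can look high on an imbalanced test set, whereas MCC uses all four cells of the confusion matrix, which is one reason the two are often reported together.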
