Markovian encoding models in human splice site recognition using SVM


Pashaei E., AYDIN N.

Computational Biology and Chemistry, cilt.73, ss.159-170, 2018 (SCI-Expanded) identifier identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 73
  • Basım Tarihi: 2018
  • Doi Numarası: 10.1016/j.compbiolchem.2018.02.005
  • Dergi Adı: Computational Biology and Chemistry
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Sayfa Sayıları: ss.159-170
  • Anahtar Kelimeler: Markovian model, Splice sites, Machine learning, DNA encoding method, MMSVM
  • İstanbul Gelişim Üniversitesi Adresli: Hayır

Özet

Splice site recognition is among the most significant and challenging tasks in bioinformatics due to its key role in gene annotation. Effective prediction of splice site requires nucleotide encoding methods that reveal the characteristics of DNA sequences to provide appropriate features to serve as input of machine learning classifiers. Markovian models are the most influential encoding methods that highly used for pattern recognition in biological data. However, a direct performance comparison of these methods in splice site domain has not been assessed yet. This study compares various Markovian encoding models for splice site prediction utilizing support vector machine, as the most outstanding learning method in the domain, and conducts a new precise evaluation of Markovian approaches that corrects this limitation. Moreover, a novel sequence encoding approach based on third order Markov model (MM3) is proposed. The experimental results show that the proposed method, namely MM3-SVM, performs significantly better than thirteen best known state-of-the-art algorithms, while tested on HS3D dataset considering several performance criteria. Further, it achieved higher prediction accuracy than several well-known tools like NNsplice, MEM, MM1, WMM, and GeneID, using an independent test set of 50 genes. We also developed MMSVM, a web tool to predict splice sites in any human sequence using the proposed approach. The MMSVM web server can be assessed at https://pashaei.shinyapps.io/mmsvm.