Biologically Interpretable Machine Learning Framework for Neuropeptide Classification

Conference proceedings article



Publication Details

Author list: Warin Wattanapornprom; Patchnida Hemwannanukul; Kantinan Kuikaew; Supatcha Lertampaiporn; Apiradee Hongsthong; Chukiat Worasucheep

Publication year: 2025

Start page: 1

End page: 6

Number of pages: 6

URL: https://ieeexplore.ieee.org/abstract/document/11320538

Languages: English-United States (EN-US)



Abstract

Neuropeptides are essential signaling molecules that regulate diverse physiological processes, including pain perception, circadian rhythm, and endocrine communication. Identifying them accurately from protein sequences remains challenging because of their short lengths, conserved motifs, and the need for interpretable models. We present a feature-driven machine learning framework for neuropeptide classification that emphasizes interpretable protein descriptors grounded in biochemical properties. A curated dataset from NeuroPep and Swiss-Prot was encoded with 982 descriptors spanning composition, sequence-order, and physicochemical indices, then reduced to 150 features via multi-stage selection. CTDD11 (early hydrophobic occurrence) and CTDD31 (early polar occurrence) emerged as the most predictive features. Six classifiers (DT, NB, k-NN, SVM, RF, XGBoost) were evaluated; XGBoost achieved the highest accuracy (0.92), while k-NN achieved the highest ROC-AUC (0.945). Interpretable models highlighted biologically meaningful contributions, linking CTDD descriptors to secretion and cleavage motifs. The framework is transparent and effective for neuropeptide classification, with potential applications in peptide discovery and therapeutic design.
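CTDD denotes the distribution part of the composition/transition/distribution (CTD) descriptor family. The sketch below shows how such distribution features can be computed. It makes three assumptions that the abstract does not confirm: it uses the commonly used three-class hydrophobicity grouping (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW), it follows the iFeature-style quartile-cutoff convention, and its function names are hypothetical. It does not reproduce the paper's CTDD11/CTDD31 index mapping. For each class, it reports where the first, 25%, 50%, 75%, and last residue of that class occur, expressed as a percentage of sequence length. A small "first" value for the hydrophobic or polar class corresponds to an early occurrence of that class.

```python
import math

# Assumed three-class hydrophobicity grouping; the paper's exact grouping is not stated.
HYDROPHOBICITY_GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}


def distribution_descriptor(sequence, group):
    """Return positions (% of length) of the first, 25%, 50%, 75%, and last residue in `group`."""
    positions = [i + 1 for i, aa in enumerate(sequence) if aa in group]  # 1-based positions
    n = len(positions)
    if n == 0:
        return [0.0] * 5  # class absent from the sequence
    # Rank of the residue marking each cutoff: floor(fraction * n), clamped to at least 1.
    ranks = [1] + [max(1, math.floor(f * n)) for f in (0.25, 0.5, 0.75)] + [n]
    return [positions[r - 1] / len(sequence) * 100 for r in ranks]


def ctdd_hydrophobicity(sequence):
    """Hypothetical helper: 15 distribution features (3 classes x 5 cutoffs) for one property."""
    seq = sequence.upper()
    features = {}
    for name, group in HYDROPHOBICITY_GROUPS.items():
        values = distribution_descriptor(seq, group)
        for label, value in zip(("first", "q25", "q50", "q75", "last"), values):
            features[f"{name}_{label}"] = value
    return features


feats = ctdd_hydrophobicity("ACDKLW")
print(round(feats["hydrophobic_first"], 2), round(feats["polar_first"], 2))
```

For "ACDKLW", the first hydrophobic residue (C) sits at position 2 of 6 (about 33.3%), and the first polar residue (D) sits at position 3 (50.0%). Scaling positions by sequence length keeps the features comparable across peptides of different lengths, which matters for the short sequences typical of neuropeptides.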


Keywords

computational systems biology; neuropeptide classification; Peptides; Protein Descriptors


Last updated on 2026-11-02 at 12:00