Biologically Interpretable Machine Learning Framework for Neuropeptide Classification
Conference proceedings article
Publication Details
Author list: Warin Wattanapornprom; Patchnida Hemwannanukul; Kantinan Kuikaew; Supatcha Lertampaiporn; Apiradee Hongsthong; Chukiat Worasucheep
Publication year: 2025
Start page: 1
End page: 6
Number of pages: 6
URL: https://ieeexplore.ieee.org/abstract/document/11320538
Languages: English-United States (EN-US)
Abstract
Neuropeptides are essential signaling molecules that regulate diverse physiological processes, including pain perception, circadian rhythm, and endocrine communication. Accurate identification from protein sequences remains challenging due to short lengths, conserved motifs, and the need for interpretable models. We present a feature-driven machine learning framework for neuropeptide classification emphasizing interpretable protein descriptors grounded in biochemical properties. A curated dataset from NeuroPep and Swiss-Prot was encoded with 982 descriptors spanning composition, sequence-order, and physicochemical indices, then reduced to ∼150 features via multi-stage selection. CTDD11 (early hydrophobic occurrence) and CTDD31 (early polar occurrence) emerged as most predictive. Six classifiers (DT, NB, k-NN, SVM, RF, XGBoost) were evaluated; XGBoost achieved the highest accuracy (0.92), while k-NN achieved the highest ROC-AUC (0.945). Interpretable models highlighted biologically meaningful contributions, linking CTDD descriptors to secretion and cleavage motifs. The framework is transparent and effective for neuropeptide classification, with potential in peptide discovery and therapeutic design.
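As an illustration of the kind of descriptor the abstract refers to, the sketch below computes a CTD-style "distribution" descriptor for one residue group: the relative positions (as a percentage of sequence length) at which the first, 25%, 50%, 75%, and 100% of that group's residues occur. This is a minimal sketch and not the authors' implementation. The function name, the example peptide, and the residue groupings (the commonly used three-class hydrophobicity split) are assumptions, and the record does not say which group and position the labels CTDD11 and CTDD31 index, so no such mapping is made here.

```python
import math

# Assumed residue groups (commonly used hydrophobicity classes for CTD
# descriptors); the paper's exact grouping is not stated in this record.
HYDROPHOBIC = set("CLVIMFW")
POLAR = set("RKEDQN")


def distribution_descriptor(sequence, group):
    """Return five values: the position (percent of sequence length) of the
    first occurrence of a residue from `group`, and of the occurrences that
    mark 25%, 50%, 75%, and 100% of all such residues."""
    positions = [i + 1 for i, aa in enumerate(sequence.upper()) if aa in group]
    if not positions:
        # No residue from this group: report zeros for all five values.
        return [0.0] * 5
    count = len(positions)
    length = len(sequence)
    values = []
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        # Rank of the occurrence marking this fraction (never below the first).
        rank = max(1, math.floor(fraction * count))
        values.append(100.0 * positions[rank - 1] / length)
    return values


# Hypothetical peptide used only to exercise the function.
peptide = "ACDLKV"
hydrophobic_dist = distribution_descriptor(peptide, HYDROPHOBIC)
polar_dist = distribution_descriptor(peptide, POLAR)
```

A small first value means residues of that group appear early in the sequence, which is the sense in which the abstract describes "early hydrophobic occurrence" and "early polar occurrence".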
Keywords
computational systems biology, neuropeptide classification, Peptides, Protein Descriptors






