Biologically Interpretable Machine Learning Framework for Neuropeptide Classification

Conference proceedings article

Publication Details

Author list: Warin Wattanapornprom; Patchnida Hemwannanukul; Kantinan Kuikaew; Supatcha Lertampaiporn; Apiradee Hongsthong; Chukiat Worasucheep

Publication year: 2025

Start page: 1

End page: 6

Number of pages: 6

URL: https://ieeexplore.ieee.org/abstract/document/11320538

Languages: English-United States (EN-US)

Abstract

Neuropeptides are essential signaling molecules that regulate diverse physiological processes, including pain perception, circadian rhythm, and endocrine communication. Accurate identification from protein sequences remains challenging due to short lengths, conserved motifs, and the need for interpretable models. We present a feature-driven machine learning framework for neuropeptide classification emphasizing interpretable protein descriptors grounded in biochemical properties. A curated dataset from NeuroPep and Swiss-Prot was encoded with 982 descriptors spanning composition, sequence-order, and physicochemical indices, then reduced to approximately 150 features via multi-stage selection. CTDD11 (early hydrophobic occurrence) and CTDD31 (early polar occurrence) emerged as most predictive. Six classifiers (DT, NB, k-NN, SVM, RF, XGBoost) were evaluated; XGBoost achieved the highest accuracy (0.92), while k-NN achieved the highest ROC-AUC (0.945). Interpretable models highlighted biologically meaningful contributions, linking CTDD descriptors to secretion and cleavage motifs. The framework is transparent and effective for neuropeptide classification, with potential in peptide discovery and therapeutic design.
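
As a rough illustration of what an "early occurrence" distribution descriptor measures, the sketch below computes, for each residue class, the position of the first matching residue as a percentage of sequence length. This is not the authors' implementation: the three-class hydrophobicity grouping, the function names, and the example sequence are assumptions made for illustration, and the abstract does not specify how CTDD11 or CTDD31 map onto these classes.

```python
# Illustrative sketch only. The residue grouping below is an assumed
# three-class hydrophobicity scheme, not one confirmed by the paper.
HYDROPHOBICITY_GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}


def first_occurrence_percent(sequence: str, group: set) -> float:
    """Return the 1-based position of the first residue belonging to
    `group`, as a percentage of sequence length (0.0 if none is found)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    for index, residue in enumerate(sequence, start=1):
        if residue in group:
            return 100.0 * index / len(sequence)
    return 0.0


def early_occurrence_features(sequence: str) -> dict:
    """Compute the first-occurrence percentage for every residue class."""
    return {
        name: first_occurrence_percent(sequence, group)
        for name, group in HYDROPHOBICITY_GROUPS.items()
    }


if __name__ == "__main__":
    # Hypothetical peptide fragment, used only to exercise the functions.
    print(early_occurrence_features("MKTLLVAG"))
```

A low value means a residue of that class appears near the N-terminus, which is the kind of positional signal the abstract associates with secretion and cleavage motifs.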

Keywords

computational systems biology, neuropeptide classification, peptides, protein descriptors