Biologically Interpretable Machine Learning Framework for Neuropeptide Classification

Conference proceedings article

Publication Details

Author list: Warin Wattanapornprom; Patchnida Hemwannanukul; Kantinan Kuikaew; Supatcha Lertampaiporn; Apiradee Hongsthong; Chukiat Worasucheep

Publication year: 2025

Start page: 1

End page: 6

Number of pages: 6

URL: https://ieeexplore.ieee.org/abstract/document/11320538

Language: English-United States (EN-US)

Abstract

Neuropeptides are essential signaling molecules that regulate diverse physiological processes, including pain perception, circadian rhythm, and endocrine communication. Accurate identification from protein sequences remains challenging due to short lengths, conserved motifs, and the need for interpretable models. We present a feature-driven machine learning framework for neuropeptide classification emphasizing interpretable protein descriptors grounded in biochemical properties. A curated dataset from NeuroPep and Swiss-Prot was encoded with 982 descriptors spanning composition, sequence-order, and physicochemical indices, then reduced to ∼150 features via multi-stage selection. CTDD11 (early hydrophobic occurrence) and CTDD31 (early polar occurrence) emerged as most predictive. Six classifiers (DT, NB, k-NN, SVM, RF, XGBoost) were evaluated; XGBoost achieved the highest accuracy (0.92), while k-NN achieved the highest ROC-AUC (0.945). Interpretable models highlighted biologically meaningful contributions, linking CTDD descriptors to secretion and cleavage motifs. The framework is transparent and effective for neuropeptide classification, with potential in peptide discovery and therapeutic design.
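
The abstract does not say how the CTDD (composition-transition-distribution, distribution part) descriptors are computed. As a rough illustration only, the sketch below computes distribution-style values for one residue class: the positions, as a percentage of sequence length, of the first, 25%, 50%, 75%, and last residue of that class. The three-way hydrophobicity grouping is the one commonly used for CTD descriptors and is an assumption here, as are the function name `ctd_distribution` and the example sequence. How the paper's CTDD11 and CTDD31 indices map onto these values is not stated in the abstract, so no mapping is implied.

```python
import math

# Hydrophobicity grouping commonly used for CTD descriptors.
# Assumption: the paper's exact grouping is not given in the abstract.
GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}


def ctd_distribution(sequence, group):
    """Return five values: the positions (as % of sequence length) of the
    first, 25%, 50%, 75%, and last residue belonging to `group`.

    Hypothetical helper for illustration; returns zeros if the group
    does not occur in the sequence.
    """
    seq = sequence.upper()
    # 1-based positions of every residue that belongs to the group
    positions = [i + 1 for i, aa in enumerate(seq) if aa in GROUPS[group]]
    if not positions:
        return [0.0] * 5
    n = len(positions)
    values = []
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        # Rank of the occurrence marking this fraction (at least the first)
        k = max(1, math.floor(frac * n))
        values.append(100.0 * positions[k - 1] / len(seq))
    return values


if __name__ == "__main__":
    peptide = "MKTLLA"  # made-up example sequence, not from the paper's dataset
    print(ctd_distribution(peptide, "hydrophobic"))
    print(ctd_distribution(peptide, "polar"))
```

A low first value means the residue class appears near the N-terminus, which is the kind of "early occurrence" signal the abstract associates with its most predictive features.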

Keywords

computational systems biology, neuropeptide classification, Peptides, Protein Descriptors