Synthetic Data for Scam Detection: Leveraging LLMs to Train Deep Learning Models

Conference proceedings article


Publication Details

Author list: Pitipat Gumphusiri and Tuul Triyason

Publication year: 2024

Start page: 1

End page: 6

Number of pages: 6

URL: https://www.wi-iat.com/wi-iat2024/index.html

Languages: English-United States (EN-US)


Abstract

This paper presents a novel approach to training scam detection models using synthetic data generated by Large Language Models (LLMs). We propose single-agent and multi-agent methods for data generation and train six deep learning architectures (LSTM, BiLSTM, GRU, BiGRU, CNN, and BERT) to classify conversations as scam or non-scam. Our experiments demonstrate that models trained on synthetic data achieve high accuracy on both generated test sets and real-world scam conversations. The models perform well even with limited conversation turns and when analyzing only the suspect’s messages, indicating potential for early scam detection and privacy-preserving applications. Our findings highlight the efficacy of synthetic data in overcoming real-world dataset limitations for scam detection. We make the dataset and trained models publicly available to facilitate further research and development in this critical area of fraud prevention.


Keywords

deep learning, Large language models, scam detection, synthetic data

