A Feature Engineering Approach to Improve Clustering-Based Persona Generation
Conference proceedings article
Publication Details
Author list: Wongabut, T., Ninrutsirikun, U., Nukoolkit, C., Lavangnananda, P., Warasup, K., and Arpnikanondt, C.
Publication year: 2025
Start page: 111
End page: 117
Number of pages: 7
Languages: English-United States (EN-US)
Abstract
Personas are widely recognized as essential tools in user-centered design and human-computer interaction, enabling designers to understand target users’ behaviors, goals, and needs in depth. With the increasing complexity and scale of digital systems, automated persona generation has emerged as a promising way to streamline persona creation by leveraging user data, clustering algorithms, and large language models. Despite this potential, current methods face several limitations, including inadequate feature engineering, a lack of context-specific customization, and limited validation of persona relevance in real-world applications. This study aims to enhance the effectiveness of automated persona generation for educational digital services by proposing a feature engineering-driven clustering approach. The approach combines dimension-based feature construction with K-means clustering; the resulting clusters are evaluated through silhouette analysis, and persona quality is assessed based on cluster representativeness. The results demonstrate improved clustering cohesion and more representative persona profiles compared with baseline methods. The study contributes a structured methodology that UX designers can use to generate data-driven personas tailored to educational environments.
However, the study is limited by its reliance on survey-based datasets and by a scope confined to higher education in Thailand. Future research will explore the generalizability of the proposed approach across different domains, conduct cross-cultural validation of persona models, further assess persona quality through expert evaluations, and investigate alternative large language models to enhance the quality and relevance of generated personas.
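The pipeline summarized in the abstract (dimension-based features, K-means clustering, silhouette analysis) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the survey items q1 to q4, the two dimensions in DIMENSIONS, the choice to average items into one score per dimension, and the toy responses are all assumptions. K-means (Lloyd's algorithm) and the silhouette coefficient are written out in plain Python with the standard library only, rather than taken from a library such as scikit-learn.

```python
import math
import random

# Hypothetical mapping of Likert-scale survey items to dimensions (assumption:
# the study's real items and dimensions are not given in this record).
DIMENSIONS = {
    "engagement": ["q1", "q2"],
    "self_regulation": ["q3", "q4"],
}


def build_dimension_features(responses):
    """Dimension-based features: one mean score per dimension per respondent."""
    return [
        [sum(r[item] for item in items) / len(items) for items in DIMENSIONS.values()]
        for r in responses
    ]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means with initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def silhouette_score(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b); singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    n = len(points)
    scores = []
    for i in range(n):
        own = [math.dist(points[i], points[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Invented toy responses for illustration only (not data from the study).
responses = [
    {"q1": 5, "q2": 4, "q3": 2, "q4": 1},
    {"q1": 4, "q2": 4, "q3": 1, "q4": 2},
    {"q1": 5, "q2": 5, "q3": 2, "q4": 2},
    {"q1": 1, "q2": 2, "q3": 5, "q4": 4},
    {"q1": 2, "q2": 2, "q3": 4, "q4": 5},
    {"q1": 1, "q2": 1, "q3": 5, "q4": 5},
]
features = build_dimension_features(responses)

# Silhouette analysis: try several k and keep the most cohesive clustering.
scores = {}
for k in range(2, 5):
    labels, _ = kmeans(features, k)
    scores[k] = silhouette_score(features, labels)
best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()}, "best k:", best_k)
```

In this sketch each cluster's centroid, expressed in dimension scores, is the natural starting point for a persona profile; how the study actually turns clusters into personas (for example, via a large language model) is not shown here.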
Keywords
ChatGPT, Data clustering, Human-Computer Interaction, Large language models, User Persona