A Web Demo Interface for Explainable Image Aesthetic Evaluation Using Vision-Language Models

Conference proceedings article


Publication Details

Author list: Viriyavisuthisakul, S.; Yoshida, S.; Sanguansat, P.; Yamasaki, T.

Publisher: Institute of Electrical and Electronics Engineers Inc.

Publication year: 2025

Start page: 547

End page: 550

Number of pages: 4

ISBN: 9798350351422; 9798331594657

ISSN: 2770-4327

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-105025045597&doi=10.1109%2FMIPR67560.2025.00092&partnerID=40&md5=9b92bc8f9ee0e636dbf1ae9f4ce1fe86

Languages: English-Great Britain (EN-GB)


Abstract

Image aesthetic assessment (IAA) is the task of evaluating the aesthetic quality of images. It is challenging because aesthetic quality is subjective. To automate IAA, a model must be able to understand and explain the compositional factors that affect aesthetics. CLIP-IQA was recently proposed to evaluate image quality using aesthetic antonym prompt pairs. Although the model achieves a high correlation with human aesthetic judgment, the reasons behind its scores remain unclear. In this study, we propose integrating several frameworks to analyze in depth the features that influence the aesthetic score. Light Gradient Boosting Machine (LightGBM) is applied as a regressor to predict the quality score, and SHapley Additive exPlanations (SHAP) values are used to evaluate the contribution of each targeted prompt pair. Multimodal large language models (MLLMs) are applied to generate linguistic explanations. The results show that the correlation coefficient with human judgment increases. Our demo system works with any input image and displays the SHAP values along with text explanations based on the features the user focuses on. © 2025 IEEE.
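The abstract compresses the scoring and attribution stages into two sentences. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. A CLIP-IQA-style score for one antonym prompt pair is computed as a softmax over the image's scaled cosine similarities to a positive and a negative prompt, with CLIP's logit scale of 100 assumed. Per-feature contributions are then computed as exact Shapley values by enumerating coalitions. To stay dependency-free, the sketch uses neither LightGBM nor the SHAP library. `toy_regressor` is a hypothetical hand-written stand-in for the trained regressor, the prompt-pair names and 3-D embeddings are made up, and a single "undecided" baseline replaces a background dataset. As with SHAP, the resulting values sum to the model output minus the baseline output.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def antonym_pair_score(image_emb, pos_emb, neg_emb, logit_scale=100.0):
    """CLIP-IQA-style score for one antonym prompt pair.

    Softmax over the two scaled cosine similarities; returns the probability
    assigned to the positive prompt, so the score lies in [0, 1].
    logit_scale=100 mirrors CLIP's learned temperature (an assumption here).
    """
    s_pos = logit_scale * cosine(image_emb, pos_emb)
    s_neg = logit_scale * cosine(image_emb, neg_emb)
    m = max(s_pos, s_neg)  # subtract the max before exp for numerical stability
    e_pos, e_neg = math.exp(s_pos - m), math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    A feature outside a coalition takes its baseline value. This needs
    O(n * 2^n) model calls, so it only suits a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        return model({k: (x[k] if k in coalition else baseline[k]) for k in names})

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_regressor(f):
    """Hypothetical hand-written stand-in for a trained LightGBM regressor."""
    return (1.0 + 4.0 * f["good/bad"] + 2.0 * f["sharp/blurry"]
            + 3.0 * f["good/bad"] * f["colorful/dull"])


# Made-up 3-D embeddings for illustration; real CLIP embeddings are much larger.
image = [0.9, 0.1, 0.3]
prompt_pairs = {
    "good/bad": ([1.0, 0.0, 0.2], [0.0, 1.0, 0.0]),
    "sharp/blurry": ([0.8, 0.2, 0.5], [0.1, 0.9, 0.4]),
    "colorful/dull": ([0.2, 0.1, 1.0], [0.9, 0.5, 0.0]),
}
features = {k: antonym_pair_score(image, p, q) for k, (p, q) in prompt_pairs.items()}
baseline = {k: 0.5 for k in features}  # an "undecided" reference input
phi = exact_shapley(toy_regressor, features, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:14s} {contribution:+.3f}")
```

Because there is one feature per prompt pair, brute-force enumeration stays cheap for a few pairs. With a real LightGBM model, the `shap` package's `TreeExplainer` would typically replace the loop, since it computes SHAP values for tree ensembles in polynomial time instead of enumerating coalitions.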


Last updated on 2026-14-02 at 00:00