A Web Demo Interface for Explainable Image Aesthetic Evaluation Using Vision-Language Models
Conference proceedings article
Publication details
Author list: Viriyavisuthisakul, S.; Yoshida, S.; Sanguansat, P.; Yamasaki, T.
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication year (CE): 2025
First page: 547
Last page: 550
Number of pages: 4
ISBN: 9798350351422; 9798331594657
ISSN: 27704327
Language: English-Great Britain (EN-GB)
Abstract
Image aesthetic assessment (IAA) is the task of evaluating the aesthetics of images. It is challenging because aesthetic quality is subjective. To enable automated IAA, a machine needs to understand and explain aesthetics-related composition. Recently, CLIP-IQA was proposed to evaluate image quality based on aesthetic antonym prompt pairs. Although the model achieves a high correlation with human aesthetic judgment, the reasons behind its scores remain unclear. In this study, we propose integrating several frameworks to analyze in depth the features that influence the aesthetic score. To predict the quality score, a Light Gradient Boosting Machine (LightGBM) is applied as the regressor. SHapley Additive exPlanations (SHAP) values are used to evaluate the contribution of each targeted prompt pair. To generate linguistic explanations, multimodal large language models (MLLMs) are applied. The results show that the correlation coefficient increases. Our demo system works with any input image, displaying the SHAP values along with text explanations based on the features users focus on. © 2025 IEEE.
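The pipeline described in the abstract (prompt-pair features, a regressor, per-feature contributions) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The embeddings, prompt-pair names, `scale` parameter, `toy_regressor`, and baseline below are all hypothetical; the toy regressor merely stands in for LightGBM, and the brute-force Shapley computation stands in for the SHAP library, which uses faster tree-specific algorithms.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_score(image_emb, pos_emb, neg_emb, scale=1.0):
    """Softmax over the two similarities: the share assigned to the positive prompt."""
    s_pos = scale * cosine(image_emb, pos_emb)
    s_neg = scale * cosine(image_emb, neg_emb)
    m = max(s_pos, s_neg)  # subtract the max for numerical stability
    e_pos, e_neg = math.exp(s_pos - m), math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition that feature i joins
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical 3-dimensional embeddings; real CLIP embeddings are much larger.
image = [0.9, 0.1, 0.2]
pairs = {
    "colorful / dull": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    "sharp / blurry": ([0.0, 0.0, 1.0], [0.0, 1.0, 0.0]),
}
features = [pair_score(image, pos, neg) for pos, neg in pairs.values()]


def toy_regressor(f):
    """Hypothetical stand-in for a trained regressor such as LightGBM."""
    return 2.0 * f[0] + 1.0 * f[1] + 0.5 * f[0] * f[1]


baseline = [0.5, 0.5]  # assumed reference point: each pair is undecided
phi = shapley_values(toy_regressor, features, baseline)
for name, contribution in zip(pairs, phi):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction for the image and the prediction at the baseline, which is the property that lets each prompt pair's share of the score be reported separately. Exact enumeration grows exponentially with the number of prompt pairs, so it is only practical for small feature sets like this one.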