Interpretable Aesthetic Assessment and Prompt-Guided Image Retouching
Conference proceedings article
Publication Details
Author list: Viriyavisuthisakul, S.; Sanguansat, P.; Yamasaki, T.
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication year: 2025
Start page: 566
End page: 571
Number of pages: 6
ISBN: 9798350351422; 9798331594657
ISSN: 2770-4327
Language: English (EN-GB)
Abstract
Image aesthetic evaluation is subjective and has traditionally relied on human judgment. Vision-language models such as CLIP offer a novel approach by aligning visual features with textual prompts, enabling interpretable assessments. CLIP-based Image Quality Assessment (CLIP-IQA) methods use antonym prompt pairs to capture perceptual attributes in images, but how well these features correspond to human perception remains an open question. Recently, explainable aesthetic evaluation using vision-language models has been proposed. This approach examines the contribution of prompt-based features by combining CLIP-derived embeddings with a Light Gradient Boosting Machine (LightGBM) for aesthetic scoring, supported by SHapley Additive exPlanations (SHAP) for feature attribution; a multimodal large language model (MLLM) then generates explanations based on these attributions. In this paper, we extend this framework into a practical application. SHAP values guide prompt construction for image manipulation, deblurring and super-resolution models improve the quality of image detail, and a diffusion-based generative model synthesizes new images from the prompts, after which the impact of specific feature changes is reevaluated. This approach bridges interpretability and controllability in aesthetic assessment, offering new insights into the relationship between image attributes and perceived quality. © 2025 IEEE.
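The antonym-prompt scoring the abstract attributes to CLIP-IQA can be sketched in a few lines: the image embedding is compared against a positive/negative prompt pair and a softmax over the cosine similarities yields an attribute score. The sketch below uses small hypothetical vectors in place of real CLIP embeddings (which are typically 512-dimensional), and the temperature value is an illustrative assumption, not the paper's setting.

```python
import math

def antonym_prompt_score(image_emb, pos_emb, neg_emb, temperature=100.0):
    """CLIP-IQA-style attribute score: softmax over the cosine
    similarities between an image embedding and an antonym prompt
    pair; returns the probability assigned to the positive prompt."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    logits = [temperature * cosine(image_emb, pos_emb),
              temperature * cosine(image_emb, neg_emb)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    return exps[0] / sum(exps)

# Toy 3-d vectors stand in for real CLIP embeddings.
img = [0.9, 0.1, 0.2]
pos = [1.0, 0.0, 0.1]   # e.g. text embedding of "a sharp photo"
neg = [0.0, 1.0, 0.3]   # e.g. text embedding of "a blurry photo"
score = antonym_prompt_score(img, pos, neg)
```

A score near 1 indicates the image embedding sits closer to the positive prompt; each such score can then serve as one interpretable feature fed to the LightGBM scorer described in the abstract.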