Text-Guided Image Retouching Based on Interpretable Aesthetic Scoring
Poster
Authors/Editors
Strategic Research Group
Publication Details
Author list: Supatta Viriyavisuthisakul, Parinya Sanguansat, Toshihiko Yamasaki
Year of publication (A.D.): 2025
Abstract
Image aesthetic evaluation is subjective and has traditionally depended on human judgment. Vision-language models, such as CLIP, present a novel paradigm by aligning visual features with textual prompts, thereby enabling interpretable assessments. The CLIP-based Image Quality Assessment (CLIP-IQA) method employs antonym prompt pairs to capture perceptual attributes in images. However, whether these features align with human perception remains an open research question. A recent study proposed an explainable aesthetic evaluation framework built on vision-language models: it integrates CLIP-IQA with the Light Gradient Boosting Machine (LightGBM) for aesthetic scoring and applies SHapley Additive exPlanations (SHAP) to attribute importance to individual features. A multimodal large language model (MLLM) is then employed to generate natural-language explanations based on these attributions. In this paper, we extend this framework toward practical application. Specifically, we use SHAP values to guide prompt construction for targeted image manipulation. Image quality is enhanced through deblurring and super-resolution models, and a diffusion-based generative model synthesizes new images based on the constructed prompts. These synthesized images are then re-evaluated to assess the effects of specific feature modifications. This approach bridges the gap between interpretability and controllability in aesthetic assessment, providing deeper insight into the relationship between image attributes and perceived quality.
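
The abstract outlines a multi-stage pipeline. The Python sketch below illustrates how such a pipeline could be wired together under stated assumptions: CLIP-IQA-style antonym prompt pairs score perceptual attributes, a LightGBM regressor maps those attribute scores to an aesthetic score, SHAP identifies the attribute that pulls the predicted score down the most, and that attribute's positive prompt is reused to drive a diffusion-based image-to-image edit. The attribute list, prompt wording, and model checkpoints are illustrative choices, not the configuration used in the paper.

# Minimal sketch of the described pipeline: CLIP-IQA-style attribute scoring with
# antonym prompt pairs, LightGBM aesthetic regression, SHAP attribution, and
# SHAP-guided prompt construction for a diffusion-based edit.
# Attribute names, prompts, and checkpoints are illustrative assumptions only.
import numpy as np
import torch
import lightgbm as lgb
import shap
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical antonym prompt pairs, one per perceptual attribute (CLIP-IQA style).
ANTONYM_PAIRS = {
    "sharpness":    ("A sharp photo.",    "A blurry photo."),
    "brightness":   ("A bright photo.",   "A dark photo."),
    "colorfulness": ("A colorful photo.", "A dull photo."),
    "noisiness":    ("A clean photo.",    "A noisy photo."),
}

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def attribute_scores(image: Image.Image) -> np.ndarray:
    """Score each attribute as the softmax probability of the positive prompt
    against its antonym, following the CLIP-IQA prompt-pairing idea."""
    scores = []
    for pos, neg in ANTONYM_PAIRS.values():
        inputs = processor(text=[pos, neg], images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = clip(**inputs).logits_per_image  # shape (1, 2)
        scores.append(logits.softmax(dim=-1)[0, 0].item())
    return np.array(scores)


# Training the aesthetic regressor (assumes images paired with human ratings):
#   X = np.stack([attribute_scores(img) for img in train_images])  # (n, n_attrs)
#   y = train_mean_opinion_scores
#   regressor = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
#   regressor.fit(X, y)

def shap_guided_prompt(regressor: lgb.LGBMRegressor, feats: np.ndarray) -> str:
    """Build an editing prompt from the attribute whose SHAP value lowers the
    predicted aesthetic score the most."""
    explainer = shap.TreeExplainer(regressor)
    shap_vals = explainer.shap_values(feats.reshape(1, -1))[0]  # (n_attributes,)
    names = list(ANTONYM_PAIRS.keys())
    worst = names[int(np.argmin(shap_vals))]
    # Reuse the positive prompt of the weakest attribute as the editing target.
    return ANTONYM_PAIRS[worst][0]


# The constructed prompt can then drive an img2img diffusion model
# (checkpoint choice is an assumption):
#   from diffusers import StableDiffusionImg2ImgPipeline
#   pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
#   edited = pipe(prompt=shap_guided_prompt(regressor, attribute_scores(img)),
#                 image=img, strength=0.35).images[0]
# The edited image is re-scored with attribute_scores() and the regressor to
# verify the effect of the targeted modification.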
Keywords
No relevant information found.






