How to Measure the Quality of AI-Generated Images

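In recent years, generative AI, built on advances in deep learning, has emerged as an innovative tool for producing high-quality images and videos. New architectures such as GANs, VAEs, and diffusion models have enabled creative and practical applications in diverse industries, including entertainment, advertising, and scientific simulation. However, generative models are hard to evaluate because their outputs have no single correct answer, and a variety of evaluation metrics have been studied to address this. These metrics show practical potential in applications such as inpainting and image generation via black-box optimization.
In this talk, we hope you will find plenty of inspiration on how to evaluate and use generated images.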

Transcript

  1. How to Measure the Quality of AI-Generated Images

    LINE Plus, Applied ML Dev.: Jongwoo Han, Heechan Kim, Suman Bae
  2. Jongwoo Han

    I joined LINE Plus in 2022 and am currently the lead of Applied ML Dev. My current interests are evaluation and content monitoring using vision models.
  3. Agenda

    Introduction to Generated Image Evaluation; Proposed Image Generation Pipeline; Applications (Blackbox Optimization, Image Translation, Image Inpainting)
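  4. Beyond Pixels: Challenges in Evaluating Generated Images

    Traditional vision task (with ground truth): an image with ground truth "Angry" and prediction "Angry" is correct (O); one with ground truth "Happy" and prediction "Angry" is wrong (X). Image generation task (no ground truth): a prompt such as "Draw a man holding a tennis racket in a LINE-Style illustration." has no single correct image to compare against.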
  5. Criteria for Generated Image Evaluation

    Visual Quality / Aesthetics, Prompt Alignment, Originality, Photo Realism, Toxicity / Fairness, etc. https://techblog.lycorp.co.jp/ko/how-to-evaluate-ai-generated-images-1
  6. Visual Quality / Aesthetics

    Distribution-based approach (IS: Inception Score; FID: Fréchet Inception Distance): evaluate the distribution of generated images as a whole, e.g. against the distribution of the training images. Instance-based approach (LAION aesthetics score, CLIP-IQA, Q-Align): train an evaluation model on aesthetics datasets, then pass each generated image through this quality evaluation model to obtain an evaluation result. https://techblog.lycorp.co.jp/ko/how-to-evaluate-ai-generated-images-1
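The two distribution-based metrics named above reduce to short formulas once per-image classifier outputs or features are available. The sketch below assumes those arrays are already computed (in practice from an Inception-v3 network, which is not included here). IS is exp of the mean KL divergence between each image's class distribution and the marginal; FID is the Fréchet distance between Gaussians fitted to real and generated features. It is an illustration, not a reference implementation, so numbers will not match published tools exactly.

```python
import numpy as np


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """IS = exp(mean_x KL(p(y|x) || p(y))).

    probs: (N, K) array; row i is a classifier's class distribution p(y|x_i)
    for generated image i (assumed precomputed, e.g. by Inception-v3).
    """
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID-style distance between Gaussians fitted to two feature sets.

    d^2 = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2})
    feats_*: (N, D) arrays of image features (assumed precomputed).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s_r = np.cov(feats_real, rowvar=False)
    s_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^{1/2}) equals the sum of square roots of the eigenvalues of
    # the symmetric PSD matrix S_r^{1/2} S_g S_r^{1/2}; this avoids a general,
    # possibly complex-valued, matrix square root.
    w, v = np.linalg.eigh(s_r)
    sqrt_s_r = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    m = sqrt_s_r @ s_g @ sqrt_s_r
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(m), 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(s_r) + np.trace(s_g) - 2.0 * tr_covmean)
```

Higher IS and lower FID are conventionally better; FID needs a reference set of real images, while IS uses only the generated ones.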
  7. Prompt Alignment

    Direct approach (CLIP Score): the generated image goes through an image embedder and the text prompt through a text embedder, and the two embeddings are compared to produce the evaluation result. QA (question-answering) approach (VQA (visual-question-answering) Score): the generated image and the question "Does this figure show {Gen. Prompt}?" are given to a QA-based evaluator, which produces the evaluation result. https://techblog.lycorp.co.jp/ko/how-to-evaluate-ai-generated-images-1
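Both prompt-alignment approaches can be sketched in a few lines, assuming the embeddings or the VQA model are supplied from outside. The direct approach below uses the CLIPScore formula from Hessel et al. (2021), w · max(cos(E_I, E_T), 0) with w = 2.5; other tools may report the cosine on a different scale. For the QA approach, `p_yes` is a hypothetical callable standing in for a real VQA model that returns the probability of answering "Yes".

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float], w: float = 2.5) -> float:
    """Direct approach: w * max(cos(image embedding, text embedding), 0).

    The embeddings are assumed to come from paired image/text encoders
    (e.g. CLIP) that share one embedding space.
    """
    return w * max(cosine(image_emb, text_emb), 0.0)


def vqa_score(image: object, prompt: str, p_yes: Callable[[object, str], float]) -> float:
    """QA approach: probability that a VQA model answers "Yes".

    p_yes is a hypothetical stand-in for a real VQA model.
    """
    question = f"Does this figure show {prompt}?"
    return p_yes(image, question)
```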
  8. Proposed Image Generation Pipeline

    Training phase: Image Generation Model → Generated Image → Auto Evaluation → Loss, which is used to train the model. Inference phase: Image Generation Model → Generated Image → Auto Evaluation → Satisfied? If yes, the image is output; if no, a new image is generated.
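The inference phase of this pipeline is a generate-evaluate-retry loop. A minimal sketch, assuming hypothetical `generate(seed)` and `evaluate(image)` callables and an acceptance threshold that the slide does not specify; falling back to the best attempt when nothing passes is also an assumption.

```python
from typing import Any, Callable, Tuple


def generate_until_satisfied(
    generate: Callable[[int], Any],    # hypothetical: seed -> generated image
    evaluate: Callable[[Any], float],  # hypothetical: image -> auto-evaluation score
    threshold: float,                  # assumed acceptance threshold
    max_tries: int = 10,
) -> Tuple[Any, float, bool]:
    """Return (image, score, satisfied) following the inference-phase loop."""
    best_img, best_score = None, float("-inf")
    for seed in range(max_tries):
        img = generate(seed)
        score = evaluate(img)
        if score >= threshold:         # "Satisfied?" -> Yes: output the image
            return img, score, True
        if score > best_score:         # "No": remember the best attempt, retry
            best_img, best_score = img, score
    # Nothing passed: fall back to the best attempt (an assumption).
    return best_img, best_score, False
```

In the training phase the same evaluator would instead be turned into a loss or reward signal, which depends on the model and is not sketched here.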
  9. Practical Cases: How to apply generated image evaluation to improve quality

    Applications: Blackbox Optimization, Image Translation, Image Inpainting
  10. Heechan Kim

    I joined LINE Plus in 2021. Within the team, I develop various AI models and services to meet diverse needs across the company. Currently, I am very interested in generative AI models and their evaluation. Applications: Blackbox Optimization, Image Translation, Image Inpainting
  11. How Challenging is Image Gen.? Same Model, Same Prompt but Different Results with Different Hyperparameters

    Model: fine-tuned StableDiffusion3.5. Prompt: "A woman is holding a picket sign. The sign has the words 'hold the LINE' written on it." Results are compared for #1 Setting (Hyperparameters) and #2 Setting (Hyperparameters).
  12. Find a Good Starting Point

    Attend-and-Excite: measure relevance using the attention map between the noise and the prompt. InitNO: mine seeds using the initial noise and the prompt. Example prompt: "A cat and a rabbit"
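Seed mining can be illustrated as "score many initial noises against the prompt, keep the best seeds". This is a simplified sketch of the idea only, not InitNO or Attend-and-Excite themselves: `noise_score` is a hypothetical scorer (those methods derive it from cross-attention maps inside the model), and the flattened 16-value noise is a toy stand-in for a real latent.

```python
import heapq
import random
from typing import Callable, List, Sequence


def sample_noise(seed: int, dim: int) -> List[float]:
    """Deterministic Gaussian initial noise for a seed (toy flattened latent)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def mine_seeds(
    prompt: str,
    noise_score: Callable[[Sequence[float], str], float],  # hypothetical scorer
    n_candidates: int = 64,
    top_k: int = 4,
    dim: int = 16,
) -> List[int]:
    """Score candidate initial noises against the prompt; return the best seeds."""
    scored = [(noise_score(sample_noise(s, dim), prompt), s) for s in range(n_candidates)]
    return [s for _, s in heapq.nlargest(top_k, scored)]
```

Scoring the noise before running the full denoising loop is what makes this cheaper than generating and evaluating every seed.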
  13. Measure Goodness of Image

    Scorers: CLIPScore, HPS v2, PickScore, VQAScore. Average out the various scores. Selected scorer: 0.2395 for #1 Hyperparameters vs. 0.2243 for #2 Hyperparameters.
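Averaging scorers such as CLIPScore, HPS v2, PickScore, and VQAScore only makes sense if they are on comparable scales. A minimal sketch, assuming per-scorer min-max normalization across the candidate images before averaging; the normalization is our assumption, and the slide does not state how the scores were combined.

```python
from typing import Dict, List


def combined_scores(raw: Dict[str, List[float]]) -> List[float]:
    """Average several scorers after per-scorer min-max normalization.

    raw maps a scorer name (e.g. "CLIPScore", "PickScore") to its scores for
    the same ordered list of candidate images (all lists must be equal length).
    """
    n = len(next(iter(raw.values())))
    totals = [0.0] * n
    for scores in raw.values():
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for i, s in enumerate(scores):
            # A scorer that cannot tell candidates apart contributes 0.
            totals[i] += (s - lo) / span if span > 0 else 0.0
    return [t / len(raw) for t in totals]
```

The candidate with the highest combined score (e.g. via `max(range(n), key=...)`) would then be picked.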
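  14. Apply Blackbox Optimization

    Find good hyperparameters based on the results, minimizing the effort of selecting hyperparameters. Summary: given a seed and prompt, the Image Generation Model produces a Generated Image, Auto Evaluation scores it, and Blackbox Opt. proposes new hyperparameters; if the search has not ended (End? No), the loop repeats, otherwise (Yes) it stops.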
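The loop on this slide treats "generate + auto-evaluate" as a black-box objective over hyperparameters. The sketch below uses plain random search as the optimizer; it is a simple stand-in, not necessarily what the team used, and a library such as Optuna could replace it. The hyperparameter name `guidance_scale` in the usage example is hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(
    objective: Callable[[Params], float],   # e.g. generate an image, then auto-evaluate it
    space: Dict[str, Tuple[float, float]],  # hyperparameter name -> (low, high)
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Black-box maximization by random search over a box-shaped space."""
    rng = random.Random(seed)
    best_params: Params = {}
    best_score = float("-inf")
    for _ in range(n_trials):               # "End?" is simply a trial budget here
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        score = objective(params)           # Generated Image -> Auto Evaluation
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Usage with a toy objective peaking at 7.0: `random_search(lambda p: -(p["guidance_scale"] - 7.0) ** 2, {"guidance_scale": (1.0, 15.0)}, n_trials=200)`. Smarter optimizers (e.g. Bayesian optimization) reuse past results to choose the next trial, which matters when each evaluation requires a full image generation.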
  15. Final Results with One Click

    #1 Hyperparameters, #2 Hyperparameters, #3 Hyperparameters: we can select the best image among these candidates.
  16. Can We Measure Quality of Anything?

    Original image vs. translated image: how good is the translated result?
  17. Rubric Mining and Eval. with VLM

    Find good criteria, then evaluate images with those criteria. Mining phase: seed rubric + image → VLM → refined rubric. Evaluation phase: refined rubric + image → VLM → score. Model shown: Prometheus-Vision. Example rubric entries: "criteria": "Visual Consistency", "description": "Determine how well the visual style and layout are preserved during translation, maintaining a consistent appearance with the original.", "scoring": { "5": "Visual style and layout are perfectly… and "criteria 1": { "description": "The translated image should be similar to the reference image in terms of text block position.", "score": { "0": "The translated image is not similar to the reference image in terms of text block position.", … Example output: "Visual Consistency: 5" …
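The evaluation phase needs two pieces of glue code: putting a rubric into the VLM prompt and reading scores back out of free-text answers such as "Visual Consistency: 5". The sketch below is a hypothetical version of both. The prompt wording and the "<criteria>: <score>" answer format are assumptions, and Prometheus-Vision defines its own prompt template, which is not reproduced here.

```python
import json
import re
from typing import Dict

# One "<criteria>: <score>" pair per line, e.g. "Visual Consistency: 5".
SCORE_LINE = re.compile(
    r"^[ \t]*(?P<name>[^:\n]+?)[ \t]*:[ \t]*(?P<score>\d+(?:\.\d+)?)[ \t]*$",
    re.MULTILINE,
)


def build_eval_prompt(rubric: Dict) -> str:
    """Hypothetical prompt asking a VLM to grade a translated image with a rubric."""
    return (
        "Evaluate the translated image against the original using this rubric.\n"
        + json.dumps(rubric, indent=2)
        + '\nAnswer with one line per criterion, formatted as "<criteria>: <score>".'
    )


def parse_scores(vlm_output: str) -> Dict[str, float]:
    """Extract per-criterion scores from the VLM's text answer."""
    return {m["name"]: float(m["score"]) for m in SCORE_LINE.finditer(vlm_output)}
```

Lines without a score are ignored, so explanatory text around the scores does not break parsing; per-criterion scores can then be averaged or inspected individually.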
  18. Refined Rubrics and Evaluation Results

    The VLM can evaluate the image using the rubrics, but are these scores acceptable? Text Chunking and Blocking: 4; Cultural Context and Appropriateness: 4; Text Positioning and Layout Consistency: 5; Numerical and Symbol Accuracy: 4; Visual Clarity and Quality: 5
  19. Limitation and Future Plan: We can go further

    Image Gen. w. B.O.: the score is not sensitive to a specific style, so we need a measure that can capture the characteristics of that style. Evaluation with VLM: the VLM ignores detailed image features, so we will explore methods to inject detailed image features into the VLM.
  20. Suman Bae

    I joined LINE Plus in 2021 and have developed various vision-related ML models. Currently, I am very interested in applications using multimodal and generative models.
  21. Image Inpainting: How to apply generated image evaluation to improve quality

    Applications: Blackbox Optimization, Image Translation, Image Inpainting
  22. Application of Image Inpainting: Background Person Removal

    Original photo: unexpected people in the photo. Photo with background people removed: clear background.
  23. Background Person Removal Pipeline

    Original Img → Instance Segmentation + Salient Object Detection → Inpainting Area Selection → Inpainting → Output Img
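The "Inpainting Area Selection" step has to decide which detected people are background. One plausible rule, assumed here because the slide does not give one: keep any person whose mask lies mostly inside the salient-object mask (the main subject) and inpaint the union of the remaining person masks. Masks are boolean NumPy arrays; the 0.5 overlap threshold is hypothetical.

```python
from typing import List

import numpy as np


def select_inpainting_area(
    person_masks: List[np.ndarray],  # boolean HxW masks from instance segmentation
    salient_mask: np.ndarray,        # boolean HxW mask from salient object detection
    max_overlap: float = 0.5,        # hypothetical threshold
) -> np.ndarray:
    """Union of person instances that lie mostly outside the salient region.

    A person whose mask overlaps the salient region by more than max_overlap
    (as a fraction of that person's own area) is treated as a main subject.
    """
    area = np.zeros_like(salient_mask, dtype=bool)
    for m in person_masks:
        size = m.sum()
        if size == 0:
            continue
        overlap = np.logical_and(m, salient_mask).sum() / size
        if overlap <= max_overlap:   # background person: mark for inpainting
            area |= m
    return area
```

In practice the selected area is often dilated slightly before inpainting so that edges and shadows are covered as well.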
  24. Simple Case

    Original Image vs. LaMa Model (WACV, 2022), HINT Model (TMM, 2024), FLUX.1-Fill-dev Model (arXiv, 2024)
  25. Challenging Case

    Original Image vs. LaMa Model (WACV, 2022), HINT Model (TMM, 2024), FLUX.1-Fill-dev Model (arXiv, 2024)
  26. Evaluation Metrics

    Single-image-based image quality assessment: LAION aesthetics score-v2, CLIP-IQA (AAAI, 2023), Q-Align (ICML, 2024). Distribution-based image quality assessment: FID (NeurIPS, 2017), FD-Dino (NeurIPS, 2023), CMMD (CVPR, 2024). Image quality assessment with prompt alignment: ImageReward (NeurIPS, 2023), HPS v2 (arXiv, 2023), PickScore (NeurIPS, 2023). https://techblog.lycorp.co.jp/ko/how-to-evaluate-ai-generated-images-3-inpainting
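CMMD, one of the distribution-based metrics above, replaces FID's Gaussian assumption with the maximum mean discrepancy (MMD) between embeddings of real and generated images under a Gaussian RBF kernel. The sketch below computes the simple biased (V-statistic) estimate of squared MMD; the bandwidth `sigma` is an assumed default, and the CMMD paper uses CLIP embeddings, an unbiased estimator, and its own scaling, so check it before comparing numbers.

```python
import numpy as np


def rbf_mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 10.0) -> float:
    """Biased estimate of squared MMD with k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).

    x: (N, D) and y: (M, D) embeddings, e.g. image embeddings of real and
    generated images (assumed precomputed).
    """
    def kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
        sq = (a * a).sum(1)[:, None] + (b * b).sum(1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())
```

Unlike FID, MMD makes no normality assumption about the embedding distribution, which is one of the motivations given for CMMD.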
  27. Evaluation Model and Datasets

    Evaluation models: LaMa (WACV 2022), MAT (CVPR 2022), CoordFILL (AAAI 2023), SCAT (AAAI 2023), HINT (TMM 2024), MxT (BMVC 2024), PUT (TPAMI 2024), Latent Codes for Pluralistic Image Inpainting (CVPR 2024), FLUX.1-Fill-dev (arXiv 2024). Evaluation datasets: Places365-Standard validation dataset (36,500 images); challenging case of background person removal dataset (10 images). https://techblog.lycorp.co.jp/ko/how-to-evaluate-ai-generated-images-3-inpainting
  28. Evaluation Result: Places365-Standard validation dataset (Pearson Correlation Coefficient with Human Evaluation)

    - CMMD: 0.898
    - LAION aesthetics score-v2: 0.877
    - FID: 0.877
    - Q-Align: 0.843
    - PickScore: 0.648
    - FD-Dino: 0.604
    - ImageReward: 0.428
    - HPS v2: 0.387
    - CLIP-IQA: 0.063
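The values above are Pearson correlation coefficients between each metric and human evaluation. A minimal sketch of the coefficient itself (Python 3.10+ also offers `statistics.correlation`); the slide does not state whether scores were paired per image or per model, so the pairing is left to the caller.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between metric scores x and human ratings y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A value near 1 means the metric ranks items almost linearly in agreement with humans; near 0 (as for CLIP-IQA here) means little linear agreement.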
  29. Evaluation Result: Challenging case of background person removal dataset (Pearson Correlation Coefficient with Human Evaluation)

    - LAION aesthetics score-v2: 0.924
    - Q-Align: 0.384
    - HPS v2: -0.290
    - PickScore: 0.282
    - ImageReward: 0.279
    - CLIP-IQA: 0.187
  30. Summary

    • Since many different results can be considered correct for image generation models, extensive research is under way on evaluation methods that best mimic human perception of image quality.
    • Our team is researching these image generation evaluation methods and applying them to various applications.
    • Since image generation models are expected to be used in many fields, the importance of this evaluation area is expected to keep growing.
  31. Reference

    • Tech Blog
    • https://techblog.lycorp.co.jp/en/how-to-evaluate-ai-generated-images-1 (EN released; JP release: 7/16)
    • https://techblog.lycorp.co.jp/en/how-to-evaluate-ai-generated-images-2-blackbox-optimization (EN released; JP release: 7/28)
    • https://techblog.lycorp.co.jp/en/how-to-evaluate-ai-generated-images-3-inpainting (EN released; JP release: 8/8)