
[ACCV22] Visual Explanation Generation Based on Lambda Attention Branch Networks



Transcript

  1. Visual Explanation Generation Based on Lambda Attention Branch Networks

     Tsumugi Iida¹, Takumi Komatsu¹, Kanta Kaneda¹, Tsubasa Hirakawa², Takayoshi Yamashita², Hironobu Fujiyoshi², Komei Sugiura¹
     ¹Keio University  ²Chubu University
  2. Introduction: Visual explanations can provide insights into unexplained phenomena

     Visual explanations for deep neural networks are important in terms of:
     ・Enhancing accountability (e.g., health care)
     ・Providing scientific insight to experts (e.g., solar flare)
     [Figure: magnetogram and its visual explanation]
  4. Problem Statement: Visual explanation generation for classification problems

     Input:
     • Image 𝒙 ∈ ℝ^{3×H×W}
     Outputs:
     • Predicted class
     • Visual explanation: attention map 𝜶 ∈ ℝ^{1×H×W}
     [Figure: IDRiD image with its visual explanation, showing important and unimportant regions]
  5. Related Works: Explanation generation for transformers has not been fully established

     • Attention Branch Network [Fukui+, CVPR19]: generates explanations for CNNs via a branch structure
     • Attention Rollout [Abnar+, 20]: explanation generation method that chains transformer attentions; a standard explanation generation method for transformers
     • RISE [Petsiuk+, BMVC18]: generic method for explanation generation; proposed a standard metric, the Insertion-Deletion score (ID)
     Problems:
     • Visual explanations for Lambda-based transformers have not been established
     • ID is inappropriate for images with sparse important regions
     [Figure: generic image vs. image with sparse important regions]
  6. Related Works: Lambda Networks [Bello+, ICLR21]

     Lambda Layer:
     • Compatible with CNNs
     • Captures a wide range of relationships with less computation than ViT
     [Figure: comparison of ViT attention and the lambda layer]
  7. Related Works: Lambda Networks [Bello+, ICLR21]

     Lambda Layer: a transformer specialized for images; can capture relationships among all pixels with less computation than ViT
     • Apply convolutions to 𝒉 to generate query, key, and value:
       𝑄 = Conv(𝒉), 𝑉 = Conv(𝒉), 𝐾 = Softmax(Conv(𝒉))
     • Apply a convolution to the value to generate 𝝀_p, and compute the product of key and value to generate 𝝀_c:
       𝝀_p = Conv(𝑉), 𝝀_c = 𝐾^⊤𝑉
     • Compute the output 𝒉′ by the following equation:
       𝒉′ = (𝝀_p + 𝝀_c)^⊤𝑄
  8. Related Works: Lambda Networks [Bello+, ICLR21]

     Lambda Layer: a transformer specialized for images; can capture relationships among all pixels with less computation than ViT
     𝝀_p = Conv(𝑉), 𝝀_c = 𝐾^⊤𝑉
     𝒉′ = (𝝀_p + 𝝀_c)^⊤𝑄
     𝝀_c : a compressed representation of the input that is applied to 𝑄
     ◦ Explanation generation strategy
       1. Visualize 𝝀_c
       2. Introduce a new module to generate explanations
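Below is a minimal, single-head PyTorch sketch of the lambda computation above. The module name SimpleLambdaLayer, the hyperparameters dim_k and dim_v, and the receptive field r are illustrative assumptions; the original Lambda Networks additionally use multi-query heads and other details omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleLambdaLayer(nn.Module):
    """Single-head sketch of the lambda layer equations above (illustrative only)."""

    def __init__(self, channels: int, dim_k: int = 16, dim_v: int = 16, r: int = 7):
        super().__init__()
        # Q = Conv(h), K = Softmax(Conv(h)), V = Conv(h)
        self.to_q = nn.Conv2d(channels, dim_k, 1, bias=False)
        self.to_k = nn.Conv2d(channels, dim_k, 1, bias=False)
        self.to_v = nn.Conv2d(channels, dim_v, 1, bias=False)
        # "lambda convolution": builds one (dim_k x dim_v) position lambda per pixel from V
        self.pos_conv = nn.Conv3d(1, dim_k, (1, r, r), padding=(0, r // 2, r // 2))
        self.proj = nn.Conv2d(dim_v, channels, 1, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        b, _, H, W = h.shape
        q = self.to_q(h).flatten(2)                        # (b, k, n), n = H*W
        k = F.softmax(self.to_k(h).flatten(2), dim=-1)     # (b, k, n), normalized over positions
        v = self.to_v(h)                                   # (b, v, H, W)

        # content lambda: lambda_c = K^T V, shared by all positions
        lam_c = torch.einsum('bkn,bvn->bkv', k, v.flatten(2))       # (b, k, v)
        # position lambdas: lambda_p = Conv(V), one (k x v) matrix per position
        lam_p = self.pos_conv(v.unsqueeze(1)).flatten(3)            # (b, k, v, n)

        # output: h' = (lambda_p + lambda_c)^T Q, evaluated position-wise
        content_out = torch.einsum('bkv,bkn->bvn', lam_c, q)
        position_out = torch.einsum('bkvn,bkn->bvn', lam_p, q)
        out = (content_out + position_out).view(b, -1, H, W)        # (b, v, H, W)
        return self.proj(out)                                       # back to input channels
```

The content lambda is a single (k × v) matrix shared by every position, which is why the layer scales more gently than pixel-to-pixel attention in ViT.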
  9. Proposed Method: Lambda Attention Branch Networks can generate visual explanations for Lambda-based transformers

     • Introduces a branch structure that generates an attention map 𝜶 ∈ ℝ^{1×H×W} from the extracted features
     • Performs classification based on 𝜶 ⊙ 𝒉, the element-wise product of the attention map and the extracted features
     • 𝜶 contributes to both explanation and accuracy
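A minimal sketch of such a branch head, with hypothetical module names (AttentionBranchHead, attention_branch, perception_branch) and simplified layers; it only illustrates the idea that a single attention map both weights the features for classification and serves as the explanation, not the exact architecture of the paper.

```python
import torch
import torch.nn as nn

class AttentionBranchHead(nn.Module):
    """Sketch of a branch structure on top of extracted features h (names hypothetical)."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        # attention branch: extracted features -> alpha in R^{1 x H x W}
        self.attention_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid(),
        )
        # perception branch: classifies the attention-weighted features
        self.perception_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, num_classes),
        )

    def forward(self, h: torch.Tensor):
        alpha = self.attention_branch(h)              # (b, 1, H, W): the visual explanation
        logits = self.perception_branch(alpha * h)    # classification based on alpha ⊙ h
        return logits, alpha
```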
  12. Proposed Method: Introduced Saliency-guided training [Ismail+, NeurIPS21] to reduce noise in the attention map

     1. Generate a masked image 𝒙̃ based on the attention map
     2. Minimize the KL divergence between the outputs for 𝒙 and 𝒙̃:
        ℒ_KL = D_KL( f(𝒙) ‖ f(𝒙̃) )
     [Figure: input 𝒙, attention map, and masked image 𝒙̃]
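A minimal sketch of this regularizer, assuming a simple zero-masking of the lowest-attention pixels and a hypothetical mask_ratio hyperparameter; the paper's exact masking procedure may differ.

```python
import torch
import torch.nn.functional as F

def saliency_guided_kl_loss(model, x, alpha, mask_ratio=0.5):
    """Mask low-attention pixels to build x_tilde and penalize D_KL(f(x) || f(x_tilde))."""
    b = x.size(0)
    # upsample the attention map to the input resolution and rank pixels by importance
    att = F.interpolate(alpha, size=x.shape[-2:], mode='bilinear', align_corners=False)
    flat = att.view(b, -1)
    k = max(1, int(flat.size(1) * mask_ratio))
    threshold = flat.kthvalue(k, dim=1, keepdim=True).values      # k-th smallest attention value
    keep = (flat > threshold).view(b, 1, *x.shape[-2:]).float()

    x_tilde = x * keep                                            # masked image x~ (zeroed pixels)
    log_p = F.log_softmax(model(x), dim=-1)                       # f(x)
    log_q = F.log_softmax(model(x_tilde), dim=-1)                 # f(x~)
    # L_KL = D_KL( f(x) || f(x~) )
    return F.kl_div(log_q, log_p, reduction='batchmean', log_target=True)
```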
  13. Background of proposed metric: IDs is inappropriate for images with sparse important regions

     Insertion-Deletion score: IDs = AUC(Insertion) − AUC(Deletion)
     Problems of the Insertion-Deletion score (ID):
     • Out-of-distribution inputs
     • Favors coarse explanations over fine-grained ones
     [Figure: deletion results for a coarse vs. a fine-grained attention map]
  14. Proposed Metric: Patch Insertion-Deletion score

     1. Divide 𝒙 into 𝑚 × 𝑚 patches 𝒑_ij
     2. Insert / delete patches according to their importance in the attention map; for insertion,
        𝒙_n = { 𝒑_ij  if (i, j) ∈ Top-𝑛 importance;  𝒃_ij  otherwise }
        (for deletion, the roles of 𝒑_ij and 𝒃_ij are reversed)
     3. Plot the predicted probability p(ŷ = 1 | 𝒙_n) against 𝑛
     4. Compute the AUC of the insertion and deletion curves
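A minimal NumPy sketch of the PID computation, under the assumptions that predict returns p(ŷ = 1 | image) for a single CHW image, baseline is a fully masked (e.g., all-zero) image, attention is an H×W map with H and W divisible by m, and patches are ranked by mean attention; normalization details may differ from the paper.

```python
import numpy as np

def patch_insertion_deletion_score(predict, x, baseline, attention, m):
    """Divide x into m x m patches, insert/delete them by attention rank, and
    return AUC(insertion) - AUC(deletion) of the predicted-probability curve."""
    C, H, W = x.shape
    ph, pw = H // m, W // m
    # mean attention per patch, ranked from most to least important
    patch_scores = attention.reshape(m, ph, m, pw).mean(axis=(1, 3))              # (m, m)
    order = np.dstack(np.unravel_index(np.argsort(-patch_scores, axis=None), (m, m)))[0]

    def curve_auc(start, end):
        img = start.copy()
        probs = [predict(img)]
        for i, j in order:                         # replace patches one by one, most important first
            img[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = \
                end[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            probs.append(predict(img))
        probs = np.asarray(probs)
        return ((probs[:-1] + probs[1:]) / 2).mean()   # trapezoidal AUC, n normalized to [0, 1]

    insertion_auc = curve_auc(baseline, x)         # start from the baseline and insert patches
    deletion_auc = curve_auc(x, baseline)          # start from x and delete patches
    return insertion_auc - deletion_auc
```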
  15. Experimental Setting: Conducted experiments on two public datasets

     Indian Diabetic Retinopathy Image Dataset (IDRiD)
     • Dataset for detecting diabetic retinopathy from retinal fundus images
     • Binary classification task

     IDRiD        Num of samples
     Training     330
     Validation   83
     Test         103

     DeFN Magnetogram Dataset
     • Dataset for solar flare prediction
     • Binary classification task

     DeFN Magnetograms   Time Period   Num of samples
     Training            2010-2015     45530
     Validation          2016          7795
     Test                2017          7790
  16. Quantitative Results (IDRiD): Outperforms baseline methods in both IDs and PIDs

     𝑚 : patch size
     Method                    ID       PID (𝑚=2)   PID (𝑚=4)   PID (𝑚=8)   PID (𝑚=16)
     RISE [Petsiuk+, BMVC18]   0.319    0.179       0.130       0.136       0.148
     Lambda                    -0.101   -0.105      -0.116      -0.123      0.093
     Ours                      0.431    0.458       0.473       0.470       0.455
     [Figure: IDRiD image and visual explanation]
  17. Quantitative Results (Magnetograms): Outperforms baseline methods in both IDs and PIDs

     𝑚 : patch size
     Method                    ID      PID (𝑚=16)   PID (𝑚=32)   PID (𝑚=64)   PID (𝑚=128)
     RISE [Petsiuk+, BMVC18]   0.235   0.261        0.296        0.379        0.461
     Lambda                    0.374   0.414        0.403        0.378        0.291
     Ours                      0.506   0.748        0.755        0.757        0.756
     [Figure: magnetogram and visual explanation]
  18. Qualitative Results (IDRiD): The proposed method generated fine-grained explanations

     • Ours: fine-grained / appropriate
     • RISE: coarse / inappropriate
     • Lambda: focuses on outside corners
     [Figure: input image with explanations from RISE, Lambda, and Ours]
  19. Qualitative Results (Magnetograms): The proposed method generated fine-grained explanations

     • Ours: fine-grained / appropriate
     • RISE: coarse / inappropriate
     • Lambda: focuses on outside corners
     [Figure: input magnetogram with explanations from RISE, Lambda, and Ours]
  23. Conclusion

     • We proposed Lambda Attention Branch Networks, which have a parallel branching structure to obtain clear visual explanations
     • We also proposed the PID score, an effective evaluation metric for images with sparse important regions
     • LABN outperformed the baseline methods in terms of the ID and PID scores