GradCAM

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

Contents ・Abstract ・Introduction ・Related Work ・Approach ・Evaluation ・Appendix

Abstract
・"Visual explanations" make CNNs more transparent.
・Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. VQA).
・We do not have to change the architecture of the model.
・We do not have to re-train the model.
・In the context of image classification models, Grad-CAM visualizations (a) show that seemingly unreasonable predictions have reasonable explanations and (b) help achieve model generalization by identifying dataset bias.

Introduction
・DNNs lack decomposability into intuitive and understandable components, which makes them hard to interpret.
・Transparency of models is important to build trust in intelligent systems: models should explain why they predict what they predict.
Transparency is useful at three different stages of AI evolution:
1. When AI is significantly weaker than humans, the goal is to identify failure modes, which helps researchers.
2. When AI is on par with humans, the goal is to establish appropriate trust and confidence in users.
3. When AI is significantly stronger than humans (e.g. chess or Go), the goal of explanations is machine teaching.

What makes a good visual explanation?
(1) Class-discriminative (i.e. it localizes the target category in the image)
(2) High-resolution (i.e. it captures fine-grained detail)
[Figure: three visualizations rated (1) × (2) 〇, (1) 〇 (2) ×, and (1) 〇 (2) 〇. Panels (b) and (h) look similar; panel (d) focuses on the stripes.]
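One way to get both properties is to multiply a high-resolution but class-agnostic saliency map pointwise by an upsampled, coarse class-discriminative map; this is the idea behind Guided Grad-CAM (Guided Backpropagation combined with Grad-CAM). The NumPy sketch below is illustrative only: it assumes a single-channel saliency map, uses nearest-neighbour upsampling as a simplification (the Grad-CAM paper upsamples bilinearly), and `upsample_nearest` / `fuse` are hypothetical helper names.

```python
import numpy as np

def upsample_nearest(cam: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a coarse (u, v) map to (out_h, out_w).

    Simplification (assumption): the paper uses bilinear upsampling instead.
    """
    u, v = cam.shape
    rows = np.arange(out_h) * u // out_h  # source row index for each output row
    cols = np.arange(out_w) * v // out_w  # source column index for each output column
    return cam[rows[:, None], cols[None, :]]

def fuse(saliency: np.ndarray, coarse_cam: np.ndarray) -> np.ndarray:
    """Pointwise product of a high-resolution saliency map (assumed 2-D)
    and a coarse class map upsampled to the same size."""
    h, w = saliency.shape
    return saliency * upsample_nearest(coarse_cam, h, w)
```

The product is non-zero only where the fine-grained map has detail and the coarse map marks the target class, which is how the combination can satisfy (1) and (2) at once.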

Related work: Class Activation Mapping (CAM)
FC layers → conv layer + Global Average Pooling + softmax
☹ Accuracy can drop compared with the original model.
☹ It only works for image recognition.
☹ The architecture has to be changed.
There typically exists a trade-off between accuracy and simplicity or interpretability.
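A minimal NumPy sketch of the CAM head summarized above, assuming the last conv layer outputs K feature maps of size u × v and the layer after global average pooling has weights `fc_weights[c, k]`; `cam_map` and `cam_logits` are hypothetical names, and the softmax is omitted because it does not affect the map.

```python
import numpy as np

def cam_map(feature_maps: np.ndarray, fc_weights: np.ndarray, c: int) -> np.ndarray:
    """Class Activation Map for class c.

    feature_maps: (K, u, v) activations A^k of the last conv layer.
    fc_weights:   (num_classes, K) weights w_k^c applied after global average pooling.
    Returns the (u, v) map sum_k w_k^c * A^k.
    """
    return np.tensordot(fc_weights[c], feature_maps, axes=1)

def cam_logits(feature_maps: np.ndarray, fc_weights: np.ndarray) -> np.ndarray:
    """Pre-softmax class scores of the CAM architecture: GAP, then a linear layer."""
    pooled = feature_maps.mean(axis=(1, 2))  # (K,) global average pooling
    return fc_weights @ pooled               # (num_classes,)
```

Because pooling and the linear layer commute, the spatial mean of `cam_map(A, W, c)` equals the class score `cam_logits(A, W)[c]`. This dependence on a GAP + linear head is exactly why the architecture has to be changed.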

Approach
・Deeper representations in a CNN capture higher-level visual constructs.
・Convolutional features naturally retain spatial information, which is lost in fully-connected layers.
・The last convolutional layers therefore combine high-level semantics with detailed spatial information; their neurons look for semantic, class-specific information.
・To understand the importance of each neuron for a decision of interest, Grad-CAM uses the gradient information flowing into the last convolutional layer.

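Approach
To understand the importance of each neuron for a decision of interest, Grad-CAM uses the gradient information flowing into the last convolutional layer.

$$\alpha_k^c = \frac{1}{Z}\sum_{i=1}^{u}\sum_{j=1}^{v}\frac{\partial y^c}{\partial A^k_{ij}}, \qquad Z = uv$$

$$L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)$$

・$A^k \in \mathbb{R}^{u \times v}$: feature maps of the last conv. layer
・$k$: channel index; $c$: class index
・$y^c$: score for class $c$ before the softmax
・$\partial y^c / \partial A^k_{ij}$: gradients via backprop; averaging them over $i, j$ is global average pooling
・$\alpha_k^c$: importance of feature map $k$ for class $c$
・$L^c_{\text{Grad-CAM}}$: coarse heat map of size $u \times v$
・The ReLU is applied because we are only interested in the features that have a positive influence on the class of interest.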
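The two equations can be sketched in NumPy as follows. This is a framework-agnostic sketch: it assumes the activations A^k and the gradients ∂y^c/∂A^k have already been extracted (e.g. with hooks in a deep-learning framework), and `grad_cam` is a hypothetical name.

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Grad-CAM heat map for one class c.

    feature_maps: (K, u, v) activations A^k of the last conv layer.
    grads:        (K, u, v) gradients dy^c/dA^k_ij of the pre-softmax score
                  (assumed to come from backprop in whatever framework is used).
    """
    # alpha_k^c: global average pooling of the gradients over the u x v positions
    alphas = grads.mean(axis=(1, 2))                       # (K,)
    # Weighted combination of the feature maps ...
    weighted = np.tensordot(alphas, feature_maps, axes=1)  # (u, v)
    # ... and ReLU keeps only features with a positive influence on class c
    return np.maximum(weighted, 0.0)

# Sanity check on a GAP + linear head: there dy^c/dA^k_ij = w_k^c / (u*v)
# at every position, so Grad-CAM equals ReLU of the CAM map scaled by 1/(u*v).
rng = np.random.default_rng(0)
K, u, v = 3, 4, 5
A = rng.random((K, u, v))
w_c = rng.standard_normal(K)
grads = np.broadcast_to((w_c / (u * v))[:, None, None], (K, u, v))
heat = grad_cam(A, grads)
cam = np.tensordot(w_c, A, axes=1)
print(np.allclose(heat, np.maximum(cam, 0) / (u * v)))  # True
```

Since only gradients are needed, the same computation applies to any differentiable score y^c, without changing or re-training the model.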

Evaluating Localization
The ImageNet localization challenge: images and image labels are given; bounding boxes must be estimated.
The approach of [51] requires a change in the model architecture, necessitates re-training, and achieves worse classification error.
Given an image
↓ Obtain class predictions from our network
↓ Generate Grad-CAM maps for each of the predicted classes
↓ Binarize each map with a threshold of 15% of its max intensity
↓ Draw a bounding box around the single largest segment
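The last two steps can be sketched for a single class map as below. Assumptions, none of which come from the slide: the heat map is already upsampled to image resolution, a "segment" is a 4-connected component, pixels with value ≥ 15% of the max are kept, and `bbox_from_heatmap` is a hypothetical name. A library routine such as `scipy.ndimage.label` could replace the hand-written flood fill.

```python
import numpy as np
from collections import deque

def bbox_from_heatmap(heatmap: np.ndarray, frac: float = 0.15):
    """Return (row_min, col_min, row_max, col_max), inclusive, around the
    largest 4-connected segment of heatmap >= frac * max, or None if the map
    has no positive activation."""
    peak = heatmap.max()
    if peak <= 0:
        return None  # nothing activated, so no box
    mask = heatmap >= frac * peak
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill collects one 4-connected segment
            comp = []
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp  # keep the single largest segment
    ys = [p[0] for p in best]
    xs = [p[1] for p in best]
    return min(ys), min(xs), max(ys), max(xs)
```

Running this once per predicted class yields one box per class, matching the flow above.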

Evaluating Visualization
Scores on a five-point scale (+2, +1, 0, −1, −2), n = 52:
Guided Backpropagation: 1.0
Guided Grad-CAM: 1.27

Analyzing Failure Modes for VGG-16

Effect of adversarial noise on VGG-16 (Grad-CAM maps computed as in the Approach section)

Identifying bias in dataset
Models trained on biased datasets may perpetuate biases and stereotypes (gender, race, age).
Biased nurse/doctor dataset, VGG-16: doctor images: 55 women, 195 men; nurse images: 225 women, 25 men.
This model achieves good accuracy on validation images, but at test time it does not generalize as well (82%).

Identifying bias in dataset
The biased model had learned to look at the person's face and hairstyle to distinguish nurses from doctors.
☹ It was learning a gender stereotype.
The unbiased model made the right prediction by looking at the white coat and the stethoscope, and generalized better.
Grad-CAM can help detect and remove biases in datasets → fair and ethical models.

Supplementary material 1: Guided Grad-CAM on different layers

Supplementary material 2: Guided Grad-CAM on VQA

Kazu Ghalamkari