



Kazu Ghalamkari

May 20, 2020



  1. Abstract
    ・"Visual explanations" make CNNs more transparent
    ・Grad-CAM is applicable to a wide variety of CNN model families:
      (1) CNNs with fully-connected layers (e.g. VGG)
      (2) CNNs used for structured outputs (e.g. captioning)
      (3) CNNs used in tasks with multi-modal inputs (e.g. VQA)
    ・We do not have to change the model architecture
    ・We do not have to re-train the model
    ・For image classification models, Grad-CAM visualizations
      (a) show that seemingly unreasonable predictions have reasonable explanations
      (b) help achieve model generalization by identifying dataset bias
  2. Introduction
    ・DNNs lack decomposability into intuitive and understandable components
    ・Transparency of models is important to build trust in intelligent systems: they should explain why they predict what they predict
    ・Transparency is useful at three different stages of AI evolution:
      1. When AI is significantly weaker than humans, the goal is to identify the failure modes, helping researchers
      2. When AI is on par with humans, the goal is to establish appropriate trust and confidence in users
      3. When AI is significantly stronger than humans (e.g. chess or Go), the goal of explanations is machine teaching
  3. What makes a good visual explanation?
    (1) Class-discriminative (i.e. localizes the target category in the image)
    (2) High-resolution (i.e. captures fine-grained detail)
    [Figure: example visualizations satisfying (1)×(2)〇, (1)〇(2)×, and (1)〇(2)〇; panels (b) and (h) are similar; panel (d) focuses on stripes]
  4. Related work: Class Activation Mapping (CAM)
    ・Replaces the FC layer with a conv layer + global average pooling + softmax
    ☹ Accuracy decreases
    ☹ It only works for image classification
    ☹ The architecture has to be changed
    ・There typically exists a trade-off between accuracy and simplicity or interpretability
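The CAM construction above (conv features, global average pooling, then class weights) can be sketched as a class-weighted sum of the last-conv feature maps. A minimal NumPy sketch with toy shapes — all array names and sizes here are illustrative, not the paper's code:

```python
import numpy as np

# Toy shapes: K feature maps of size H x W from the last conv layer.
K, H, W = 4, 7, 7
rng = np.random.default_rng(0)
feature_maps = rng.random((K, H, W))   # A^k: last-conv activations
fc_weights = rng.random(K)             # w_k^c: softmax weights for class c

# CAM for class c: weighted sum of the feature maps over channels.
cam = np.tensordot(fc_weights, feature_maps, axes=1)   # shape (H, W)
print(cam.shape)
```

Global average pooling itself would just be `feature_maps.mean(axis=(1, 2))`; the class score is the dot product of that pooled vector with `fc_weights`, which is why the same weights can re-weight the spatial maps.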
  5. Approach
    ・Deeper representations in a CNN capture higher-level visual constructs
    ・Convolutional features naturally retain spatial information, which is lost in fully-connected layers
    ・The last convolutional layers have high-level semantics and detailed spatial information
    ・The neurons in these layers look for semantic, class-specific information
    ・To understand the importance of each neuron for a decision of interest, Grad-CAM uses the gradient information flowing into the last convolutional layer
  6. Approach
    To understand the importance of each neuron for a decision of interest, Grad-CAM uses the gradient information flowing into the last convolutional layer.

    α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A_ij^k
      (gradients via backprop, global average pooling → importance of feature map k for class c)

    L^c_Grad-CAM = ReLU( Σ_k α_k^c A^k )
      (coarse heat map)

    A^k ∈ ℝ^(u×v): feature maps of the last conv layer, k: channel index, c: class index, y^c: class score before softmax
    The ReLU is applied because we are only interested in the features that have a positive influence on the class of interest.
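The two steps above — global-average-pool the gradients to get the weights α_k^c, then take a ReLU of the weighted sum of feature maps — can be sketched in NumPy, assuming the gradients ∂y^c/∂A^k have already been obtained by backprop (toy shapes, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)
K, H, W = 4, 7, 7
feature_maps = rng.random((K, H, W))      # A^k: last-conv activations
grads = rng.standard_normal((K, H, W))    # dy^c / dA^k, from backprop

# alpha_k^c: global average pooling of the gradients ((1/Z) sum_i sum_j).
alpha = grads.mean(axis=(1, 2))           # shape (K,)

# Coarse heat map: ReLU of the alpha-weighted sum over channels.
heatmap = np.maximum(np.tensordot(alpha, feature_maps, axes=1), 0.0)
print(heatmap.shape)
```

In practice the heat map is then upsampled to the input resolution and overlaid on the image; the ReLU matches the formula's restriction to positively influencing features.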
  7. Evaluating localization
    ・The ImageNet localization challenge: images and class labels are given; bounding boxes must be estimated
    ・CAM [51] requires a change in the model architecture, necessitates re-training, and achieves worse classification error
    ・Grad-CAM localization pipeline:
      Given an image
      → obtain class predictions from our network
      → generate Grad-CAM maps for each of the predicted classes
      → binarize with a threshold of 15% of the max intensity
      → draw a bounding box around the single largest segment
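The last two pipeline steps (binarize at 15% of the max, then box the largest segment) can be sketched as follows. `grad_cam_bbox` is a hypothetical helper, and the plain BFS flood fill is my own stand-in for connected-component labeling, not the paper's implementation:

```python
import numpy as np
from collections import deque

def grad_cam_bbox(heatmap, frac=0.15):
    """Binarize a Grad-CAM map at frac of its max, then return the
    bounding box (r0, c0, r1, c1) of the largest 4-connected segment."""
    mask = heatmap >= frac * heatmap.max()
    seen = np.zeros_like(mask, dtype=bool)
    best = []
    H, W = mask.shape
    for r in range(H):
        for c in range(W):
            if mask[r, c] and not seen[r, c]:
                # BFS flood fill to collect one connected segment.
                comp, q = [], deque([(r, c)])
                seen[r, c] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    rows = [p[0] for p in best]
    cols = [p[1] for p in best]
    return min(rows), min(cols), max(rows), max(cols)

# Toy heat map with one bright blob; values below 15% of max are dropped.
toy = np.zeros((5, 5))
toy[1:3, 1:4] = 1.0
print(grad_cam_bbox(toy))  # (1, 1, 2, 3)
```

In a real pipeline the heat map would first be upsampled to the image size, and `scipy.ndimage.label` could replace the hand-rolled flood fill.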
  8. Identifying bias in datasets
    ・Models trained on biased datasets may perpetuate biases and stereotypes (gender, race, age)
    ・Nurse vs. doctor classification with VGG16 on a biased dataset (doctor: 195 men / 55 women; nurse: 225 women / 25 men)
    ・This model achieves good accuracy on validation images, but at test time it did not generalize as well (82%)
  9. Identifying bias in datasets
    ・The biased model had learned to look at the person's face and hairstyle to distinguish nurses from doctors ☹ learning a gender stereotype
    ・The unbiased model made the right prediction by looking at the white coat and the stethoscope → generalized
    ・Grad-CAM can help detect and remove biases in datasets → fair and ethical models