Notes from my reading of the CVPR 2018 paper: "Finding beans in burgers: Deep semantic-visual embedding with localization"