Finding beans in burgers: paper reading notes

Finding beans in burgers Deep semantic-visual embedding with localization @lunardog
Kanto Computer Vision Study Group, 2018.07.07

Self-introduction • Leszek • Polish • 2005~ machine learning researcher • 2010~
came to Japan • 2016~ joined Cookpad • github: @lunardog twitter: @_lunardog_

CVPR 2018 SIGIR 2018 MsCOCO Recipe1M

CVPR 2017

Learning Cross-modal Embeddings for Cooking Recipes and Food Images •
CVPR 2017 • joint embedding of images and recipes

CVPR 2018

MsCOCO -> MsCOCO MsCOCO -> Flickr30K

triplet loss WELDON pooling

Triplet Loss

FaceNet: A Unified Embedding for Face Recognition and Clustering Florian
Schroff, Dmitry Kalenichenko, James Philbin

Triplet (y, z, z’): distances 1-<y,z> and 1-<y,z’>, margin α

[Figure: six constraints, each labeled ≥α]
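The slide's notation (y, z, z’, the distances 1-<y,z> and 1-<y,z’>, and the margin α) can be read as a hinge-style triplet loss. The sketch below is one such reading, not the paper's code. It assumes y is the anchor, z the matching item, and z’ the non-matching item, and that all embeddings are unit length so the dot product is a cosine similarity. The value alpha=0.2 and the helper names are hypothetical.

```python
import numpy as np


def l2_normalize(v):
    """Scale a vector to unit length so the dot product is a cosine similarity."""
    return v / np.linalg.norm(v)


def triplet_loss(y, z, z_neg, alpha=0.2):
    """Zero once the non-matching item is at least alpha farther away than the matching one."""
    d_pos = 1.0 - np.dot(y, z)      # 1 - <y, z>
    d_neg = 1.0 - np.dot(y, z_neg)  # 1 - <y, z'>
    return max(0.0, alpha + d_pos - d_neg)


# Toy 2-D embeddings (made up for illustration).
y = l2_normalize(np.array([1.0, 0.0]))
z = l2_normalize(np.array([0.9, 0.1]))
z_neg = l2_normalize(np.array([0.0, 1.0]))

loss_satisfied = triplet_loss(y, z, z_neg)  # margin respected
loss_violated = triplet_loss(y, z_neg, z)   # roles swapped, margin violated
```

When the margin is respected the loss is exactly zero, so such triplets contribute no gradient; only violated triplets push the embeddings apart.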

triplet loss WELDON pooling

Distances 1-<y,z> and 1-<y,z’>, margin α

Instance Loss [figure: six constraints, each labeled ≥α]

Semantic Loss [figure: six constraints, each labeled ≥α]
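The slides name the two losses without defining them. One possible reading, stated here purely as an assumption: the instance loss asks each image's own text to beat every other text by the margin α, while the semantic loss asks a text of the same class to beat every text of a different class. The class labels, the function names, and alpha=0.3 are hypothetical.

```python
import numpy as np


def hinge(sim_pos, sim_neg, alpha):
    """Triplet hinge on similarities: zero once sim_pos exceeds sim_neg by alpha."""
    return max(0.0, alpha - sim_pos + sim_neg)


def instance_and_semantic_loss(img, txt, labels, alpha=0.3):
    """img[i] and txt[i] form a matching pair; labels[i] is a class id (assumed available)."""
    sims = img @ txt.T  # cosine similarities if rows are unit length
    n = len(labels)
    instance, semantic = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            # Instance term: the matching text vs. any other text.
            instance += hinge(sims[i, i], sims[i, j], alpha)
            # Semantic term: a same-class text vs. every different-class text.
            if labels[j] == labels[i]:
                for k in range(n):
                    if labels[k] != labels[i]:
                        semantic += hinge(sims[i, j], sims[i, k], alpha)
    return instance, semantic


# Toy batch: three orthogonal unit embeddings, items 0 and 1 share a class.
img = np.eye(3)
txt = np.eye(3)
labels = [0, 0, 1]
instance_total, semantic_total = instance_and_semantic_loss(img, txt, labels)
```

In this toy batch every instance constraint is satisfied, yet the semantic term is still positive because same-class items are no closer than different-class ones.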

WELDON Pooling

Typical Image Classifier: Global Average Pooling -> Linear
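As a minimal sketch of the classifier head named on this slide: a convolutional feature map is averaged over its spatial grid, and the resulting vector goes through a single linear layer. The shapes (512 channels, 7x7 grid, 10 classes) and the random weights are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical backbone output: 512 channels on a 7x7 spatial grid.
features = rng.standard_normal((512, 7, 7))

# Hypothetical linear layer for 10 classes.
weights = rng.standard_normal((10, 512)) * 0.01
bias = np.zeros(10)

pooled = features.mean(axis=(1, 2))  # global average pooling -> shape (512,)
logits = weights @ pooled + bias     # linear layer -> shape (10,)
```

Averaging discards where in the image a feature fired, which is the limitation the pooling slides that follow address.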

WELDON

[Figure: activation map with values from 0.0 to 1.0] Global MAX Pooling / Global Average Pooling
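To make the contrast concrete, the sketch below applies both poolings to one small activation map. The 5x5 map is made up for illustration, loosely modeled on the values in the slide.

```python
import numpy as np

# Hypothetical single-channel activation map with one strong region.
activation = np.array([
    [0.0, 0.05, 0.3, 0.1, 0.0],
    [0.0, 0.5, 1.0, 1.0, 0.3],
    [0.0, 0.5, 1.0, 1.0, 1.0],
    [0.0, 0.2, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.05, 0.4, 0.2],
])

global_max = activation.max()   # keeps only the single strongest location
global_avg = activation.mean()  # diluted by every near-zero location
```

Max pooling reports the peak regardless of how large the active region is, while average pooling shrinks as the background grows.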

[Figure: activation map with values from 0.0 to 1.0] min + max Pooling: bottom m, top k
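A minimal sketch of the "min + max" pooling named on this slide: take the top k and the bottom m activations of a map and combine them. Whether the two groups are summed or averaged, and how they are weighted, is not stated in the slides. This sketch assumes an unweighted sum of the two means, and the function name and example map are hypothetical.

```python
import numpy as np


def weldon_pool(score_map, k=3, m=3):
    """Mean of the k highest activations plus mean of the m lowest ones."""
    if k < 1 or m < 1:
        raise ValueError("k and m must be at least 1")
    flat = np.sort(score_map.ravel())  # ascending order
    top_k = flat[-k:]
    bottom_m = flat[:m]
    return top_k.mean() + bottom_m.mean()


# Hypothetical score map with both positive and negative evidence.
score_map = np.array([
    [0.0, 0.3, 0.1],
    [0.5, 1.0, 0.9],
    [-0.4, 0.2, -0.8],
])
pooled_score = weldon_pool(score_map, k=2, m=2)
```

With k=1 and a dropped bottom term this reduces to global max pooling, so the scheme sits between max and average pooling while still letting strongly negative regions lower the score.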

https://tokyo-ml.github.io/hotdog-tf-js/ http://techlife.cookpad.com/entry/2018/04/06/124455

The END

Leszek Rybicki
