computer-vision-survey

A Survey of Recent Trends in Computer Vision

KARAKURI Inc.

May 07, 2021

Transcript

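  1. MobileNet v1-3 [Howard+ 2017, Sandler+ 2018, Howard+ 2019]

    • Makes convolution lightweight by combining depthwise convolution, which convolves only along the spatial dimensions, with pointwise convolution, which convolves only along the channel dimension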
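To make the saving from this factorization concrete, here is a minimal sketch that counts the weights in a standard convolution and in a depthwise plus pointwise pair. The function names and the layer sizes (3 x 3 kernel, 32 input channels, 64 output channels) are illustrative assumptions, not values from the slides, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k spatial filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


k, c_in, c_out = 3, 32, 64  # illustrative sizes, not taken from the slides
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 3))  # 18432 2336 0.127
```

The ratio works out to 1/c_out + 1/k^2, so with 3 x 3 kernels the factorized layer needs roughly an eighth to a ninth of the weights.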
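  2. PNASNet [Liu+ 2017]

    • A model obtained through neural architecture search (NAS)
    • Searches for a "cell" made up of multiple CNN blocks rather than for the entire CNN
    • The search proceeds progressively from simple structures to more complex ones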
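  3. Fast R-CNN [Girshick ICCV 2015]

    • First computes a feature map of the image, then projects the region proposals (ROIs) onto that feature map
    • Object classification and bounding-box regression are also performed by the neural network
    • Convolution only has to run once per image rather than once per region proposal, which makes it faster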
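The projection step can be sketched as a simple coordinate rescaling. This is a simplified illustration, not the exact procedure from the paper: the helper name `project_roi`, the backbone stride of 16, the outward rounding, and the example proposals are all assumptions.

```python
import math


def project_roi(roi, stride):
    """Map an (x1, y1, x2, y2) box in image pixels onto feature-map cells."""
    x1, y1, x2, y2 = roi
    # Round outward so the projected region still covers the whole proposal.
    return (
        math.floor(x1 / stride),
        math.floor(y1 / stride),
        math.ceil(x2 / stride),
        math.ceil(y2 / stride),
    )


# Hypothetical proposals on one image; the stride of 16 is an assumption.
proposals = [(32, 48, 160, 208), (100, 20, 300, 180)]
rois_on_feature_map = [project_roi(r, stride=16) for r in proposals]
print(rois_on_feature_map)  # [(2, 3, 10, 13), (6, 1, 19, 12)]
```

Because every proposal is just a window into the same feature map, the convolutional backbone runs once per image no matter how many proposals there are.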
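  4. YOLO v1-4 [Redmon+ CVPR 2016, CVPR 2017, 2018, Bochkovskiy+ 2020]

    • A one-stage method that performs object detection and object classification in a single end-to-end pass
    • Outputs class probabilities, confidence scores, and bounding-box information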
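As a rough illustration of how those three outputs are packed together, the sketch below computes the shape of the YOLO v1 prediction tensor. The function name is hypothetical; the settings (7 x 7 grid, 2 boxes per cell, 20 classes) are the PASCAL VOC configuration of YOLO v1, and later versions lay out their outputs differently.

```python
def yolo_v1_output_shape(grid_size, boxes_per_cell, num_classes):
    """Shape of the YOLO v1 prediction tensor for one image."""
    # Each box carries (x, y, w, h, confidence); class probabilities are shared per cell.
    per_cell = boxes_per_cell * 5 + num_classes
    return (grid_size, grid_size, per_cell)


print(yolo_v1_output_shape(7, 2, 20))  # (7, 7, 30)
```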
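  5. RetinaNet [Lin+ ICCV 2017]

    • Points out that the class imbalance between foreground and background is why one-stage methods fall behind two-stage methods in accuracy
    • Proposes Focal Loss to handle this class imbalance, achieving high-accuracy object detection while remaining one-stage
    • Uses the Feature Pyramid Network (described later) as the base architecture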
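A minimal sketch of the binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), for a single prediction. The defaults gamma = 2 and alpha = 0.25 follow the values reported in the RetinaNet paper; the function names and the example probabilities are illustrative assumptions.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction p = P(foreground), label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Plain binary cross-entropy, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


easy_bg = focal_loss(0.01, 0)  # well-classified background example
hard_fg = focal_loss(0.10, 1)  # badly misclassified foreground example
print(easy_bg < hard_fg)       # True
```

The (1 - p_t)^gamma factor shrinks the loss of easy, well-classified examples, so the many easy background locations no longer dominate the total loss.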
  6. Bridging the Gap Between Anchor-based and Anchor-free Detection [Zhang+ 2019]

    • The difference between anchor-based and anchor-free detection lies in how positive and negative samples are selected
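The two selection rules can be contrasted with a small sketch. This is a simplification under stated assumptions, not the paper's proposed method: the anchor-based rule uses RetinaNet-style IoU thresholds (0.5 and 0.4), the anchor-free rule is a bare point-in-box test without the scale constraints real detectors add, and all names and boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def anchor_based_label(anchor, gt, pos_thr=0.5, neg_thr=0.4):
    """Label an anchor box by its IoU with the ground-truth box."""
    overlap = iou(anchor, gt)
    if overlap >= pos_thr:
        return "positive"
    if overlap < neg_thr:
        return "negative"
    return "ignored"


def anchor_free_label(point, gt):
    """Label a location by whether it falls inside the ground-truth box."""
    x, y = point
    inside = gt[0] <= x <= gt[2] and gt[1] <= y <= gt[3]
    return "positive" if inside else "negative"


gt = (0, 0, 100, 100)
anchor = (30, 30, 130, 130)  # hypothetical anchor centered at (80, 80)
print(anchor_based_label(anchor, gt))   # negative
print(anchor_free_label((80, 80), gt))  # positive
```

The same location receives different labels under the two rules, which is the kind of discrepancy the slide refers to.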
  7. DeepLab v1-3 [Chen+ TPAMI 2017]

    • Removes downsampling and combines dilated convolution with bilinear interpolation to achieve high-resolution segmentation [Cui+ Remote Sens. 2019]
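To show what dilation does, here is a minimal 1-D sketch in plain Python. It is an illustration only (DeepLab uses 2-D dilated convolutions inside a CNN); the function names, the signal, and the kernel are assumptions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with gaps of `dilation` between kernel taps."""
    span = (len(kernel) - 1) * dilation  # distance between the first and last tap
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def effective_kernel_size(kernel_size, dilation):
    """Input extent covered by one output value."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


signal = list(range(10))
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4, -4, -4, -4, -4]
print(effective_kernel_size(3, 2))                 # 5
```

With dilation 2 the same three weights cover five input positions, so the receptive field grows without downsampling the signal or adding parameters.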
  8. FastFCN [Wu+ 2019]

    • Introduces Joint Pyramid Upsampling (JPU), which greatly reduces computational cost compared with dilated convolution
  9. Taxonomy [Chen+ 2020 Monocular Human Pose Estimation: A Survey of Deep Learning-based Methods] [Zheng+ 2020 Deep Learning-Based Human Pose Estimation: A Survey]
  10. Taxonomy [Ahmed+ 2020 A survey on Deep Learning Advances on Different 3D Data Representations]
  11. MoNet [Monti+ CVPR 2017]

    • A generalization of earlier non-Euclidean CNNs
    • Generalizes the coordinates
    • Generalizes the kernel by using learnable kernels instead of fixed ones
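A minimal sketch of the learnable-kernel idea: each neighbor is described by a pseudo-coordinate u, and a Gaussian with trainable mean and covariance decides how much that neighbor contributes. The diagonal covariance, the single kernel, the function names, and the numeric values are simplifying assumptions for illustration.

```python
import math


def gaussian_kernel_weight(u, mu, sigma):
    """Weight of pseudo-coordinate u under a Gaussian with diagonal covariance."""
    exponent = sum(((ui - mi) / si) ** 2 for ui, mi, si in zip(u, mu, sigma))
    return math.exp(-0.5 * exponent)


def patch_operator(neighbors, mu, sigma):
    """Weighted sum of neighbor features; neighbors is a list of (u, feature)."""
    return sum(gaussian_kernel_weight(u, mu, sigma) * f for u, f in neighbors)


# mu and sigma would be learned during training; these values are placeholders.
mu, sigma = (1.0, 0.0), (0.5, 0.5)
neighbors = [((1.0, 0.0), 2.0), ((0.0, 1.0), 3.0)]
print(patch_operator(neighbors, mu, sigma))
```

Choosing the pseudo-coordinates and fixing the kernel parameters in particular ways recovers earlier non-Euclidean CNNs, which is the sense in which the approach generalizes them.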
  12. Neural 3D Mesh Renderer [Kato+ CVPR 2018]

    • A high-accuracy differentiable renderer for meshes
    • Making the rasterization step differentiable enables backpropagation [https://www.slideshare.net/100001653434308/23d-neural-3d-mesh-renderer-cvpr-2018]
  13. Taxonomy [Han+ 2021 A Survey on Visual Transformer] [Khan+ 2021 Transformers in Vision: A Survey]
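  14. Summary

    • Models have evolved from a ResNet base toward greater complexity, larger scale, and higher efficiency
    • Vision transformers are appearing one after another
    • For models specialized in basic computer vision tasks, the benchmarks appear to have settled
    • A shift from 2D to 3D
    • Incorporating multi-scale information seems to be common
    • Fine-grained techniques seem to matter
    [https://www.slideshare.net/cvpaperchallenge/cvpr-2020-237139930]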
  15. References

    • [cvpaper.challenge-summary](https://github.com/hirokatsukataoka16/cvpaper.challenge-summary)
    • [CVPR 2016 Quick Report](https://www.slideshare.net/HirokatsuKataoka/cvpr-2016)
    • [CVPR 2017 Quick Report](https://www.slideshare.net/cvpaperchallenge/cvpr-2017-78294211)
    • [CVPR 2018 Quick Report](https://www.slideshare.net/cvpaperchallenge/cvpr-2018-102878612)
    • [CVPR 2019 Quick Report](https://www.slideshare.net/cvpaperchallenge/cvpr-2019)
    • [CVPR 2020 Quick Report](https://www.slideshare.net/cvpaperchallenge/cvpr-2020-237139930)
    • [Video Recognition Survey v1 (Meta-survey)](https://www.slideshare.net/cvpaperchallenge/v1-232973484)
    • [Vision and Language (Meta-survey)](https://www.slideshare.net/cvpaperchallenge/vision-and-language-232926110)
    • [Research Trends in Convolutional Neural Networks](https://www.slideshare.net/ren4yu/ss-84282514)
    • [The History of ConvNets, ResNet Variants, and Best Practices](https://www.slideshare.net/ren4yu/convnetresnet)
    • [Improving the Accuracy and Speed of Convolutional Neural Networks](https://www.slideshare.net/ren4yu/ss-145689425)
    • [Paper Introduction: Fast R-CNN & Faster R-CNN](https://www.slideshare.net/takashiabe338/fast-rcnnfaster-rcnn)
    • [[Object Detection] An Explanation of SSD (Single Shot MultiBox Detector)](https://www.acceluniverse.com/blog/developers/2020/02/SSD.html)
    • [[History of Object Detection Methods: An Introduction to YOLO]](https://qiita.com/cv_carnavi/items/68dcda71e90321574a2b)
    • [Image Recognition and Deep Learning](https://www.slideshare.net/ren4yu/ss-234439652)
    • [Semantic Segmentation Survey](https://www.slideshare.net/yoheiokawa/semantic-segmentation-141471958)
    • [Semantic Segmentation: A Retrospective](https://speakerdeck.com/motokimura/semantic-segmentation-zhen-rifan-ri)
    • [[DL Reading Group] SlowFast Networks for Video Recognition](https://www.slideshare.net/DeepLearningJP2016/dlslowfast-networks-for-video-recognition-202057397)
    • [A Survey of Neural Networks for 3D Point Clouds](https://www.slideshare.net/naoyachiba18/ss-120302579)
    • [A Survey of Neural Networks for 3D Point Clouds, Ver. 2](https://speakerdeck.com/nnchiba/point-cloud-deep-learning-survey-ver-2)
    • [Point Cloud Deep Learning Meta-study](https://www.slideshare.net/naoyachiba18/metastudy)
    • [1st Reading Session on the Latest ML, CV, and NLP Papers: PointNet](https://www.slideshare.net/FujimotoKeisuke/point-net)
    • [[DL Reading Group] Mesh and Deep Learning: Surface Networks & AtlasNet](https://www.slideshare.net/DeepLearningJP2016/dlmeshdeep-learning-surface-networks-atlasnet)
    • [Paper Summary: Convolutional Pose Machines](https://qiita.com/masataka46/items/88f1a375ce8a485d9454)
    • [A Survey of the Latest Computer Vision Papers: 2D Human Pose Estimation](https://engineer.dena.com/posts/2019.11/cv-papers-19-2d-human-pose-estimation/)
    • [[2nd 3D Study Group, Research Introduction] Neural 3D Mesh Renderer (CVPR 2018)](https://www.slideshare.net/100001653434308/23d-neural-3d-mesh-renderer-cvpr-2018)
    • [Paper Explanation of FastFCN (JPU), the Current SOTA Replacing DeepLab](https://qiita.com/kamata1729/items/1b495658a63d76904ac3)