
BirdCLEF2021 Summary

start
June 12, 2021



Transcript

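  1. Self-introduction
 Medical student in Hiroshima. Alongside studying for the national medical licensing exam, I intern at the medical venture MNES Inc. and train with LAIME, a machine learning study circle for students that the venture sponsors.
 I won this competition and became a Master (largely thanks to my teammates...).
 I am also looking for a good icon.
 kaggle: @startjapan
 twitter: @startjapanml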
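  2. Competition overview
 • A competition to identify which bird species are calling in each 5-second audio segment (the same organizers held a similar competition in 2020, referred to below as the 2020 bird competition)
 • The train data are recordings taken from xeno-canto, a site for sharing bird calls (train_short_audio)
 • The test data are 80 audio files of 10 minutes each, which are split into 5-second segments for prediction (test_soundscapes)
 • Separately, 20 audio files of 10 minutes each were provided for validation (train_soundscapes)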
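  3. Data
 • [test_soundscapes] Test data. 80 files of 10 minutes each, accessible only by making a submission. Recorded at 4 locations in total.
 • [train_short_audio] Training data. Recordings are grouped by bird species; 62874 recordings in total.
 • [train_soundscapes] Has an acoustic domain close to test_soundscapes. 20 files of 10 minutes each, recorded at 2 of the 4 locations where test_soundscapes was recorded.
 • [train_metadata.csv] Metadata for train_short_audio. Its shape is (62874, 14).
 • [train_soundscape_labels.csv / test.csv] Provide the skeleton obtained when each 10-minute file is split into 5-second segments. train_soundscape_labels.csv corresponds to train_soundscapes and test.csv to test_soundscapes; only the former carries ground-truth labels.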
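As a rough illustration of the 5-second skeleton that train_soundscape_labels.csv and test.csv provide, the sketch below enumerates the segments of one 10-minute file. The `row_id` layout (`<audio_id>_<site>_<end_second>`), the column names, and the example file are assumptions made for illustration; they are not stated on the slide.

```python
def segment_rows(audio_id, site, duration_sec=600, segment_sec=5):
    """Return one row per segment, keyed by the segment's end time in seconds."""
    rows = []
    for end in range(segment_sec, duration_sec + 1, segment_sec):
        rows.append({
            "row_id": f"{audio_id}_{site}_{end}",  # assumed layout
            "audio_id": audio_id,
            "site": site,
            "seconds": end,
        })
    return rows


# Hypothetical file: audio id 1234 recorded at a site coded "COR".
rows = segment_rows(audio_id=1234, site="COR")
print(len(rows), rows[0]["row_id"], rows[-1]["row_id"])
```

A 10-minute file yields 600 / 5 = 120 rows, so 80 test files correspond to 9600 rows to predict.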
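  4. Differences from the 2020 bird competition
 • The existence of train_soundscapes
   - There is a large acoustic-domain gap between train_short_audio and the test data
   - This time, train_soundscapes was provided, whose acoustic domain is closer to the test data
   - It was mostly used for validation, but some participants found ways to use it for training
 • Location information for the test data was accessible
   - It was guaranteed that each test file name contains the recording location (and the date)
   - Consequently, this information also had to be built into the pipeline in some form
 (Reference: Starter and some thoughts by @hidehisaarai1213)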
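One way to get the location and date into a pipeline is to parse them out of the file name. The sketch below assumes a file name layout of `<audio_id>_<site>_<YYYYMMDD>.ogg`; that layout, the helper name, and the example file are illustrative assumptions rather than details given on the slide.

```python
from datetime import datetime
from pathlib import Path


def parse_soundscape_name(path):
    """Split an assumed '<audio_id>_<site>_<YYYYMMDD>.ogg' name into features."""
    audio_id, site, date_str = Path(path).stem.split("_")
    date = datetime.strptime(date_str, "%Y%m%d").date()
    return {"audio_id": audio_id, "site": site, "date": date, "month": date.month}


# Hypothetical file name.
info = parse_soundscape_name("test_soundscapes/1234_COR_20190904.ogg")
print(info["site"], info["date"], info["month"])
```

The resulting `site` and `month` fields can then be used as model features or to filter out species that are implausible for that place and season.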
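  5. The basic approach to audio recognition tasks
 Audio data can be converted into an image (a spectrogram) whose horizontal axis is time, whose vertical axis is frequency, and whose pixels give the intensity of each signal component. Applying a CNN or a similar model to this image lets the task be handled as ordinary image processing.
 ※ In this competition, mel spectrograms, which use the mel scale on the vertical (frequency) axis, were widely used
 ※ The mel scale: a scale of sound frequency on which equal differences correspond to equal differences in the pitch perceived by the human ear
 (Image quoted from BirdCLEF2021: Processing audio data)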
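To make the mel scale concrete, the sketch below uses one common conversion formula, m = 2595 * log10(1 + f / 700). The slide does not say which variant participants used, so the formula choice and the helper names are assumptions. The code lists band edges that are equally spaced in mel, which shows that the bands get wider in Hz as frequency increases.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (one common formula variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_mels):
    """Return n_mels + 2 band-edge frequencies (Hz), equally spaced in mel."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges(0.0, 16000.0, 8)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
print([round(e) for e in edges])
```

Low frequencies therefore get finer resolution than high ones, which is the property the mel spectrogram inherits.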
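  6. Quirks specific to this competition
 • train_short_audio only has weak labels (the weak label problem)
   - Labels are assigned to an entire recording that is tens of seconds long
   - At the level of 5-second segments, it is unknown which bird is calling
 • Some labels in train_short_audio are missing (the noisy label problem)
   - In particular, it is explicitly stated that secondary_labels (※) has omissions
 • Metadata such as recording date and location needs to be incorporated in some form (incorporating metadata)
 • Whether a bird is calling in the segments before and after the target segment may also be informative (incorporating neighboring-segment information)
 • The nocall rate differs greatly between train_soundscapes and test_soundscapes (difficulty of establishing a CV strategy)
 ※ The labels of train_short_audio come in two kinds: primary_label and secondary_labels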
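  7. tl;dr
 • 1st stage: build a binary nocall detector using external data (freefield1010) (1: some bird is calling / 0: nocall)
 • 2nd stage: use the nocall detector to reduce the weight of the nocall parts of train_short_audio, then build a 397-dimensional multi-label classifier
 • 3rd stage: build a home-made table competition from the nocall detector outputs, the metadata, the 2nd-stage outputs, and so on
 Turning the task into a home-made table competition at the end addresses the weak label problem, the noisy label problem, incorporating metadata, and incorporating neighboring-segment information all at once!
 ※ The accompanying diagram outlines the Inference Part only; the 1st stage is omitted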
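  8. Why does converting to a table solve the weak label and noisy label problems?
 • The 3rd-stage target variable (0: miss row / 1: hit row) is determined as follows
 • For the primary and secondary labels assigned to a recording tens of seconds long, predictions can be produced per segment. Combining the outputs of the nocall detector and the multi-label classifier solves the weak label problem
 • If a secondary label is missing, the row is labeled 0, but label-0 samples are comparatively numerous, so the noise gets buried reasonably well (mitigating the noisy label problem)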
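The table idea can be sketched as follows: each row pairs a 5-second segment with a candidate species, carries the nocall-detector and classifier outputs as features, and gets target 1 when the candidate is among the recording's primary or secondary labels. The slide does not spell out the exact candidate-generation or target rule, so the top-k rule, the feature names, and the species codes below are assumptions for illustration only.

```python
def build_table(segments, clip_labels, top_k=2):
    """Build (segment, candidate) rows for a hypothetical 3rd-stage table.

    segments: one dict per 5 s segment with
      'call_prob'  - nocall-detector output (probability that a bird calls)
      'bird_probs' - multi-label classifier output, species -> probability
    clip_labels: primary + secondary labels of the whole recording
    """
    rows = []
    for seg_idx, seg in enumerate(segments):
        ranked = sorted(seg["bird_probs"].items(), key=lambda kv: kv[1], reverse=True)
        for rank, (bird, prob) in enumerate(ranked[:top_k]):
            rows.append({
                "segment": seg_idx,
                "candidate": bird,
                "bird_prob": prob,
                "call_prob": seg["call_prob"],
                "rank": rank,
                # Assumed rule: hit row if the candidate is a clip-level label.
                "target": int(bird in clip_labels),
            })
    return rows


# Hypothetical outputs for a two-segment recording labeled "amerob".
segments = [
    {"call_prob": 0.9, "bird_probs": {"amerob": 0.8, "bkcchi": 0.3, "norcar": 0.1}},
    {"call_prob": 0.1, "bird_probs": {"amerob": 0.2, "bkcchi": 0.1, "norcar": 0.05}},
]
table = build_table(segments, clip_labels={"amerob"})
for row in table:
    print(row)
```

A tabular model trained on such rows can learn, for instance, that a hit row with a low `call_prob` is unreliable, which is how the segment-level features compensate for recording-level labels.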
  9. more details...
 • My teammate kami (twitter: @634kami / kaggle: @kami634) wrote up our solution in Japanese in the following article:
 "The story of taking 1st place in Kaggle's bird competition: BirdCLEF 2021 winning solution"
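  10. 2nd place
 • Extract 30-second chunks from train_short_audio, split them into 5-second pieces, and apply mixup (addressing the weak label problem)
 • Exclude the 3 train_soundscapes files in which no bird calls for the full 10 minutes, and use bootstrap sampling (a robust CV strategy)
 • Label smoothing, plus weighting by the rating column in the metadata (addressing the noisy label problem)
 • Tips for choosing thresholds
   - The nocall rate on the LB is lower than in CV, so lower the threshold to predict more birds
   - Probability distributions differ between models, so using a single probability value as the threshold makes little sense; use percentile-based thresholds instead
 • Other (post-processing)
   - Adjust individual probabilities using the mean predicted probability of each bird
   - Use neighboring-segment information
   - Take the nocall detector output into account
   - Remove predictions of species that are impossible given the time and place (incorporating metadata)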
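The percentile-based threshold can be illustrated with a small sketch: rather than fixing one probability cut-off, each model keeps the same top fraction of its own predictions. The function name, the `keep_fraction` parameter, and the toy probabilities are assumptions for illustration, not the 2nd-place team's actual code.

```python
def percentile_threshold(probs, keep_fraction):
    """Return the cut-off that keeps roughly the top keep_fraction of probs."""
    ordered = sorted(probs, reverse=True)
    n_keep = max(1, round(len(ordered) * keep_fraction))
    return ordered[n_keep - 1]


# Two hypothetical models that rank segments identically but are
# calibrated differently (model_b outputs half of model_a's values).
model_a = [0.9, 0.8, 0.2, 0.1, 0.05, 0.6, 0.3, 0.02, 0.01, 0.4]
model_b = [p * 0.5 for p in model_a]

thr_a = percentile_threshold(model_a, keep_fraction=0.3)
thr_b = percentile_threshold(model_b, keep_fraction=0.3)
picked_a = [i for i, p in enumerate(model_a) if p >= thr_a]
picked_b = [i for i, p in enumerate(model_b) if p >= thr_b]
print(thr_a, thr_b, picked_a, picked_b)
```

The two thresholds differ, yet both models flag the same segments; a single fixed threshold would have treated them very differently. Raising `keep_fraction` is one way to predict more birds when the LB nocall rate is lower than in CV.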
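  11. 4th place
 • Used SED models with 10-30 second inputs (addressing the weak label problem)
 (Reference: Introduction to Sound Event Detection by @hidehisaarai1213)
 • This team also used mixup
 • Ran pseudo labeling (addressing the noisy label problem)
 • The final output combined the SED outputs for the target 5-second segment and for the 30-second segment centered on it (incorporating neighboring-segment information)
   - A small threshold was used for the 5-second segment and a large threshold for the 30-second segment
 • As in the 2nd-place solution, species judged unlikely to be observed given the time and place were removed (incorporating metadata)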
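The two-threshold combination might look like the sketch below. The slide only says that the two outputs were combined, with a small threshold for the 5-second segment and a large one for the 30-second segment. The AND rule, the threshold values, and the species codes here are assumptions for illustration.

```python
def combine_predictions(p_5s, p_30s, thr_5s=0.2, thr_30s=0.5):
    """Predict a species only if it clears both thresholds (assumed rule).

    p_5s:  species -> probability for the target 5 s segment
    p_30s: species -> probability for the 30 s window centered on it
    """
    birds = [
        bird for bird, prob in p_5s.items()
        if prob >= thr_5s and p_30s.get(bird, 0.0) >= thr_30s
    ]
    return sorted(birds) if birds else ["nocall"]


# Hypothetical outputs: "amerob" is clear in the 30 s context, "bkcchi" is not.
print(combine_predictions({"amerob": 0.3, "bkcchi": 0.25},
                          {"amerob": 0.7, "bkcchi": 0.2}))
print(combine_predictions({"amerob": 0.1}, {"amerob": 0.9}))
```

Under this reading, the wide window confirms that the species is present nearby, while the loose short-window threshold localizes it to the target segment.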
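  12. 5th place
 • Finished 2nd in the 2020 bird competition and based this year's solution on that one
 • Improvements over last time: switching to SED gave +1% (※1) / threshold tuning gave +1% / an improved ensembling method gave +1% (and apparently predicted labels were also narrowed down using regional information (※2))
 • Distinctive augmentation: raising the image to a power of 0.5-3 / n-times playback speed / adding sounds such as rain and conversation / adding noise / frequency adjustment with probability 0.5 (my impression is that places 1-4 stuck to mixup and noise addition)
 • The primary label is given label value 1 and secondary labels are given 0.3
 • Tuned so that a bird observed in one segment is more easily picked up across the whole 10-minute audio file (※3)
 ※1: addressing the weak label problem ※2: incorporating metadata ※3: incorporating neighboring-segment information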
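Two of the ideas above are easy to sketch: soft targets (1 for the primary label, 0.3 for secondary labels) and the power augmentation of the spectrogram image. The helper names, the assumption that the image is normalized to [0, 1], and the species codes are illustrative and not taken from the 5th-place code.

```python
import random


def soft_target(primary, secondary, classes, secondary_value=0.3):
    """Target vector with 1.0 for the primary label and 0.3 for secondary ones."""
    index = {c: i for i, c in enumerate(classes)}
    target = [0.0] * len(classes)
    for label in secondary:
        if label in index:
            target[index[label]] = secondary_value
    target[index[primary]] = 1.0
    return target


def power_augment(image, rng, low=0.5, high=3.0):
    """Raise a spectrogram (assumed normalized to [0, 1]) to a random power."""
    exponent = rng.uniform(low, high)
    return [[pixel ** exponent for pixel in row] for row in image]


classes = ["amerob", "bkcchi", "norcar"]  # hypothetical species codes
target = soft_target("amerob", ["bkcchi"], classes)
image = [[0.0, 0.25, 1.0], [0.5, 0.75, 0.1]]
augmented = power_augment(image, random.Random(0))
print(target)
```

Exponents below 1 brighten faint components and exponents above 1 suppress them, so the same call is seen at varying contrast.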
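  13. 8th place
 • Finished 6th in the 2020 bird competition and used SED again this time (addressing the weak label problem)
 • Trained on 5-second or 20-second segments and used 40-second segments at inference; longer was better. Inference windows overlapped: 0-40 seconds, then 20-60 seconds, and so on (incorporating neighboring-segment information)
 • Augmentation: Gaussian noise, pink noise, volume adjustment, pitch shift (mixup also worked well, but reportedly could not be included in the final submission because of compute constraints)
 • A distinctive loss function (BCEFocal2WayLoss)
 • primary label and secondary labels were treated the same way
 • Ran pseudo labeling (addressing the noisy label problem)
 • There are two thresholds, a call threshold and a nocall threshold: species exceeding the call threshold are marked positive, while segments in which no species exceeds the nocall threshold are additionally given nocall (bird labels and nocall can coexist)
 • Species that cannot exist in the region were excluded even when predicted (incorporating metadata)
 • F1 scores were computed separately for rows with calling birds and for nocall rows, and CV was derived as 0.54 * nocall_f1 + 0.46 * call_f1 (a robust CV strategy)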
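The dual-threshold decoding and the weighted CV score can be sketched as below. The threshold values, the row-wise F1 definition, and the species codes are assumptions for illustration; only the 0.54 / 0.46 weights and the rule that bird labels and nocall may coexist come from the slide.

```python
def decode(probs, call_thr=0.5, nocall_thr=0.7):
    """Species above call_thr are positive; add nocall if none reach nocall_thr."""
    labels = sorted(bird for bird, p in probs.items() if p >= call_thr)
    if all(p < nocall_thr for p in probs.values()):
        labels.append("nocall")
    return labels


def row_f1(true, pred):
    """F1 between the true and predicted label sets of one row."""
    true, pred = set(true), set(pred)
    tp = len(true & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


def weighted_cv(rows, w_nocall=0.54, w_call=0.46):
    """rows: (true_labels, pred_labels) pairs; assumes both row groups are non-empty."""
    nocall = [row_f1(t, p) for t, p in rows if t == ["nocall"]]
    call = [row_f1(t, p) for t, p in rows if t != ["nocall"]]
    return w_nocall * sum(nocall) / len(nocall) + w_call * sum(call) / len(call)


print(decode({"amerob": 0.6, "bkcchi": 0.1}))  # bird label and nocall coexist
rows = [
    (["nocall"], ["nocall"]),
    (["nocall"], ["amerob", "nocall"]),
    (["amerob"], ["amerob"]),
    (["amerob"], ["nocall"]),
]
score = weighted_cv(rows)
print(round(score, 4))
```

Scoring the two row groups separately and then fixing their weights keeps the CV score from drifting with the nocall rate of whichever validation files happen to be used.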
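  14. 9th place
 • Training inputs were 5-7 second segments
 • secondary labels were given a small weight
 • Used mixup
 • Built in as much diversity as possible
   - mel-spectrograms with different time resolutions, with hop_length of 200 and 320
   - Various backbones
   - Augmentation: white noise, pink noise, band noise, nocall clips, raising the mel-spectrogram image to a power
 • Post-processing
   - Post-process with each bird's maximum or mean calling probability over the whole 10-minute recording (incorporating neighboring-segment information)
   - Estimate from regional information how likely each bird is to call, and post-process with the result (incorporating metadata)
   - Post-process with each bird's maximum calling probability within one day (incorporating metadata)
 • Squeeze width of test soundscapes by 2-5% (mostly to reverse far field effects)
 • This team also set two thresholds (call, nocall) and allowed bird labels and nocall to coexist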
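The file-level post-processing can be sketched as a blend of each segment probability with that species' maximum over the whole recording. The slide does not say how the maximum or mean was applied, so the linear blend, the `alpha` weight, and the species codes below are assumptions for illustration.

```python
def file_level_rescale(seg_probs, alpha=0.5):
    """Blend each segment probability with the species' file-level maximum.

    seg_probs: one dict (species -> probability) per 5 s segment of one file.
    alpha:     weight of the file-level maximum (hypothetical value).
    """
    birds = {bird for seg in seg_probs for bird in seg}
    file_max = {b: max(seg.get(b, 0.0) for seg in seg_probs) for b in birds}
    return [
        {b: (1 - alpha) * seg.get(b, 0.0) + alpha * file_max[b] for b in birds}
        for seg in seg_probs
    ]


# Hypothetical two-segment file: "amerob" is confident in the first segment only.
segs = [{"amerob": 0.9, "bkcchi": 0.1}, {"amerob": 0.3, "bkcchi": 0.1}]
out = file_level_rescale(segs)
print(out[1]["amerob"])
```

A species heard clearly once in a file has its weaker detections elsewhere in the same file lifted, while a species that is never confident stays low.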
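  15. 11th place
 • CPMP, who held first place on the Public LB for a long time
 • Finished 18th in the 2020 bird competition and 11th in the Rainforest competition, and reportedly based this solution on a mix of the two
   - 2020 bird competition solution: 18th place solution: efficientnet b3
   - Rainforest competition solution: 11th place, The 0.931 Magic Explained: Image Classification
 • Like the 8th-place solution, computed CV as 0.54 * nocall_f1 + 0.46 * call_f1 (a robust CV strategy)
 • Was overconfident that ensembling would reliably raise the score and regrets not submitting an ensemble version until a few days before the competition ended (in practice, the ensemble reportedly did not help)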