Upgrade to Pro — share decks privately, control downloads, hide ads and more …

機械学習モデルの局所的な解釈に着目したシステムにおける異常の原因診断手法の構想

tsurubee
June 04, 2021

 機械学習モデルの局所的な解釈に着目したシステムにおける異常の原因診断手法の構想

tsurubee

June 04, 2021
Tweet

More Decks by tsurubee

Other Decks in Research

Transcript

  1. ͘͞ΒΠϯλʔωοτגࣜձࣾ (C) Copyright 1996-2019 SAKURA Internet Inc ͘͞ΒΠϯλʔωοτݚڀॴ ػցֶशϞσϧͷہॴతͳղऍʹண໨ͨ͠ 


    γεςϜʹ͓͚ΔҟৗͷݪҼ਍அख๏ͷߏ૝ 2019/12/06 ୈ8ճWebSystemArchitectureݚڀձ ɹ௽ా തจɼ௶಺ ༎थ 
 ͘͞ΒΠϯλʔωοτגࣜձࣾ ͘͞ΒΠϯλʔωοτݚڀॴ
  2. 9 ػցֶशͷղऍੑ • ਂ૚ֶशΛ͸͡Ίͱ͢ΔػցֶशϞσϧ͸ɼ༷ʑͳ෼໺΁Ԡ༻͕ਐΜͰ͍Δɽ • ҰํɼػցֶशϞσϧ͸ͦͷܭࢉաఔ͕ෳࡶͰ͋ΔͨΊɼਓ͕ؒಈ࡞Λཧղ͢Δ͜ͱ͕Ͱ͖ ͳ͍ϒϥοΫϘοΫεͱͳΔ͜ͱ͕໰୊ࢹ͞Ε͓ͯΓɼػցֶशϞσϧͷղऍੑʹ͍ͭͯͷ ݚڀ͕஫໨͞Ε͍ͯΔ※4ɽ • AIར׆༻ݪଇҊ

    (૯຿লɼ2018೥) • ಁ໌ੑͷݪଇɼΞΧ΢ϯλϏϦςΟ(આ໌੹೚) 
 ͷݪଇ • DARPA (ถࠃ๷ߴ౳ݚڀܭըہ) • Explainable Arti fi cial Intelligence (XAI) 
 ϓϩδΣΫτ ※4 A. Adadi and M. Berrada, Peeking Inside the Black-Box: A Survey on Explainable Arti fi cial Intelligence (XAI), IEEE Access, 2018. ػցֶशͷղऍੑɾઆ໌ੑʹؔ͢Δ࿦จ਺ͷਪҠ※4
  3. 10 ہॴతͳղऍ • ہॴతͳղऍͱ͸ɼಛఆͷೖྗʹର͢ΔϞσϧͷ༧ଌ΍൑அͷࠜڌΛղऍ͢Δ͜ͱͰ͋Δɽ • ୅දతͳख๏ͱͯ͠LIME※5΍SHAP※6͕ڍ͛ΒΕΔɽ • ͜ΕΒͷख๏͸ɼ༧ଌ΍൑அͷࠜڌͱͳͬͨಛ௃ྔΛఏࣔ͢Δख๏Ͱ͋Δɽ • ྫ͑͹ɼը૾෼ྨͷػցֶशϞσϧʹରͯ͋͠Δը૾Λ

    
 ༩͑Δͱɼͦͷը૾Λʮmeerkatʯͱ൑அͨ͠ͱ͢Δɽ 
 LIME΍SHAPͰ͸ͦͷࠜڌͱͳΔಛ௃ྔʢը૾ͷ৔߹͸ 
 ϐΫηϧʹ૬౰ʣΛ൑அ΁ͷد༩ͷ౓߹͍ͱͱ΋ʹఏࣔ 
 ͢Δɽ ※5 M. T. Ribeiro et al., "Why Should I Trust You?": Explaining the Predictions of Any Classi fi er, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD’16), 2016. 
 ※6 S. Lundberg and S. I. Lee, A Uni fi ed Approach to Interpreting Model Predictions, Advances in Neural Information Processing Systems 30(NIPS 2017), 2017. https://github.com/slundberg/shap
  4. 11 ہॴతͳղऍͱҟৗͷݪҼ਍அ • ہॴతͳղऍख๏͸ɼಛʹը૾ೝࣝͷ෼໺Ͱ਺ଟ͘ͷݚڀ͕ใࠂ͞Ε͍ͯΔ͕ɼҟৗͷݪҼ ਍அʹ͓͍ͯ΋ͦͷ༗༻ੑ͕ࣔ͞Ε͍ͯΔ※7-9ɽ • ྫ͑͹ɼSHAPͳͲΛ༻͍ͯPCA※7΍ΦʔτΤϯίʔμ※8ɼࠞ߹Ψ΢εϞσϧ※9ɼม෼Φʔτ Τϯίʔμ※9ͳͲʹΑΔҟৗݕ஌ͷ݁Ռͷղऍ͕ɼଞͷख๏ͱൺֱͯ͠ɼݪҼͷಛఆਫ਼౓͕ ߴ͍ɼ΋͘͠͸ਓؒͷ௚ײʹ͍ۙղऍΛ༩͑ΔͳͲͷݚڀ͕ใࠂ͕͞Ε͍ͯΔɽ ※7

    N. Takeishi, Shapley Values of Reconstruction Errors of PCA for Explaining Anomaly Detection, IEEE International Conference on Data Mining Workshops (ICDM Workshops), 2019. 
 ※8 L. Antwarg et al., Explaining Anomalies Detected by Autoencoders Using SHAP, arXiv:1903.02407, 2019. 
 ※9 N. Takeishi and Y. Kawahara, On Anomaly Interpretation via Shapley Values, arXiv:2004.04464, 2020.
  5. 14 Step 1ɿϝτϦοΫͷϑΟϧλϦϯά • ఏҊख๏͸ɼࣄલʹ෼ੳର৅ͱͳΔϝτϦοΫΛࢦఆ͢Δඞཁ͕ͳ͍ͨΊɼҟৗൃੜޙʹର৅ϝτ ϦοΫΛબఆͰ͖Δɽ • ҟৗൃੜ࣌ʹ΄ͱΜͲมಈ͕ͳ͍ϝτϦοΫͳͲɼͦͷҟৗ΁ͷؔ࿈ͷՄೳੑ͕௿͍΋ͷΛϑΟϧ λϦϯά͢Δ͜ͱ͸ɼݪҼ਍அͷਫ਼౓ͷ޲্ͱޙଓεςοϓͷ࣮ߦ࣌ؒͷ୹ॖʹ༗ޮͰ͋Δɽ •

    ҟৗ΁ͷؔ࿈ੑ͕௿͍ϝτϦοΫΛϑΟϧλϦϯά͢Δख๏ͷҰͭͱͯ͠ɼҎલͷզʑͷݚڀ੒Ռ Ͱ͋ΔTSifter※10ͷ׆༻Λݕ౼͢Δɽ ※10 ௶಺ ༎थ, ௽ా തจ, ݹ઒ խେ, TSifter: ϚΠΫϩαʔϏεʹ͓͚Δੑೳҟৗͷ ਝ଎ͳ਍அʹ޲͍ͨ࣌ܥྻσʔλͷ࣍ݩ࡟ݮख๏, ୈ13ճΠϯλʔωοτͱӡ༻ٕज़γϯϙδ΢Ϝ(IOTS 2020). TSifterͷ֓ཁਤ※4 ఆৗੑͷݕఆ ֊૚తΫϥελϦϯά
  6. 15 Step 2ɿϞσϧͷֶश • ఏҊख๏Ͱ͸ɼҟৗൃੜޙʹ؍ଌσʔλ͔ΒϞσϧΛֶश͢ΔͨΊɼߴ଎ʹֶशՄೳͳϞσϧ 
 Λ༻͍Δඞཁ͕͋Δɽ • ҟৗݕ஌ͷϞσϧͱͯ͠ɼओ੒෼෼ੳ (PCA)ͷར༻Λݕ౼͢Δ

    (ࠓޙɼඇઢܗͷϞσϧ౳ʹ֦ு 
 ༧ఆ)ɽ • PCAΛ༻͍ͨҟৗݕ஌Ͱ͸ɼ؍ଌσʔλʹର͢Δ࣍ݩ࡟ݮʹΑΓਖ਼ৗ෦෼ۭؒΛٻΊɼςετ σʔλͱਖ਼ৗ෦෼ۭؒͱͷڑ཭ΛҟৗείΞͱ͢Δɽ • ఏҊख๏Ͱ͸ɼPCAͰࢉग़͞ΕΔҟৗείΞΛҟৗݕ஌Ͱ͸ͳ͘ɼݕ஌ޙͷݪҼ਍அʹ༻͍Δɽ ਖ਼ৗ෦෼ۭؒ ςετσʔλ 
 (ϕΫτϧ) ಛ௃ۭؒ
  7. 16 Step 3ɿҟৗ΁ͷߩݙ౓ͷܭࢉ • ఏҊख๏Ͱ͸ɼҟৗͷݪҼ਍அΛߦ͏ͨΊʹɼ֤ϝτϦοΫͷҟৗ΁ͷߩݙ౓Λܭࢉ͢Δɽ • ߩݙ౓ͷܭࢉʹ͸ɼڠྗήʔϜཧ࿦ͷShapley Valueʹجͮ͘SHAPͷར༻Λݕ౼͢Δɽ • SHAPͷΞϧΰϦζϜͷதͰ΋ɼKernel

    SHAPΛ࠾༻͢Δɽ • Model-agnostic (Ϟσϧඇґଘ) ͳղऍख๏ • Linear LIMEͱShapley ValueΛ૊Έ߹ΘͤͨΞϓϩʔν Kernel SHAP ɿղऍ͍ͨ͠ෳࡶͳϞσϧ 
 ɿઆ໌༻ͷ୯७ͳϞσϧ 
 ɿ༧ଌ஋ʹର͢Δ֤ಛ௃ྔͷߩݙ౓ Additive feature attribution methods※6 ※6 S. Lundberg and S. I. Lee, A Uni fi ed Approach to Interpreting Model Predictions, Advances in Neural Information Processing Systems 30(NIPS 2017), 2017. f g ϕ
  8. ࣮ݧ؀ڥ • Google Kubernetes Engine (GKE)্ʹϚΠΫϩαʔ ビ εͷ ベ ϯνϚʔΫΞ

    プ Ϧέʔγϣϯ で ͋ΔSock Shop※Λߏஙͨ͠ɽ • Sock ShopΛߏ੒͢Δ11ίϯςφ͔ΒcAdvisorΛ༻͍ͯCPU࢖༻཰ͳͲͷϝτϦοΫΛ5ඵ͓ ͖ʹऩूͨ͠ɽ • Sock ShopΞϓϦέʔγϣϯʹରͯ͠ɼٖࣅతͳෛՙΛੜ੒͢ΔͨΊʹɼLocustΛར༻ͨ͠ɽ • γεςϜͷҟৗΛ໛฿͢ΔͨΊʹɼuser-dbίϯςφʹCPUෛՙΛ஫ೖͨ͠ɽ 18 ※ https://microservices-demo.github.io/ Fron-end Catalogue Orders Carts User Payment Shipping Sock Shop Locust Prometheus ϚΠΫϩαʔϏεΫϥελ ੍ޚαʔό ֎෦ෛՙͷੜ੒ CPUෛՙ஫ೖ ϝτϦοΫͷ 
 ऩूɾอଘ stress-ng ղੳαʔό ϝτϦοΫ 
 औಘϞδϡʔϧ ղੳϞδϡʔϧ 8core, 32GB
  9. 1. ҟৗͷݪҼ਍அɿҟৗͷߩݙ౓ 21 1λΠϜεςοϓʹ͓͚Δҟৗ΁ͷߩݙ౓ (SHAPͷforce plot) ςετσʔλશମ(120λΠϜεςοϓ)ʹ͓͚Δ ҟৗ΁ͷߩݙ౓ (SHAPͷsummary plot)

    ※ c-(ίϯςφ໊)_(ϝτϦοΫ໊) • ࠨਤͷ݁Ռ͸ɼࢉग़ͨ͠SHAP஋ͷઈର஋ 
 ͷฏۉ͕େ͖͍΋ͷ͔Βॱʹ্͔Βฒ΂ͯ ͓ΓɼݪҼϝτϦοΫͷީิΛ্͔Βฒ΂ ͍ͯΔ͜ͱʹ૬౰͢Δɽ • ຊ࣮ݧ৚݅ʹ͓͍ͯɼSHAPʹΑΔղऍ͸ɼ ࣮ࡍͷҟৗͷࠜຊݪҼͱҰகͨ݁͠ՌΛ༩ ͍͑ͯΔɽ
  10. 1. ҟৗͷݪҼ਍அɿϕʔεϥΠϯͱͷൺֱ 22 • ݪҼ਍அͷϕʔεϥΠϯख๏ͱͯ͠ɼGaussian Based ThresholdingʢGBTʣΛ༻͍ͨɽ • GBTΛ༻͍ͨݪҼ਍அ͸ɼֶशσʔλͷฏۉ஋ͱςετσʔλͷฏۉ஋ͷࠩ෼͕େ͖͍ॱ൪ʹ ҟৗ΁ͷߩݙ౓͕ߴ͍ͱ͢Δɽ

    GBTʹΑΔҟৗ΁ͷߩݙ౓ • ຊ࣮ݧʹ͓͚ΔࠜຊݪҼͰ͋Δuser-dbͷCPU ͷϝτϦοΫ͸ҟৗ΁ͷߩݙ౓͕7൪໨ͱͳͬ ͍ͯͨɽ • ͜ͷ݁Ռ͸ɼਖ਼ৗ࣌Ͱ΋෼ࢄ͕େ͖͍ϝτ ϦοΫ͕ɼۮൃతʹֶशσʔλͱςετσʔλ ͷฏۉ஋ͷࠩ෼͕େ͖͘ͳͬͨ৔߹ɼͦΕΛ ҟৗʹΑΔมಈͱݟ෼͚Δ͜ͱ͕Ͱ͖ͳ͍͜ͱ ʹىҼ͢Δͱߟ͍͑ͯΔɽ
  11. 25 ·ͱΊͱࠓޙͷల๬ • ຊൃදͰ͸ɼࣄલʹϞσϧͷֶश΍ର৅ϝτϦοΫͷࢦఆΛඞཁͱͤͣɼػցֶशϞσϧͷہॴత ͳղऍख๏Ͱ͋ΔSHAPΛ༻͍ͯγεςϜͷҟৗͷݪҼΛ਍அ͢Δख๏ͷݕ౼ͨ͠ɽ • ࠓճͷ࣮ݧʹ͓͚Δҟৗύλʔϯʹ͓͍ͯ͸ɼఏҊख๏Ͱ࠾༻͍ͯ͠ΔSHAPͷํ͕ϕʔεϥΠϯ ख๏ΑΓ΋ྑ͍ݪҼ਍அͷ݁ՌΛ༩͑Δ͜ͱ͕Θ͔ͬͨɽ • ࠓճͷ࣮ݧ৚݅ʹ͓͍ͯɼఏҊख๏ͷ࣮ߦ࣌ؒ͸64ඵͰ͋ΓɼSHAPͷܭࢉ͕ࢧ഑తͰ͋Δ͜ͱ͕

    Θ͔ͬͨɽ • ఏҊख๏ͷ༗༻ੑΛࣔͨ͢Ίʹ޿ൣͳҟৗύλʔϯʹରͯ͠ݪҼ਍அͷਫ਼౓ΛఆྔతʹධՁ͢Δ༧ ఆͰ͋Δɽ • ର৅ͱ͢ΔγεςϜ͕େن໛Խͨ͠ࡍʹɼఏҊख๏͕࣮༻ʹ଱͑͏Δ͔Λݕূ͢ΔɽͦͷͨΊʹɼ ϝτϦοΫ਺͕૿େͨ͠৔߹ͷݪҼ਍அͷܭࢉ࣌ؒͷධՁΛߦ͏ɽ • ࠓճͷ࠾༻ͨ͠PCA͸ɼઢܗ͔ͭ࣌ܥྻͷ৘ใΛߟྀ͍ͯ͠ͳ͍୯७ͳϞσϧͰ͋ΔͨΊɼࠓޙɼ 
 ඇઢܗͷϞσϧ΍࣌ܥྻʹରԠͨ͠ϞσϧΛ࠾༻͠ɼͦͷ༗ޮੑΛݕূ͢Δɽ ·ͱΊ ࠓޙͷల๬