Isotropy, Clusters, and Classifiers

2024-08-25,26: 16th 最先端NLP勉強会 (SNLP, a study group on cutting-edge NLP)
https://sites.google.com/view/snlp-jp/home/2024

Hayato Tsukagoshi

August 19, 2024

Transcript

  1. Isotropy, Clusters, and Classifiers
    Hayato Tsukagoshi (D2, Graduate School of Informatics, Nagoya University, Japan)
    Paper: Timothee Mickus, Stig-Arne Grönroos, Joseph Attieh, ACL 2024
    https://aclanthology.org/2024.acl-short.7/
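  2. Overview
    • TL;DR: improving the isotropy of embedding representations is
      • beneficial for tasks that measure semantic similarity (e.g. STS)
      • detrimental for clustering tasks
    • Isotropy has long been believed to be important for obtaining high-quality embeddings
      • Isotropy: are the embeddings spread out across the space?
    • This work points out that improving isotropy does not necessarily improve task performance
      • The paper leans strongly toward raising a problem rather than solving it
    • On actual downstream tasks, it confirms a trade-off between being good for clustering and being good in terms of isotropy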
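  3. Isotropy and Anisotropy
    • Isotropic: the embeddings are spread out across the space
      • Mathematical definition: the variance-covariance matrix is proportional to the identity matrix
    • Anisotropic: the embeddings are distributed unevenly, concentrated in part of the space
    [Figure from Rudman et al. 2021. Labels: anisotropic embeddings; slightly more isotropic embeddings; loose isotropy; strict isotropy]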
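To make the mathematical definition concrete, here is a minimal sketch that measures how far a sample covariance matrix is from the closest multiple of the identity. The function name `isotropy_gap` and the toy data are assumptions for illustration only; this is not a metric used in the paper.

```python
import numpy as np

def isotropy_gap(X):
    """Relative Frobenius distance between cov(X) and the closest c * I.

    0 means the covariance is exactly proportional to the identity
    (isotropic in the sense of the definition); larger means more anisotropic.
    """
    cov = np.cov(X, rowvar=False)
    d = cov.shape[0]
    c = np.trace(cov) / d  # least-squares best scalar for c * I
    return np.linalg.norm(cov - c * np.eye(d)) / np.linalg.norm(cov)

rng = np.random.default_rng(0)
iso = rng.normal(size=(5000, 4))                 # roughly isotropic cloud
aniso = iso * np.array([10.0, 1.0, 1.0, 0.1])    # stretched along one axis

print(isotropy_gap(iso), isotropy_gap(aniso))
```

The stretched cloud gives a much larger gap than the spherical one, which is the "biased distribution" that the slide calls anisotropic.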
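  4. Embedding representations and isotropy
    • Pretrained static word embeddings such as Word2Vec and GloVe are anisotropically distributed [1]
    • Contextualized word embeddings such as BERT and GPT-2 are also anisotropically distributed [2]
    • → Birth of the belief that "improving isotropy should improve performance"
    • Qualitative effects of improving isotropy (whitening in particular)
      • For static word embeddings: removes the bias caused by word frequency [3]
      • For dynamic (contextualized) word embeddings: removes some other kind of bias
    • Contrastive learning has risen as a way to learn representations while improving isotropy
    [1] Mu et al., All-but-the-Top: Simple and Effective Postprocessing for Word Representations, arXiv 2017
    [2] Ethayarajh, How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings, arXiv 2019
    [3] Sasaki et al., Examining the effect of whitening on static and contextualized word embeddings, Information Processing & Management 2023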
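Whitening is the isotropy-improving step singled out above. A minimal sketch of PCA whitening follows, assuming a plain eigendecomposition of the sample covariance; the function name `whiten`, the `eps` constant, and the toy data are illustrative assumptions, not the exact procedure of any cited paper.

```python
import numpy as np

def whiten(X, eps=1e-8):
    """PCA-whiten X so that its sample covariance becomes (close to) the identity."""
    Xc = X - X.mean(axis=0, keepdims=True)        # center
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # cov = V diag(eigvals) V^T
    W = eigvecs / np.sqrt(eigvals + eps)          # scale column j by 1/sqrt(eigvals[j])
    return Xc @ W

rng = np.random.default_rng(0)
mix = np.triu(np.ones((5, 5)))                    # fixed invertible mixing matrix
X = rng.normal(size=(1000, 5)) @ mix              # correlated, anisotropic data
Xw = whiten(X)
cov_after = np.cov(Xw, rowvar=False)
```

After whitening, `cov_after` is the identity up to numerical error, so the data satisfies the covariance-based definition of isotropy by construction.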
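  6. Contrastive learning
    • One method of representation learning
    • Train so that positive pairs move closer together and negative pairs move apart
      • Maximize similarity between positives & minimize similarity between negatives
    Computing the loss (InfoNCE [4])
    • Compute the cosine similarity between the embeddings of positive pairs
    • Compute the cosine similarity between the embeddings of negative pairs
    • Line up the similarities and apply the temperature parameter
    • Apply the softmax function and treat the result as a probability distribution
    • Push it toward a distribution that puts 1 on the positive only
    [4] Oord et al., Representation Learning with Contrastive Predictive Coding, arXiv 2018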
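The loss computation steps listed above can be sketched as follows. This assumes in-batch negatives (row i of `positives` is the positive for row i of `anchors`, all other rows are negatives), which is a common setup but an assumption here; the function name and default temperature are illustrative.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.05):
    """InfoNCE with in-batch negatives (a sketch, not a reference implementation)."""
    # Cosine similarity = dot product of L2-normalized vectors.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # (N, N) similarity matrix; the diagonal holds the positive pairs.
    logits = (a @ p.T) / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against the one-hot target on the positive.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
positives = anchors + 0.01 * rng.normal(size=(8, 16))
loss_matched = info_nce(anchors, positives)
loss_mismatched = info_nce(anchors, positives[::-1])  # wrong pairing
```

Correctly paired inputs give a loss near zero, while a mismatched pairing gives a much larger one.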
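  8. Measuring isotropy: Alignment & Uniformity
    • Metrics for the "goodness" of representations in contrastive learning [5]
      • How are the embeddings distributed on the hypersphere?
    Alignment
    • Are the embeddings of positive pairs sufficiently close?
    Uniformity
    • The average pairwise distance between embeddings
    • Sometimes used to evaluate isotropy (e.g. SimCSE)
      • Strictly speaking, it does not measure isotropy (it does not look at the variance-covariance matrix)
    [5] Wang et al., Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere, ICML 2020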
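A minimal sketch of the two metrics, following the definitions in Wang et al. [5]: alignment is the mean distance between positive pairs raised to a power alpha, and uniformity is the log of the mean Gaussian potential over all pairs (so it is a function of pairwise distances rather than a plain average distance). Inputs are assumed to be L2-normalized; the defaults `alpha=2` and `t=2` are the commonly used values and should be treated as assumptions here.

```python
import numpy as np

def alignment(x, y, alpha=2):
    """Mean of ||x_i - y_i||^alpha over positive pairs (lower is better)."""
    return float(np.mean(np.linalg.norm(x - y, axis=1) ** alpha))

def uniformity(x, t=2):
    """log mean exp(-t * ||x_i - x_j||^2) over pairs i < j (lower is more uniform)."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    i, j = np.triu_indices(len(x), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dists[i, j]))))

def normalize(v):
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(0)
spread = normalize(rng.normal(size=(200, 16)))       # scattered over the sphere
collapsed = normalize(np.ones((200, 16)))            # every point identical
u_spread, u_collapsed = uniformity(spread), uniformity(collapsed)
```

Neither function looks at the covariance matrix, which is the sense in which uniformity is only a proxy for isotropy.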
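  10. Measuring isotropy: IsoScore [6]
    • How to read the metric: the proportion of the dimensions of the space that are used evenly
    • A rough understanding of how it is computed:
      • Apply PCA to the set of embeddings to decorrelate the dimensions
      • Normalize the variance vector by dividing it by its norm → measure its deviation from the all-ones vector
    [Figure: relationship between distribution shape and IsoScore for 2D Gaussian distributions; IsoScore values 0.9996, 0.6105, 0.0281]
    [6] Rudman et al., IsoScore: Measuring the Uniformity of Embedding Space Utilization, Findings of ACL 2022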
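The rough procedure above can be sketched in NumPy. The normalization and rescaling constants below follow my reading of the IsoScore paper [6] and should be checked against the official implementation before being relied on; the function name `iso_score` and the toy data are assumptions. The sketch assumes at least two dimensions and at least as many samples as dimensions.

```python
import numpy as np

def iso_score(X):
    """Approximate IsoScore of the rows of X (n_samples x n_dims)."""
    n = X.shape[1]
    Xc = X - X.mean(axis=0)
    # PCA reorientation: express the data in its principal axes (decorrelates dims).
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    var = (Xc @ vt.T).var(axis=0)                    # variance vector
    # Normalize so the vector has the same norm as the all-ones vector.
    var_hat = np.sqrt(n) * var / np.linalg.norm(var)
    # Isotropy defect: scaled distance from the all-ones vector, in [0, 1].
    delta = np.linalg.norm(var_hat - 1.0) / np.sqrt(2 * (n - np.sqrt(n)))
    # Map the defect to "fraction of dimensions used", then rescale to [0, 1].
    phi = (n - delta**2 * (n - np.sqrt(n))) ** 2 / n**2
    return float((n * phi - 1) / (n - 1))

rng = np.random.default_rng(0)
round_cloud = rng.normal(size=(5000, 2))
flat_cloud = round_cloud * np.array([1.0, 0.01])     # almost one-dimensional
score_round, score_flat = iso_score(round_cloud), iso_score(flat_cloud)
```

A round 2D Gaussian scores close to 1 and a nearly one-dimensional one close to 0, matching the qualitative pattern in the figure.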
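  17. Silhouette score
    • A metric for measuring how good a clustering is [7]
      • Are members of the same class close together and members of different classes far apart?
    • The score takes values ∈ [−1, 1]; larger is better
    • Terms in the formula:
      • the mean Euclidean norm (distance) between an instance and a set
      • the cost between an instance and the instances of its own class (intra-cluster)
      • the minimum cost between an instance and the instances of another class (inter-cluster)
    [7] Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics 1987.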
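A minimal sketch of the silhouette score as defined by Rousseeuw [7]: for each instance, a is the mean Euclidean distance to the other instances of its own class, b is the smallest mean distance to the instances of any other class, and the per-instance score is (b - a) / max(a, b). The function name and the convention of scoring singleton clusters as 0 are assumptions of this sketch; at least two distinct labels are required.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette over all instances, using Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    classes = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:                  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        # Intra-cluster cost; the self-distance is 0, so divide by n_same - 1.
        a = dists[i, same].sum() / (n_same - 1)
        # Inter-cluster cost: nearest other class by mean distance.
        b = min(dists[i, labels == c].mean() for c in classes if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

points = [[0, 0], [0, 1], [10, 0], [10, 1]]
tight = silhouette_score(points, [0, 0, 1, 1])   # labels match the geometry
mixed = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the geometry
```

Labels that match the geometry give a score near 1, and labels that cut across it give a negative score.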
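  18. Experiments
    • Observe how IsoScore and the Silhouette score change while training a linear classifier
    • Tasks
      • Polarity classification from SBERT embeddings
      • Natural language inference (predicting the semantic relation between two sentences) from SBERT embeddings
      • POS-tagging from Word2Vec embeddings
      • WordNet supersense prediction from Word2Vec embeddings
    Note: for the tasks that use SBERT, the paper states "we directly optimize the output embeddings of the SBERT model rather than update the parameters of the SBERT model", which strikes me as a rather odd setup…? It appears that the SBERT output embeddings are turned into an nn.Parameter and optimized (treating the set of SBERT output sentence embeddings like Word2Vec?), while the SBERT parameters themselves stay frozen.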
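The setup described in the note, where the embeddings themselves are trainable parameters under a linear classifier, can be sketched without any deep learning framework. Everything here is hypothetical: random vectors stand in for the encoder outputs, and the sizes, labels, and learning rates are made up for illustration; this is not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 90, 8, 3                       # instances, embedding dim, classes (made up)
labels = np.arange(n) % k
E = rng.normal(size=(n, d))              # stand-in for encoder outputs, now free parameters
W = rng.normal(scale=0.1, size=(d, k))   # linear classifier weights

def loss_and_grads(E, W, labels):
    """Softmax cross-entropy and its gradients w.r.t. embeddings and weights."""
    logits = E @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    rows = np.arange(len(labels))
    loss = -np.mean(np.log(probs[rows, labels]))
    dlogits = probs.copy()
    dlogits[rows, labels] -= 1.0         # d(loss)/d(logits) = probs - one_hot
    dlogits /= len(labels)
    return float(loss), dlogits @ W.T, E.T @ dlogits

initial_loss, _, _ = loss_and_grads(E, W, labels)
for _ in range(300):
    _, grad_E, grad_W = loss_and_grads(E, W, labels)
    E -= 1.0 * grad_E                    # the embeddings move, not an encoder
    W -= 0.1 * grad_W
final_loss, _, _ = loss_and_grads(E, W, labels)
```

Because the gradient flows into `E` directly, each vector is free to drift toward its class region, which is the behavior whose effect on IsoScore and the Silhouette score the experiments track.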
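  19. Results: Silhouette score and IsoScore over the course of training
    • During training
      • the Silhouette score (goodness for clustering) improves
      • IsoScore (goodness in terms of isotropy) degrades
    • As training progresses, the embeddings become more anisotropically distributed
    [Figure: curves of Silhouette score and IsoScore during training]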
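  20. Summary
    • Improving the isotropy of embedding representations is
      • beneficial for tasks that measure semantic similarity (e.g. STS)
      • detrimental for clustering tasks
    • The objective of linear classification on embeddings is nearly equivalent to clustering quality
    Impressions
    • Overall, a paper whose conclusion feels like "well, yes, of course"
      • It reads as an explicit statement of a concern that people working on embeddings had vaguely sensed
    • Whether a single model can achieve both remains unknown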
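  21. Aside: the temperature parameter in contrastive learning
    • Recent general-purpose text embedding models (such as E5 and GTE) use a small temperature in contrastive learning (τ = 0.01)
      • SimCSE, the pioneer of text embeddings + contrastive learning, uses 0.05
    • The higher the contrastive temperature, the tighter the clusters that form [8]
      • In large-scale training, is the temperature lowered so that spurious clusters caused by noise do not form?
    [8] Wang et al., Understanding the Behaviour of Contrastive Loss, CVPR 2021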
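To see what lowering the temperature does inside the softmax, here is a small sketch comparing τ = 0.01 and τ = 0.05 on the same similarities. The three cosine similarity values are hypothetical (a positive, a hard negative, and an easy negative).

```python
import numpy as np

def softmax_with_temperature(sims, tau):
    """Softmax over similarities divided by the temperature tau."""
    z = np.asarray(sims, dtype=float) / tau
    z = z - z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

sims = [0.9, 0.8, 0.3]                   # hypothetical: positive, hard negative, easy negative
p_low = softmax_with_temperature(sims, 0.01)
p_high = softmax_with_temperature(sims, 0.05)
print(p_low, p_high)
```

At τ = 0.01 a similarity gap of 0.1 already puts almost all of the probability on the top item, so the loss reacts sharply to small differences between near neighbors. At τ = 0.05 the hard negative keeps a noticeable share, which makes the loss more tolerant of similar items sitting close together.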