
WhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings

A deck explaining WhitenedCSE, a sentence embedding method that combines contrastive learning with whitening.

2023-09-28: ACL 2023 reading group @ Nagoya University
http://cr.fvcrc.i.nagoya-u.ac.jp/~sasano/acl2023nagoya/

Hayato Tsukagoshi

September 26, 2023

Transcript

  1. WhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings
    Paper authors: Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang
    ACL 2023
    https://aclanthology.org/2023.acl-long.677/
    Presenter: Hayato Tsukagoshi (D1, Graduate School of Informatics, Nagoya University, Japan)

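  2. Overview
    • Contrastive learning is effective for learning sentence embeddings, but it only weakly pushes negative examples apart from one another
      • This contributes to the anisotropy problem of embeddings
    • Whitening is promising, but how well it combines with contrastive learning was unknown
    • The paper proposes a sentence embedding learning method that combines contrastive learning with whitening
      • Embeddings are subdivided into groups and whitened group by group
      • Contrastive learning with multiple positives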

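  3. Introduction: Sentence embedding
    • A dense vector representation of a natural-language sentence
    • The distance between vectors expresses how close the sentences are in meaning
    Example sentences and their points in the sentence embedding space:
    "A child is heading home." → [0.1, 0.2, ...]
    "A child is heading home from school." → [0.1, 0.3, ...]
    "A child is in the library." → [0.9, 0.8, ...]
    "A child is walking in the afternoon." → [0.5, 0.7, ...]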

  4. Introduction: Sentence embedding (continued)
    Same example as the previous slide, with annotations:
    • Semantically similar: sentences with close meanings are distributed close together
    • The distance between vectors expresses the semantic relationship

  5. Introduction: Contrastive Learning
    • The model outputs feature representations for positive and negative examples
    • Training raises the similarity between positive pairs
    • Hugely popular in computer vision, and now trending in NLP as well
    SimCLR
    • Treats differently augmented views of the same image as a positive pair
    • Achieves high performance on downstream tasks such as image classification
    • Effective as pre-training for representation learning in CV
    Image quoted from the blog post [16]
    Oord+: Representation Learning with Contrastive Predictive Coding, arXiv '18
    Chen+: A Simple Framework for Contrastive Learning of Visual Representations, ICML '20
    Advancing Self-Supervised and Semi-Supervised Learning with SimCLR, '20
    Chen+: Big Self-Supervised Models are Strong Semi-Supervised Learners, NeurIPS '20

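  6. Contrastive Learning / Training procedure
    • Compute the similarity matrix and treat its diagonal entries (the similarities between positive pairs) as the correct labels
      • Maximizing the diagonal similarities == maximizing the similarity between positive pairs
    In-batch negatives
    • For each example in a mini-batch, the other examples in the batch are treated as negatives
    Loss function (InfoNCE)
    $$\mathcal{L}_i = -\log \frac{e^{\mathrm{sim}(h_i, h_i^+)/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i, h_j^+)/\tau}}$$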
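
The InfoNCE loss with in-batch negatives can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `info_nce`, the temperature default `tau=0.05`, and the random toy data are all assumptions made for this example, and sim is taken to be cosine similarity.

```python
import numpy as np

def info_nce(h, h_pos, tau=0.05):
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch).

    h, h_pos: (N, d) arrays; row i of h_pos is the positive for row i of h.
    tau: temperature; 0.05 is an assumed default, not taken from the slide.
    """
    # Cosine similarity: L2-normalize, then take all pairwise dot products.
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    h_pos = h_pos / np.linalg.norm(h_pos, axis=1, keepdims=True)
    logits = h @ h_pos.T / tau                      # (N, N) similarity matrix
    # Row-wise log-softmax, stabilized by subtracting the row maximum.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the positive pairs, i.e. the "correct labels".
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))
# Positives that are near-copies of h should give a much lower loss
# than positives that are unrelated random vectors.
loss_matched = info_nce(h, h + 0.01 * rng.normal(size=h.shape))
loss_random = info_nce(h, rng.normal(size=h.shape))
```

Every off-diagonal entry of the similarity matrix acts as a negative, which is why no explicit negative sampling appears in the code.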

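  7. Alignment / Uniformity
    • (Differentiable) metrics, usable as loss functions, that measure the quality of feature representations
      • The lower these values, the better the representation (or so it is generally held)
    Alignment
    • Are similar samples distributed close together in feature space?
    Uniformity
    • Are the feature representations uniformly distributed on the unit hypersphere?
    Wang+: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere, ICML '20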
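
Both metrics are short to compute. The sketch below follows the usual definitions from Wang+ (mean distance between positive pairs raised to the power alpha, and the log of the mean Gaussian potential over all pairs); the defaults `alpha=2` and `t=2`, the helper names, and the toy data are assumptions for illustration.

```python
import numpy as np

def normalize(v):
    # Project each row onto the unit hypersphere.
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def alignment(x, y, alpha=2):
    # x[i] and y[i] are L2-normalized embeddings of a positive pair.
    return float(np.mean(np.linalg.norm(x - y, axis=1) ** alpha))

def uniformity(x, t=2):
    # Log of the mean Gaussian potential over all distinct pairs of rows.
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(x), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dist[upper]))))

rng = np.random.default_rng(0)
# Roughly uniform points on the sphere versus points squeezed into one direction.
spread = normalize(rng.normal(size=(200, 8)))
collapsed = normalize(np.ones((200, 8)) + 0.05 * rng.normal(size=(200, 8)))
u_spread = uniformity(spread)
u_collapsed = uniformity(collapsed)
a_identical = alignment(spread, spread)
```

The collapsed (anisotropic) set gets a uniformity value near zero, while the spread set gets a clearly negative one, matching the "lower is better" reading.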

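  8. Prior work: SimCSE
    • Pioneering work on sentence embeddings + contrastive learning
    • Unsupervised SimCSE: "embed the same sentence twice and apply contrastive learning"
    • Supervised SimCSE: "contrastive learning with sentences in an entailment relation as positives"
    Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP 2021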

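  9. SimCSE: Motivation
    • Unsupervised SimCSE: "regularization + pushing sentence embeddings apart"
    • Supervised SimCSE: "pulling semantically close sentence embeddings together + pushing all other sentence embeddings apart"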

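  10. Whitening
    • Transforms the data so that it is distributed uniformly on the hypersphere
      • Mean: 0
      • Covariance matrix: identity
    Methods used for whitening
    • Principal Component Analysis (PCA): eigendecompose the covariance of the data matrix, $ZZ^\top$
    • Zero-phase Component Analysis (ZCA): PCA + undoing the rotation
    Anisotropy
    • The tendency of data to occupy only a low-dimensional subspace of the high-dimensional space
      • Whitening helps remove anisotropy (isotropically distributed embeddings often perform better)
    Derivation, with data matrix $Z$, whitening matrix $W$, and whitened output $H$:
    $$H = WZ, \qquad HH^\top = I$$
    $$WZ(WZ)^\top = WZZ^\top W^\top$$
    $$ZZ^\top = U \Lambda U^\top$$
    $$W_{\mathrm{PCA}} = \Lambda^{-1/2} U^\top$$
    $$W_{\mathrm{ZCA}} = U \Lambda^{-1/2} U^\top$$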
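
The two whitening matrices can be checked numerically. The sketch below keeps the slide's convention of one sample per column of Z. The function name `whitening_matrices`, the 1/N normalization of the covariance (the slide writes plain ZZ^T), the small `eps`, and the toy mixing matrix are all assumptions for this example.

```python
import numpy as np

def whitening_matrices(z, eps=1e-12):
    """z: (d, N) matrix with one centered sample per column, as in H = WZ."""
    n = z.shape[1]
    cov = z @ z.T / n                         # (d, d); the 1/N factor is an added normalization
    eigvals, u = np.linalg.eigh(cov)          # cov = U diag(eigvals) U^T
    inv_sqrt = np.diag(1.0 / np.sqrt(eigvals + eps))
    w_pca = inv_sqrt @ u.T                    # Lambda^{-1/2} U^T
    w_zca = u @ inv_sqrt @ u.T                # U Lambda^{-1/2} U^T (rotation undone)
    return w_pca, w_zca

rng = np.random.default_rng(0)
# A fixed, well-conditioned mixing matrix that makes the features correlated.
mix = np.array([[2.0, 1.0, 0.0, 0.0],
                [0.0, 1.0, 1.0, 0.0],
                [0.0, 0.0, 1.0, 0.5],
                [0.0, 0.0, 0.0, 1.0]])
z = mix @ rng.normal(size=(4, 1000))
z = z - z.mean(axis=1, keepdims=True)         # mean 0

w_pca, w_zca = whitening_matrices(z)
n = z.shape[1]
cov_pca = (w_pca @ z) @ (w_pca @ z).T / n     # should be close to the identity
cov_zca = (w_zca @ z) @ (w_zca @ z).T / n     # should be close to the identity
```

Both outputs have identity covariance. They differ only by the rotation U, which is why the ZCA matrix is symmetric while the PCA matrix generally is not.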

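  11. WhitenedCSE
    • SimCSE does improve alignment / uniformity, but...
      • Contrastive learning alone cannot push negatives apart from one another
    Shuffled Group Whitening (SGW)
    • Shuffle, then whiten group by group
    • Restore the original order and use the result as the output embedding
    Multi-Positive Contrastive Loss
    • Repeat SGW several times to multiply the positives
    • Training on diverse positives improves robustness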
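
One plausible reading of the multi-positive loss is sketched below, under the assumption that it simply averages the InfoNCE loss over the positive views; the paper's exact formulation may differ. In WhitenedCSE the views would come from repeated SGW passes with different shuffles. Here small random perturbations stand in for them, and all names (`info_nce`, `multi_positive_loss`, `views`) are hypothetical.

```python
import numpy as np

def info_nce(h, h_pos, tau=0.05):
    # Cosine-similarity InfoNCE with in-batch negatives (same form as the slide 6 loss).
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    h_pos = h_pos / np.linalg.norm(h_pos, axis=1, keepdims=True)
    logits = h @ h_pos.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def multi_positive_loss(h, positives, tau=0.05):
    # positives: list of (N, d) arrays, one per positive view of the batch.
    # Assumption: the per-view losses are averaged with equal weight.
    return float(np.mean([info_nce(h, p, tau) for p in positives]))

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))
# Stand-in positive views: noisy copies of h (NOT actual SGW outputs).
views = [h + 0.05 * rng.normal(size=h.shape) for _ in range(3)]
loss = multi_positive_loss(h, views)
per_view = [info_nce(h, v) for v in views]
```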

  12. Shuffled Group Whitening: pseudocode
    https://github.com/SupstarZh/WhitenedCSE/blob/master/whitenedcse/shuffled_group_whitening.py

  13. Shuffled Group Whitening: pseudocode
    Highlighted step: shuffle and split into groups

  14. Shuffled Group Whitening: pseudocode
    Highlighted step: set the mean to 0

  15. Shuffled Group Whitening: pseudocode
    Highlighted step: (G, d, B) x (G, B, d) → (G, d, d)

  16. Shuffled Group Whitening: pseudocode
    Highlighted step: eigendecompose the covariance matrix

  17. Shuffled Group Whitening: pseudocode
    Highlighted step: restore the original order
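
The highlighted steps (shuffle and group, center, batched covariance, eigendecomposition, unshuffle) can be strung together as follows. This is a NumPy sketch written from the slide annotations, not a copy of the repository's PyTorch code. It assumes that the shuffle acts on the feature axis, that G is the number of groups, d the dimensions per group and B the batch size, and that ZCA is the whitening used; `eps` and every name in the code are choices made for this example.

```python
import numpy as np

def shuffled_group_whitening(x, num_groups, rng, eps=1e-5):
    """x: (B, D) batch of embeddings; D must be divisible by num_groups."""
    b, dim = x.shape
    assert dim % num_groups == 0
    d = dim // num_groups
    # Shuffle the feature axis and split it into groups: (G, d, B).
    perm = rng.permutation(dim)
    xg = x[:, perm].T.reshape(num_groups, d, b)
    # Set the mean of every feature to 0.
    xg = xg - xg.mean(axis=2, keepdims=True)
    # Batched covariance: (G, d, B) x (G, B, d) -> (G, d, d).
    cov = xg @ xg.transpose(0, 2, 1) / b + eps * np.eye(d)
    # Eigendecompose each group's covariance and build a ZCA matrix (assumed variant).
    eigvals, u = np.linalg.eigh(cov)
    w = (u / np.sqrt(eigvals)[:, None, :]) @ u.transpose(0, 2, 1)
    white = (w @ xg).reshape(dim, b).T       # (B, D), still in shuffled order
    # Restore the original feature order.
    out = np.empty_like(white)
    out[:, perm] = white
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
x[:, 1] += x[:, 0]                            # introduce some correlation
out = shuffled_group_whitening(x, num_groups=4, rng=rng)
```

Calling the function again with a fresh permutation whitens different feature groupings, which is what lets repeated passes produce distinct views of the same batch.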

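  18. Experiments
    Tasks: standard benchmarks for sentence embeddings
    • Semantic Textual Similarity (STS)
    • SentEval
    Additional evaluation metrics
    • Uniformity, Alignment
    Training setup
    • 1 million sentences randomly sampled from English Wikipedia (unlabeled)
    • Fine-tune BERT and RoBERTa
    • Whitening is applied per batch, with 384 groups (2 dimensions per group) (BERT)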

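  19. Semantic Textual Similarity (STS)
    • Evaluates how well a sentence embedding model captures meaning, via correlation with human judgments
    • Sentence pairs carry manually annotated semantic similarity scores
    • The score is the correlation coefficient between the human ratings and the similarities computed by the model
      • Pearson's (product-moment) correlation coefficient
      • Spearman's rank correlation coefficient
    • Sentence embedding evaluation uses the unsupervised setting
      • No training on STS data
      • A model trained beforehand is evaluated as-is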

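  20. Unsupervised STS task
    Evaluation procedure for unsupervised STS
    ① Prepare a sentence embedding model with frozen parameters
    ② Convert each sentence of a pair into a sentence embedding
    ③ Compute the similarity of each pair of sentence "vectors"
      • Cosine similarity is commonly used
    ④ Compute the (rank) correlation coefficient with the human ratings
    • A higher correlation coefficient means a "better sentence embedding"
    [Figure: sentence A and sentence B are fed to the sentence embedding model (①); the resulting sentence similarity is evaluated by its correlation coefficient with human ratings]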
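
Steps ③ and ④ of the procedure above can be sketched as follows. The embeddings and gold ratings are made-up stand-ins for the output of step ②, and the hand-written `spearman` helper assumes there are no tied values (a library routine would handle ties properly).

```python
import numpy as np

def cosine(a, b):
    # Row-wise cosine similarity between two (N, d) arrays.
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )

def spearman(x, y):
    # Rank-transform both inputs, then take the Pearson correlation of the ranks.
    # Simplification: assumes no ties in x or y.
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Toy stand-ins for step 2: embeddings of sentence A and sentence B for four pairs.
emb_a = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
emb_b = np.array([[1.0, 0.1], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.2]])
gold = np.array([4.8, 3.0, 1.5, 0.2])        # made-up human ratings

pred = cosine(emb_a, emb_b)                  # step 3: similarity per pair
rho = spearman(pred, gold)                   # step 4: rank correlation with humans
```

Because only the ranking matters for Spearman's coefficient, the model's similarities do not need to be on the same scale as the human ratings.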

  21. SentEval
    • A toolkit that bundles downstream tasks such as text classification
    • Trains a classifier that takes sentence embeddings as input, and judges embedding quality from classification performance
    Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC '18
    Task  | Classification target                    | Classes | Example
    MR    | Movie review polarity (pos/neg)          | 2 | Too slow for a younger crowd, too shallow for an older one.
    CR    | Product review polarity (pos/neg)        | 2 | We tried it out christmas night and it worked great.
    SUBJ  | Subjectivity of movie reviews / plot summaries | 2 | A movie that doesn’t aim too high, but doesn’t need to.
    MPQA  | Phrase polarity                          | 2 | would like to tell
    SST-2 | Movie review polarity (pos/neg)          | 2 | Audrey Tautou has a knack for picking roles that magnify her [..]
    TREC  | Question type                            | 6 | What are the twin cities?
    MRPC  | Whether two sentences are paraphrases    | 2 | The procedure is generally performed in the second or third trimester. & The technique is used during the second and, occasionally, third trimester of pregnancy.

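  22. SentEval
    SentEval evaluation procedure
    ① Prepare a sentence embedding model with frozen parameters
    ② Convert each sentence into a sentence embedding
    ③ Train a classifier that takes the sentence embeddings as input
    ④ Judge the quality of the sentence embeddings from the classifier's performance
    • Higher classification performance means a "better sentence embedding"
    • The classifier is usually a logistic regression classifier
    [Figure: sentence embedding model → classifier; embedding quality is judged from classification performance]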
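
The probing setup in steps ③ and ④ amounts to fitting a small classifier on frozen features. The sketch below uses a hand-rolled logistic regression trained by gradient descent rather than the SentEval toolkit itself. The random "embeddings", the learning rate, the step count, and all names are assumptions for illustration.

```python
import numpy as np

def train_logreg(x, y, lr=0.5, steps=500):
    # Binary logistic regression trained with plain full-batch gradient descent.
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # predicted P(y = 1)
        grad = p - y                             # gradient of the log loss w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def accuracy(w, b, x, y):
    # Predict class 1 when the logit is positive.
    return float(np.mean(((x @ w + b) > 0) == (y == 1)))

rng = np.random.default_rng(0)
# Stand-in for step 2: random vectors whose first coordinate carries the label.
y = rng.integers(0, 2, size=400)
x = rng.normal(size=(400, 16))
x[:, 0] += 3.0 * (2 * y - 1)

w, b = train_logreg(x[:300], y[:300])            # step 3: train on frozen embeddings
test_acc = accuracy(w, b, x[300:], y[300:])      # step 4: read quality off the accuracy
```

Only the classifier's weights are updated; the embeddings never change, so the accuracy reflects how linearly separable the frozen representation already is.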

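  23. Compared methods
    • BERT-flow: learns a mapping from the anisotropic BERT sentence embedding space to an isotropic latent space
    • BERT-whitening: a linear transformation that makes the sentence embeddings have mean 0 and identity covariance (+ dimensionality reduction)
    • IS-BERT: trained to maximize the mutual information between a sentence embedding and the embeddings of the n-grams in the sentence
    • BERT-CT: trained so that the inner product between the embeddings of the same sentence from two separate copies of the same model becomes large
    • SimCSE: contrastive learning with the same sentence under different dropout masks as positives, or with entailment sentence pairs as positives
    • MixCSE: unsupervised contrastive learning that uses mixtures of different sentences as hard negatives
    • ArcCSE: combines margin-based angle minimization between entailment-pair embeddings with a triplet loss that uses data-augmented sentences as negatives
    • DCLR: Unsup-SimCSE with Gaussian noise added as negatives and per-instance weighting
    • MoCoSE: contrastive learning with an analysis of the best number of negatives for a momentum encoder, plus FGSM-based data augmentation
    Li+: On the Sentence Embeddings from Pre-trained Language Models, EMNLP '20
    Su+: Whitening Sentence Representations for Better Semantics and Faster Retrieval, arXiv '21
    Zhang+: An Unsupervised Sentence Embedding Method by Mutual Information Maximization, EMNLP '20
    Carlsson+: Semantic Re-tuning with Contrastive Tension, ICLR '21
    Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP '21
    Zhang+: Unsupervised Sentence Representation via Contrastive Learning with Mixing Negatives, AAAI '22
    Zhang+: A Contrastive Framework for Learning Sentence Representations from Pairwise and Triple-wise Perspective in Angular Space, ACL '22
    Zhou+: Debiased Contrastive Learning of Unsupervised Sentence Representations, ACL '22
    Cao+: Exploring the Impact of Negative Samples of Contrastive Learning: A Case Study of Sentence Embedding, ACL Findings '22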

  24. Results: STS / BERT
    • Best performance with both BERT-base and BERT-large

  25. Results: STS / BERT (continued)
    • Presenter's note: judging from the code, sentence 1 and sentence 2 of each STS pair appear to be whitened separately
    • It is unclear whether whitening is also applied at evaluation time

  26. Results: SentEval
    • The margin is small, but the method achieves the best performance here as well

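  27. Alignment / Uniformity
    • Uniformity is good from the very start of training (lower is better)
      • Because whitening makes the embeddings uniformly distributed from the start of training
    [Figure: Alignment and Uniformity panels comparing SimCSE and WhitenedCSE]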

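  28. Visualization of WhitenedCSE embeddings
    • A visualization of the embedding representations after training
    • The WhitenedCSE embeddings look the most uniformly distributed (or so it seems)
    [Figure: panels for BERT, SimCSE, and WhitenedCSE]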

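  29. Ablation: Group Size
    • More groups (i.e., fewer dimensions per group) give higher performance
      • This amounts to a milder whitening (decorrelation)
    • Shuffling yields a stable performance gain
    • Presenter's comment: please stop running ablations on test-set results
    Caption: performance differences by group size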

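  30. Ablation: Whitening method and modules
    • Performance varies greatly with the whitening method
      • Group whitening appears quite effective
      • Shuffling improves it further
    • Even without whitening (grouping plus multi-positive contrastive learning), performance improves somewhat
      • But the gain appears larger when combined with SGW
    Captions: whitening method; with/without each module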

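  31. Summary
    • The paper proposes a sentence embedding learning method that combines contrastive learning with whitening
      • Embeddings are split into groups and whitened (SGW)
      • Contrastive learning with multiple positives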

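  32. Summary: Impressions
    • Frankly, the evaluation is a little suspect
    • Whitening without grouping performs rather poorly
      • Perhaps because the feature representations change too drastically?
      • Because the groups are small, the whitening being applied is mild
    • If mild whitening is good enough, perhaps the whitening operation itself is unnecessary?
      • What about a loss that drives the covariance matrix toward the identity?
    • The way the whitening is done is too much of a concern to ignore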