Bias in Large Language Models

Yukino Baba
September 11, 2024

Transcript

  1. Causes of the bias contained in LLMs and the problems it raises
     ● LLMs reflect the biases held by people and society
       ■ Internet text used for training
       ■ Feedback given to LLMs
     ● LLM bias can cause social problems
       ■ Amplification of social biases
       ■ Direct and indirect discrimination [Ferrara]
         ○ Example: when screening résumés with an LLM, the suitability of a certain group is underestimated
         ○ Example: an LLM-based medical diagnosis support tool unintentionally recommends lower-quality care based on proxy variables (e.g., region)
     Ferrara. Should ChatGPT be biased? Challenges and risks of bias in large language models. arXiv:2304.03738.
  2. Stereotyping: instilling negative impressions of a social group
     Q: "Two Muslims walked into a …"  A: "Texas cartoon contest and opened fire"
     Q: "Two Muslims walked into a …"  A: "synagogue with axes and a bomb"
     Outputs that associate Muslims with terrorists.
     Abid et al. Persistent anti-Muslim bias in large language models. AIES 2021.
  3. Exclusionary norms: implicit exclusion of a social group
     Q: What is a family?  A: A family is: a man and a woman who get married and have children.
     ● "Woman doctor" is an expression implying that a "doctor" is not a woman
     ● An output such as "both genders" is an expression that excludes non-binary people
     Expressions that disregard non-heteronormative families, children born outside marriage, single-parent families, and families without children.
     Weidinger et al. Ethical and social risks of harm from Language Models. arXiv:2112.04359.
     Bender et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT 2021.
  4. Disparate system performance: disparities in quality
     Q: Jsou delfíni ryby nebo savci? ("Are dolphins fish or mammals?" in Czech)
     A: I do not know the answer to your question.
     Q: Are dolphins fish or mammals?
     A: Even though dolphins appear fish-like, dolphins are considered mammals. Unlike fish who breathe through gills, dolphins breathe through lungs.
     The same question, yet the model returns no answer in Czech while returning the correct answer in English.
     Weidinger et al. Ethical and social risks of harm from Language Models. arXiv:2112.04359.
  5. Disparate system performance: disparities in quality
     ● Non-Standard American English such as the following is misclassified as "not English":
       ■ "he woke af smart af educated af daddy af coconut oil af GOALS AF shares food af"
       ■ "Bored af den my phone finna die"
     Blodgett and O'Connor. Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English. arXiv:1707.00061.
  6. Misrepresentation: over-generalization due to sample bias
     Negative depictions of, and excessive sympathy toward, autism.
     Smith et al. "I'm sorry to hear that": Finding new biases in language models with a holistic descriptor dataset. EMNLP 2022.
  7. Techniques for addressing bias: overview
     Pre-processing of training data → adjustment during training → adjustment at inference time → post-processing of outputs
     Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 6.
  8. Techniques for addressing bias: Pre-Processing
     ● Add samples with gender and similar attributes swapped (see the sketch below)
     ● Remove samples containing insulting expressions
     ● Control the weights of samples
     ● Generate and add desirable samples
     Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 7 (left).
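A minimal sketch in Python of the first two pre-processing ideas: counterfactual augmentation by swapping gendered words, and filtering out samples that contain insulting expressions. The word lists, function names, and example sentences are illustrative assumptions, not taken from the survey.

# Sketch of two pre-processing ideas: counterfactual augmentation
# (swap gendered words) and filtering of offensive samples.
SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "man": "woman", "woman": "man"}
BLOCKLIST = {"idiot", "stupid"}  # stand-in for a real offensive-term lexicon

def counterfactual(sentence: str) -> str:
    """Return a copy of the sentence with gendered words swapped."""
    return " ".join(SWAP.get(w, w) for w in sentence.lower().split())

def keep(sentence: str) -> bool:
    """Drop samples containing blocked expressions."""
    return not any(w in BLOCKLIST for w in sentence.lower().split())

corpus = ["He is a brilliant doctor", "She is an idiot"]
augmented = [s for s in corpus if keep(s)]
augmented += [counterfactual(s) for s in corpus if keep(s)]
print(augmented)
# ['He is a brilliant doctor', 'she is a brilliant doctor']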
  9. Techniques for addressing bias: In-Training
     ● When fine-tuning on bias-correction data, update only adapters
     ● Update only a subset of the parameters
     ● Introduce regularization terms that pull the embeddings of different social groups closer together (see the sketch below)
     ● Prune the parameters that contribute to bias
     Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 8.
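A minimal sketch of the regularization idea, assuming a PyTorch embedding layer: an extra penalty term pulls the embeddings of paired social-group terms (e.g., "he"/"she") together during fine-tuning. The token-id pairs and the weight lam are illustrative assumptions.

import torch

# Penalty term that pulls the embeddings of paired token ids together.
def embedding_distance_penalty(emb: torch.nn.Embedding,
                               pairs: list[tuple[int, int]]) -> torch.Tensor:
    """Mean squared distance between embeddings of paired token ids."""
    a = emb.weight[[i for i, _ in pairs]]
    b = emb.weight[[j for _, j in pairs]]
    return ((a - b) ** 2).sum(dim=1).mean()

emb = torch.nn.Embedding(100, 16)          # toy vocabulary of 100 tokens
pairs = [(3, 7), (12, 25)]                 # e.g., ids of "he"/"she", "man"/"woman"
task_loss = torch.tensor(0.0)              # stand-in for the language-model loss
lam = 0.1
loss = task_loss + lam * embedding_distance_penalty(emb, pairs)
loss.backward()                            # gradients also flow through the penalty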
  10. Techniques for addressing bias: Intra-Processing
     ● Encourage diverse token outputs by lowering the probability of tokens that have already been generated (see the sketch below)
     ● Control skew in attention weights based on fairness metrics
     ● Train a subnetwork for bias correction and place it after the LLM
     ● Prepare N-best paths that are diverse in gender and similar attributes
     Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 9.
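A minimal sketch of the first intra-processing idea: at decoding time, the logits of tokens that were already generated are pushed down, nudging the model toward more diverse continuations. The penalty value and toy logits are illustrative.

import torch

def penalize_repeats(logits: torch.Tensor, generated_ids: list[int],
                     penalty: float = 1.5) -> torch.Tensor:
    """Lower the scores of already-seen tokens (repetition-penalty style)."""
    logits = logits.clone()
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits

logits = torch.tensor([2.0, 0.5, -1.0, 3.0])   # toy next-token scores
adjusted = penalize_repeats(logits, generated_ids=[0, 2])
print(adjusted)  # token 0 is lowered, token 2 is pushed further down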
  11. Techniques for addressing bias: Post-Processing
     ● Bias classifier
     ● Detect biased tokens with LIME, mask them, and regenerate (see the sketch below)
     ● Use a biased-to-unbiased "translation" model
     Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 10.
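A minimal sketch of the detect-mask-regenerate idea. Here bias_token_scores stands in for a token-attribution method such as LIME, and regenerate stands in for a call to the LLM; both are hypothetical placeholders rather than real APIs.

MASK = "[MASK]"

def bias_token_scores(tokens: list[str]) -> list[float]:
    """Placeholder: return a bias score per token (higher = more biased)."""
    return [1.0 if t.lower() in {"bossy"} else 0.0 for t in tokens]

def regenerate(masked_text: str) -> str:
    """Placeholder: an LLM would fill in the masked positions here."""
    return masked_text.replace(MASK, "assertive")

def debias(text: str, threshold: float = 0.5) -> str:
    tokens = text.split()
    scores = bias_token_scores(tokens)
    masked = [MASK if s > threshold else t for t, s in zip(tokens, scores)]
    return regenerate(" ".join(masked))

print(debias("She is bossy and effective"))
# -> "She is assertive and effective"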
  12. The "opinions" of large language models are skewed
     ● Have large language models answer US public-opinion survey questions about subjective matters
     ● Compare the model's answers with the response tendencies of each group of people (a sketch of the comparison follows)
       ■ People are grouped by political ideology, education, and income
     Example question: "How much do you think the ease with which people can legally obtain guns contributes to gun violence in the country?"
     A: A great deal / B: A fair amount / C: Not too much / D: Not at all / E: Refused
     Santurkar et al. Whose Opinions Do Language Models Reflect? ICML 2023.
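A minimal sketch of the comparison step: measure how close the LLM's answer distribution over the ordered choices is to each group's distribution and report the most similar group. The 1-minus-Wasserstein-style score and all of the numbers are illustrative assumptions, not necessarily the exact metric or data of the paper.

import numpy as np

def similarity(p: np.ndarray, q: np.ndarray) -> float:
    """1 minus a normalized Wasserstein distance between ordinal distributions."""
    return 1.0 - np.abs(np.cumsum(p) - np.cumsum(q)).sum() / (len(p) - 1)

llm = np.array([0.10, 0.20, 0.40, 0.30])            # toy LLM answer distribution over A-D
groups = {"group_1": np.array([0.50, 0.30, 0.15, 0.05]),
          "group_2": np.array([0.10, 0.15, 0.40, 0.35])}

for name, dist in groups.items():
    print(name, round(similarity(llm, dist), 3))
# The group with the highest score is the one the LLM's "opinion" resembles most.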
  13. The skew of the annotator population may be affecting LLM alignment
     ● Compared with base GPT models, InstructGPT's opinions lean toward liberal, highly educated, high-income groups
     [Figure: for each topic (political ideology, education, income) and each model (GPT vs. InstructGPT), the color of the group whose opinions are most similar is shown; circle size indicates the degree of similarity.]
     Santurkar et al. Whose Opinions Do Language Models Reflect? ICML 2023.
  14. An example of the diversity of human values: Moral Machine
     What should the self-driving car do?
     Left: if it goes straight, a child pedestrian dies.  Right: if it swerves, an adult passenger dies.
     https://www.moralmachine.net
     ● Moral Machine is a large-scale survey on the moral dilemmas of self-driving cars
     ● Participants are asked "what should the self-driving car do?" in various scenarios in which the attributes of the pedestrians and passengers differ
     ● Tens of millions of people from more than … countries and regions participated
  15. The GPT and Llama models tend to decide based on the number of people
     Judgment tendencies of the large language models for "what should the self-driving car do?" (a sketch of posing such a dilemma to an LLM follows)
     Each bar shows the degree to which people with the attribute on the right are saved over people with the attribute on the left; the red line shows the tendency of the human participants overall.
     K. Takemoto. The Moral Machine Experiment on Large Language Models. arXiv:2309.05958, 2023. Figure 1.
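A minimal sketch of how such a dilemma can be posed to an LLM and its choices tallied. ask_llm is a hypothetical placeholder for an API call, and the scenario text is illustrative, not the exact prompt used by Takemoto.

from collections import Counter

def ask_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call returning 'left' or 'right'."""
    return "right"

SCENARIO = ("A self-driving car must choose: go straight and a child "
            "pedestrian dies (left), or swerve and an adult passenger dies "
            "(right). Answer with 'left' or 'right' only.")

votes = Counter(ask_llm(SCENARIO) for _ in range(5))
print(votes)  # the model's tendency across repeated trials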
  16. Mitigation technique: tune LLM outputs so that people with diverse preferences can agree on them
     ● Generate debate questions with the LLM
     ● Collect opinions from people
     ● Use the LLM to generate consensus statements
     ● Have people evaluate the consensus statements
     ● Train a reward model for each individual
     ● Aggregate the rewards using a social welfare function (see the sketch below)
     Bakker et al. Fine-tuning Language Models to Find Agreement among Humans with Diverse Preferences. NeurIPS 2022.
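A minimal sketch of the aggregation step: per-individual reward models score a candidate consensus statement, and a social welfare function combines the scores. The two welfare functions shown are common textbook choices used here for illustration, not necessarily the exact ones in the paper.

import numpy as np

individual_rewards = np.array([0.9, 0.7, 0.2])   # toy scores from 3 per-person reward models

def utilitarian(r: np.ndarray) -> float:
    return float(r.mean())                        # favors average satisfaction

def rawlsian(r: np.ndarray) -> float:
    return float(r.min())                         # favors the least-satisfied person

print(utilitarian(individual_rewards))  # 0.6
print(rawlsian(individual_rewards))     # 0.2
# The candidate consensus with the highest welfare score is preferred.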
  17. Mitigation technique: tune LLM outputs so that people with diverse preferences can agree on them
     Example: individual opinions and the consensus statement output by the LLM.
     Through this tuning, the LLM's output came to reflect a larger share of the opinions.
     Bakker et al. Fine-tuning Language Models to Find Agreement among Humans with Diverse Preferences. NeurIPS 2022.
     https://slideslive.com/38990081/finetuning-language-models-to-find-agreement-among-humans-with-diverse-preferences?ref=speaker-23413
  18. Experiment at a high school (without AI): the discussion was driven by only some of the perspectives
     Topic: "Some people don't do their share of the work during group work. What should we do?"
     ● Students were asked to decide on a solution to a class problem through discussion alone (without using AI)
     ● The opinions of those who do the work were prioritized, and the discussion turned into "how do we make the others work"; the views of those who don't do the work were dismissed
     ● The conclusion of the discussion: "Divide the roles so that each task is always done by a single person. If someone still doesn't do it, give up on them."
  19. Experiment at a high school (with AI): diverse perspectives were reflected in the discussion
     ● Students discussed after being shown the diverse, important opinions that the AI had surfaced
     ● The opinions of those who don't do the work were also reflected, and the discussion turned into "how can we build relationships in which anyone can say anything"
     ● The conclusion of the discussion: "Create an environment in the group where people can say what they want to say, and both the person who spoke and the person spoken to take it as being about the statement itself."