Bias in Large Language Models

Yukino Baba
September 11, 2024


Transcript

1. Causes of bias in LLMs and the resulting problems
   ● LLMs reflect the biases held by people and society
     ■ The internet text used for training
     ■ Feedback given to LLMs
   ● LLM bias can cause social problems
     ■ Amplification of social biases
     ■ Direct and indirect discrimination [Ferrara]
       ○ Example: when an LLM screens résumés, it underestimates the aptitude of a particular group
       ○ Example: an LLM-based medical diagnosis support tool unintentionally recommends lower-quality care based on proxy variables (e.g., region)
   Ferrara. Should ChatGPT be biased? Challenges and risks of bias in large language models. arXiv:2304.03738.

2. Stereotyping: instilling negative impressions of a social group
   Q: "Two Muslims walked into a …"  A: "Texas cartoon contest and opened fire"
   Q: "Two Muslims walked into a …"  A: "synagogue with axes and a bomb"
   Outputs that associate Muslims with terrorists
   Abid et al. Persistent anti-Muslim bias in large language models. AIES 2021.

3. Exclusionary norms: implicit exclusion of a social group
   Q: What is a family?  A: A family is: a man and a woman who get married and have children.
   ● "Woman doctor" implies that a "doctor" is not a woman
   ● The output "both genders" excludes non-binary people
   The answer above ignores non-heteronormative families, children born outside marriage, single-parent households, and families without children
   Weidinger et al. Ethical and social risks of harm from Language Models. arXiv:2112.04359.
   Bender et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT 2021.

4. Disparate system performance: gaps in quality
   Q: Jsou delfíni ryby nebo savci? ("Are dolphins fish or mammals?" in Czech)
   A: I do not know the answer to your question.
   Q: Are dolphins fish or mammals?
   A: Even though dolphins appear fish-like, dolphins are considered mammals. Unlike fish who breathe through gills, dolphins breathe through lungs.
   The same question, but the model gives no answer in Czech while returning the correct answer in English
   Weidinger et al. Ethical and social risks of harm from Language Models. arXiv:2112.04359.

5. Disparate system performance: gaps in quality
   ● Non-Standard American English such as the following is misclassified as "not English"
     ■ "he woke af smart af educated af daddy af coconut oil af GOALS AF shares food af"
     ■ "Bored af den my phone finna die"
   Blodgett and O'Connor. Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English. arXiv:1707.00061.

6. Misrepresentation: overgeneralization caused by biased samples
   Negative portrayals of, and excessive sympathy toward, autism
   Smith et al. "I'm sorry to hear that": Finding new biases in language models with a holistic descriptor dataset. EMNLP 2022.

7. Techniques for addressing bias: overview
   Preprocessing of training data / adjustment during training / adjustment at inference time / post-processing of outputs
   Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 6.

8. Techniques for addressing bias: pre-processing
   ● Add samples with gender and other attributes swapped (a minimal sketch follows below)
   ● Remove samples that contain insulting expressions and the like
   ● Control the weights of samples
   ● Generate and add desirable samples
   Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 7 (left).

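The first pre-processing idea, counterfactual data augmentation, can be illustrated with a minimal sketch. The swap table, function names, and example sentence below are assumptions for illustration only (not from the slides or the survey); real pipelines use much larger curated term lists and also handle names, grammatical agreement, and attributes other than gender.

```python
import re

# Tiny illustrative swap table; real systems use curated, much larger lists.
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "her", "hers": "his",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}

def swap_gender_terms(text: str) -> str:
    """Return a copy of `text` with gendered terms swapped, preserving initial capitalization."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = GENDER_SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(GENDER_SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

def augment(corpus: list[str]) -> list[str]:
    """Counterfactual augmentation: keep every original sample and add its swapped counterpart."""
    augmented = []
    for sample in corpus:
        augmented.append(sample)
        swapped = swap_gender_terms(sample)
        if swapped != sample:
            augmented.append(swapped)
    return augmented

print(augment(["The doctor said he would call his patient."]))
# ['The doctor said he would call his patient.',
#  'The doctor said she would call her patient.']
```
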
9. Techniques for addressing bias: in-training
   When fine-tuning on bias-correction data:
   ● Update only adapters
   ● Update only a subset of the parameters
   ● Introduce regularization terms and similar objectives that pull the embeddings of different social groups closer together (see the sketch after this list)
   ● Prune the parameters that contribute to bias
   Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 8.

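A minimal sketch of the embedding-alignment regularizer, assuming a Hugging Face-style causal language model and tokenizer. The word pairs, weight `lam`, and function names are illustrative assumptions, not a specific method from the survey.

```python
import torch
import torch.nn.functional as F

# Assumed placeholder pairs of group terms; a real list would be larger and curated.
WORD_PAIRS = [("man", "woman"), ("he", "she"), ("father", "mother")]

def embedding_alignment_loss(model, tokenizer) -> torch.Tensor:
    """Mean squared distance between the input embeddings of paired group terms."""
    emb = model.get_input_embeddings().weight          # (vocab_size, hidden_dim)
    losses = []
    for a, b in WORD_PAIRS:
        ids_a = tokenizer(a, add_special_tokens=False)["input_ids"]
        ids_b = tokenizer(b, add_special_tokens=False)["input_ids"]
        vec_a = emb[ids_a].mean(dim=0)                 # average over sub-word pieces
        vec_b = emb[ids_b].mean(dim=0)
        losses.append(F.mse_loss(vec_a, vec_b))
    return torch.stack(losses).mean()

def training_step(model, tokenizer, batch, optimizer, lam: float = 0.1):
    """One fine-tuning step: language-modeling loss plus the alignment regularizer."""
    outputs = model(**batch, labels=batch["input_ids"])
    loss = outputs.loss + lam * embedding_alignment_loss(model, tokenizer)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```
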
10. Techniques for addressing bias: intra-processing
   ● Encourage diverse tokens by lowering the probability of tokens that have already been output (see the sketch after this list)
   ● Control skew in the attention weights based on fairness metrics
   ● Train a subnetwork for bias correction and place it after the LLM
   ● Prepare N-best candidate outputs that vary in gender and other attributes
   Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 9.

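A minimal sketch of the first intra-processing idea, assuming a Hugging Face-style causal LM and tokenizer. The penalty value and greedy decoding loop are arbitrary illustrative choices, not the specific methods catalogued in the survey.

```python
import torch

@torch.no_grad()
def generate_with_presence_penalty(model, tokenizer, prompt: str,
                                   max_new_tokens: int = 50,
                                   penalty: float = 2.0) -> str:
    """Greedy decoding that down-weights tokens already present in the sequence."""
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits[0, -1]        # next-token logits
        seen = input_ids[0].unique()                   # tokens generated so far
        logits[seen] -= penalty                        # lower their scores to push diversity
        next_id = torch.argmax(logits).view(1, 1)
        input_ids = torch.cat([input_ids, next_id], dim=1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
```
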
11. Techniques for addressing bias: post-processing
   ● Bias classifiers
   ● Detect biased tokens with LIME, mask them, and regenerate (see the sketch after this list)
   ● Use a biased-to-unbiased translation model
   Gallegos et al. Bias and Fairness in Large Language Models: A Survey. Computational Linguistics, 2024. Figure 10.

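A minimal sketch of the detect-mask-regenerate idea. The word-list detector is a toy stand-in for LIME-based token attribution, and `llm_rewrite` is an assumed placeholder for any model call that fills in the masked spans; none of the names below come from the cited work.

```python
BIASED_TERMS = {"bossy", "hysterical"}   # placeholder list, for illustration only
MASK = "[MASK]"

def mask_biased_tokens(text: str) -> str:
    """Replace tokens flagged by the (toy) bias detector with a mask symbol."""
    return " ".join(MASK if w.lower().strip(".,!?") in BIASED_TERMS else w
                    for w in text.split())

def debias(text: str, llm_rewrite) -> str:
    """Post-process an LLM output: mask flagged tokens, then ask a model to refill them."""
    masked = mask_biased_tokens(text)
    if masked == text:
        return text                      # nothing flagged, keep the original
    return llm_rewrite(masked)           # e.g., prompt a model to complete the masked text

# Example with a dummy rewriter:
print(debias("She was bossy in the meeting.", lambda m: m.replace(MASK, "assertive")))
# -> "She was assertive in the meeting."
```
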
12. LLM "opinions" are skewed
   ● Have large language models answer questions about subjective opinions taken from US public opinion surveys
   ● Compare the LLM's answers with the response tendencies of each group of people (a simple comparison sketch follows below)
     ■ People are grouped by political ideology, education, and income
   Example question: How much do you think the ease with which people can legally obtain guns contributes to gun violence in the country? A: A great deal / B: A fair amount / C: Not too much / D: Not at all / E: Refused
   Santurkar et al. Whose Opinions Do Language Models Reflect? ICML 2023.

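One simple way to compare an LLM's answer distribution with each group's distribution over the ordered options is a one-dimensional earth mover's distance; this is a sketch under that assumption, not the exact metric of Santurkar et al., and the numbers are made up.

```python
import numpy as np

def wasserstein_1d(p: np.ndarray, q: np.ndarray) -> float:
    """Earth mover's distance between two distributions over ordered categories."""
    return float(np.abs(np.cumsum(p - q)).sum())

llm = np.array([0.10, 0.20, 0.40, 0.30])          # P(A), P(B), P(C), P(D) from the LLM (made up)
groups = {
    "group 1": np.array([0.55, 0.30, 0.10, 0.05]),
    "group 2": np.array([0.10, 0.20, 0.35, 0.35]),
}

for name, dist in groups.items():
    print(name, round(wasserstein_1d(llm, dist), 3))
# The group with the smallest distance is the one whose opinions the LLM resembles most.
```
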
13. Compared with the base GPT models, InstructGPT gives opinions closer to liberal, highly educated, high-income groups; the makeup of the annotator pool may be skewing how LLMs are tuned
   Figure: for each topic and each LLM, the color marks the group whose opinions are most similar, and the circle size shows the degree of similarity (panels: political ideology, education, income)
   Santurkar et al. Whose Opinions Do Language Models Reflect? ICML 2023.

14. An example of the diversity of human values: Moral Machine
   What should the self-driving car do? (Left: going straight kills child pedestrians. Right: swerving kills adult passengers.) https://www.moralmachine.net
   Moral Machine
   ● A large-scale survey on the moral dilemmas of self-driving cars
   ● Asks "what should the self-driving car do?" across many scenarios in which the attributes of pedestrians and passengers differ
   ● Tens of millions of people participated, from countries and regions around the world

15. The GPT models and Llama tend to judge by the number of people involved
   Figure: judgment tendencies of the large language models for "what should the self-driving car do?", shown as the degree to which people with the attribute on the right are saved over people with the attribute on the left; the red lines show the tendency of all human participants
   K. Takemoto. The Moral Machine Experiment on Large Language Models. arXiv:2309.05958, 2023. Figure 1.

16. Mitigation technique: tune the LLM's output so that people with diverse views can agree on it
   ● Generate debate questions with the LLM
   ● Collect opinions from people
   ● Generate a consensus statement with the LLM
   ● Have people rate the consensus statement
   ● Train a reward model for each individual
   ● Aggregate the rewards with a social welfare function (a minimal sketch follows below)
   Bakker et al. Fine-tuning Language Models to Find Agreement among Humans with Diverse Preferences. NeurIPS 2022.

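A minimal sketch of the final aggregation step: combining per-individual reward-model scores for a candidate consensus statement with a social welfare function. The mean/min/Nash options and the example scores below are common textbook choices used only for illustration, not necessarily the function used by Bakker et al.

```python
import numpy as np

def social_welfare(individual_rewards, kind: str = "min") -> float:
    """Aggregate per-person reward scores into a single welfare value."""
    r = np.asarray(individual_rewards, dtype=float)
    if kind == "mean":   # utilitarian: average satisfaction
        return float(r.mean())
    if kind == "min":    # egalitarian (Rawlsian): the least-satisfied person
        return float(r.min())
    if kind == "nash":   # Nash welfare: product of (positive) rewards
        return float(np.prod(np.clip(r, 1e-6, None)))
    raise ValueError(kind)

# Two candidate consensus statements scored by three individuals' reward models (made-up values):
candidate_a = [0.9, 0.8, 0.1]   # great for two people, poor for one
candidate_b = [0.6, 0.6, 0.6]   # moderately good for everyone

for name, scores in [("A", candidate_a), ("B", candidate_b)]:
    print(name, social_welfare(scores, kind="min"))
# With the egalitarian welfare function, candidate B is preferred.
```
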
17. Mitigation technique: tune the LLM's output so that people with diverse views can agree on it
   Example: individual opinions and the consensus statement generated by the LLM; after tuning, the output came to reflect more of the individual opinions
   Bakker et al. Fine-tuning Language Models to Find Agreement among Humans with Diverse Preferences. NeurIPS 2022.
   https://slideslive.com/38990081/finetuning-language-models-to-find-agreement-among-humans-with-diverse-preferences?ref=speaker-23413

18. Experiment at a high school (without AI): the discussion was driven by only some of the viewpoints
   Theme: "During group work, some people do not do any of the work. What should we do?"
   ● The class was asked to decide on a solution through discussion alone, without using AI
   ● The opinions of those who do the work were given the most weight, the discussion turned into "how do we make them work," and the viewpoint of those who do not do the work was dismissed
   ● The conclusion of the discussion: "Divide up the roles so that each task is always done by a single person. If someone still does not do it, give up on them."

19. Experiment at a high school (with AI): diverse viewpoints were reflected in the discussion
   ● The class discussed the theme after being shown the diverse important opinions that the AI had surfaced
   ● The opinions of those who do not do the work were also reflected, and the discussion turned into "how can we build relationships in which we can say anything to one another"
   ● The conclusion of the discussion: "Create an environment in which everyone in the group can say what they want to say, and both the person who speaks and the person spoken to take remarks as being about the statements themselves."