NLP2025 Participation Report
Yano
April 11, 2025
These are the slides I used for a lightning talk (LT) at this NLP retrospective event (https://moneyforward.connpass.com/event/344276/).
Transcript
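NLP Attendance Report
April 11
Nagoya University, […] Lab, D1 (first-year doctoral student)
Chihiro Yano

Self-introduction
Chihiro Yano
• Background: Nagoya University, […] Lab (master's program) → PKSHA (machine learning engineer) → Nagoya University, […] Lab (doctoral program)
• Interests: meaning, embedding representations
• Lately, text embeddings have been fun
• There is also a retrieval-specialized model, so please give it a try:
  pkshatech/GLuCoSE-base-ja-v2
• X: yano_c[…]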
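
What is a text embedding model?
• Something that encodes natural-language sentences or passages into a representation a computer can understand (generally a vector)
• Measuring the similarity between vectors lets us measure the similarity between the texts
Expected behavior on a QA task:
• Query: "What is the tallest mountain in Japan?"
• "Where will the next NLP be held?" → similarity: low
• "Mt. Fuji is Japan's highest independent peak. It straddles Yamanashi and Shizuoka prefectures." → similarity: high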
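
The "similarity between vectors" on this slide is commonly computed as cosine similarity. The following is a minimal sketch of that idea only: the three-dimensional vectors are made-up numbers chosen for illustration, not output from any real embedding model, and the variable names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings (made-up numbers, NOT real model output).
query = [0.9, 0.1, 0.2]      # "What is the tallest mountain in Japan?"
doc_fuji = [0.8, 0.2, 0.1]   # passage about Mt. Fuji
doc_venue = [0.1, 0.9, 0.3]  # "Where will the next NLP be held?"

sim_fuji = cosine_similarity(query, doc_fuji)
sim_venue = cosine_similarity(query, doc_venue)

# With these toy vectors the Mt. Fuji passage scores higher than the venue question.
print(f"fuji: {sim_fuji:.3f}, venue: {sim_venue:.3f}")
```

In practice the vectors would come from an embedding model; only the comparison step is shown here.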
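
This year's NLP
• The "embedding representations" theme has disappeared from the program 😢
• Are embedding-related presentations on the decline?
  • Presentations with "embedding" in the title: 17 of 499 → 26 of 777 😊
  • (As a proportion, almost constant)
  • Presentations with "text" or "sentence" + "embedding" in the title: 6 → 6 😊
  • Presentations with "text embedding" in the title: 0 → 5
※ Rough statistics based only on presentation titles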
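
The "almost constant as a proportion" remark can be checked with a few lines of arithmetic. This sketch uses only the counts given on the slide; the labels `before` and `after` are placeholder names for the two sides of the arrow.

```python
# Counts from the slide: (titles containing "embedding", total presentations).
counts = {"before": (17, 499), "after": (26, 777)}

# Share of presentations whose title contains "embedding".
shares = {label: hits / total for label, (hits, total) in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

Both shares come out at roughly 3.3 to 3.4 percent, so the absolute count grew while the proportion stayed about the same.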
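
Presentations with "embedding" in the title
• Modeling the temporal transition of embedding sets with Gaussian processes
• Analysis of the intra- and inter-language consistency of independent components of embedding representations
• Granularity analysis of verb meanings using embedding vectors, and co-occurrence relations
• Domain adaptation of embedding representations based on k-nearest examples and its application to retrieval
• Measuring the intrinsic dimension of embedding representations
• Estimating honkadori based on waka embeddings
• Ruri: A general-purpose text embedding model specialized for Japanese
• Obtaining static word vectors effective for sentence embeddings
• Style control for speech synthesis using LLM embeddings of dialogue history
• Training-free conditional text embeddings
• Differences in the redundancy of prompt-based text embeddings across tasks
• Creating speaker embeddings using […] back-translation of dialogue in novels
• How fine-grained are the interpretable axes of independent component analysis of word embeddings?
• Search query embeddings for query understanding based on user behavior logs
• Detecting sarcasm in Japanese on X (formerly Twitter) using data augmentation based on text embedding representations
• Verifying the effect of applying […] predictive control to text reconstruction from text embeddings
• From literary criticism to large language models: an attempt at interpreting literary texts by […] word embeddings
• Simulating customer behavior in physical stores using LLM embeddings and transition probability prediction
• Analysis of word embedding representations across the layers of pretrained multilingual models using independent component analysis
• A study of the effect of orthographic variation on sentence embedding models
• Verifying document-image and text embeddings of Large Vision Language Models
• Proposal of a topic analysis method combining document embeddings and clustering
• Making LLM pretraining more efficient and improving its properties: reuse by freezing the parameters of the embedding and output layers
• Improving extraction accuracy on […] text in embedding-model-based unsupervised keyphrase extraction
• "Collapse" and the effect of time-step embeddings in text generation with diffusion models
• Building an embedding model that predicts late-stage conversation for adaptive dialogue systems
The targets are varied too 👀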

Presentation […]
• Turned the abstracts into a word cloud
• Embeddings of words and of text seem to be common
• Research whose main focus is analysis also seems to be common
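
A few papers
• Papers that train models
  • Ruri: A general-purpose text embedding model specialized for Japanese [Tsukagoshi et al.]
  • Building a distributed representation model for Japanese using instructions and multiple tasks [Katsumata et al.]
  • Search query embeddings for query understanding based on user behavior logs […]
• Papers that analyze embeddings
  • Analysis of word embedding representations across the layers of pretrained multilingual models using independent component analysis […]
  • Differences in the redundancy of prompt-based text embeddings across tasks [Tsukagoshi et al.]
※ These were chosen very subjectively. Comments like "there was also this interesting paper!" are welcome!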
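
Paper introduction (model building)
Ruri: A general-purpose text embedding model specialized for Japanese
• Development and release of Ruri, a general-purpose Japanese text embedding model
  • Multiple model sizes (small, base, large)
  • The intermediate models Ruri-PT and Ruri-Reranker are also released
• Preparation of training datasets
  • Created a synthetic dataset
  • Prepared multiple public datasets in a single common format
Rough picture: Japanese BERT → Ruri-PT → Ruri-Reranker → Ruri (contrastive pre-training, fine-tuning, distillation)
Pretty much everything is here, which is much appreciated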
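
Paper introduction (model building)
Building a distributed representation model for Japanese using instructions and multiple tasks
• Analyzes how training on data from multiple tasks and languages affects JMTEB (a comprehensive evaluation benchmark for text embedding models)
  • Training on mixed Japanese and English data gives higher performance than training on Japanese alone
  • Which training tasks help depends on the evaluation task
• The models built by training on the mixed Japanese-English data are released
  • retrieva-jp/amber-base, retrieva-jp/amber-large
Figure (quoted from the paper): task removed from training vs. change in performance on the evaluation tasks
Example: training on NLI raises STS performance ⤴ and lowers clustering performance ⤵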
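
Paper introduction (model building)
Search query embeddings for query understanding based on user behavior logs
• Compared with the natural sentences that ordinary text embeddings target, search queries are short and lack context (example: "What's next week's weather in […]saki?")
• Proposes UBIQUE, a method that uses user behavior logs to extract query pairs with similar intent and uses them for training
  • Click log: queries that led to a click on the same URL in the search results
  • Session log: queries entered within a fixed time window in the same session
  • → Queries with the same intent are extracted regardless of surface form
• Builds a model that is especially robust to variation in surface form
Figure quoted from the paper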
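
The two log-based pairing rules on this slide can be sketched roughly as follows. This is an illustration of the idea as summarized here, not the UBIQUE implementation: the log formats, function names, and the 60-second window are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def pairs_from_click_log(click_log):
    """Pair distinct queries that led to a click on the same URL.

    click_log: iterable of (query, clicked_url) tuples (assumed format).
    """
    queries_by_url = defaultdict(set)
    for query, url in click_log:
        queries_by_url[url].add(query)
    pairs = set()
    for queries in queries_by_url.values():
        pairs.update(combinations(sorted(queries), 2))
    return pairs

def pairs_from_session_log(session_log, window_seconds):
    """Pair distinct queries issued in one session within window_seconds.

    session_log: iterable of (session_id, timestamp_seconds, query) tuples (assumed format).
    """
    events_by_session = defaultdict(list)
    for session_id, timestamp, query in session_log:
        events_by_session[session_id].append((timestamp, query))
    pairs = set()
    for events in events_by_session.values():
        events.sort()  # chronological order
        for (t1, q1), (t2, q2) in combinations(events, 2):
            if q1 != q2 and t2 - t1 <= window_seconds:
                pairs.add(tuple(sorted((q1, q2))))
    return pairs

# Hypothetical logs for illustration only.
click_log = [
    ("tokyo weather", "https://example.com/weather"),
    ("weather in tokyo next week", "https://example.com/weather"),
    ("python sort list", "https://example.com/python"),
]
session_log = [
    ("s1", 0, "tokyo weather"),
    ("s1", 30, "tokyo forecast"),
    ("s1", 900, "python sort list"),
    ("s2", 10, "ramen near me"),
]

click_pairs = pairs_from_click_log(click_log)
session_pairs = pairs_from_session_log(session_log, window_seconds=60)
print(click_pairs)
print(session_pairs)
```

Pairs found this way share a clicked URL or a short time window rather than shared wording, which is why they can capture the same intent across different surface forms.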
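
Paper introduction (analysis)
Differences in the redundancy of prompt-based text embeddings across tasks
• Prompt-based text embeddings:
  • Embeddings are created by attaching a task-specific instruction
• Shows that their redundancy depends on the task
  • Classification tasks: no performance degradation even when 4096 dimensions are reduced to 16
  • Retrieval tasks: little degradation down to about 512 dimensions, with marked degradation beyond that
Figures quoted from the paper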
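
As a rough picture of what "reducing the number of dimensions" of an embedding can look like, the sketch below keeps only the first k dimensions and rescales the result to unit length. This is one simple reduction chosen for illustration; the slide does not say which reduction method the paper uses, and the 8-dimensional vectors are made-up numbers.

```python
import math

def truncate_and_normalize(vec, k):
    """Keep the first k dimensions of vec and rescale to unit length."""
    head = list(vec[:k])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]

def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Made-up 8-dimensional "embeddings" (NOT real model output).
full_a = [0.5, 0.4, 0.1, 0.0, 0.2, 0.1, 0.3, 0.1]
full_b = [0.6, 0.3, 0.0, 0.1, 0.1, 0.2, 0.2, 0.0]

# Reduce both vectors to 2 dimensions and compare them in the smaller space.
small_a = truncate_and_normalize(full_a, 2)
small_b = truncate_and_normalize(full_b, 2)
similarity_small = dot(small_a, small_b)
print(f"similarity after truncation: {similarity_small:.3f}")
```

Task performance would then be measured at each reduced size; that evaluation step is not shown here.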
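
Paper introduction (analysis)
Analysis of word embedding representations across the layers of pretrained multilingual models using independent component analysis
• Uses independent component analysis (ICA) to decompose the features each layer of a multilingual model holds into individual axes
• Shows that the closer a layer is to the output layer, the more its axes […] according to meaning
  • Layer 1: axes […] according to the writing system
  • Layers 7-12: axes […] according to meaning
Figure quoted from the paper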
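
The actual participation report
• I ended up doing nothing but introduce papers, so…
• What I felt on my third time attending NLP
  • The hot themes in the domestic research community become clear, which is educational
    • For example, when it came to model analysis I was a complete Urashima Taro (totally out of touch)
  • I want to stop looking only at posters
    • I often just headed to the poster hall by default, then regretted it after getting home: there were interesting-looking oral presentations too…
  • My learning broadens, and it is fun to see more and more familiar faces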
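
In closing
• Please do tell me about your recommended NLP papers 🥺
• I'll be in Tokyo for a while, so feel free to invite me to anything!
• (Please let me know about interesting-looking internships and the like)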