Evaluating the Accuracy of LLM-Based Entity Linking on Pre-Diagnosis Medical History Text

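Oral presentation slides for the 39th Annual Conference of the Japanese Society for Artificial Intelligence (2025)
Keywords: medical text, LLM, NLP, named entity recognition, Entity Linking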

Takashi Nishibayashi

May 29, 2025

Transcript

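  1. Background: Medical Entity Linking [Background and Purpose (1/3)]
     • The task of mapping concepts that appear in medical text, such as diseases, body parts, and drugs, to entries in a medical ontology
     • Why is it needed?
       • To import information extracted from text into an existing medical information system, it must be converted into the knowledge representation that the system uses
     • Example
       • ICD-10 coding of disease names (type 2 diabetic retinopathy → E11.3)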
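  2. Background: Extracting the information needed for differential diagnosis from history-of-present-illness text [Background and Purpose (2/3)]
     "I can't hear at all."
     The time and mode of onset of the hearing loss are unknown.
     • Consider the input to a system such as a diagnostic decision support system
     • Presence or absence of symptoms, past medical history, family history, social history, etc.
     • For each symptom, further details such as its severity, when and how it began, its duration, and the circumstances of onset are important for differential diagnosis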
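  3. Background: Extracting the information needed for differential diagnosis from history-of-present-illness text [Background and Purpose (2/3)]
     "I can't hear at all."
     "When and how did you lose your hearing?"
     • Consider the input to a system such as a diagnostic decision support system
     • Presence or absence of symptoms, past medical history, family history, social history, etc.
     • For each symptom, further details such as its severity, when and how it began, its duration, and the circumstances of onset are important for differential diagnosis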
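  4. Background: The template clinicians follow when taking a history [Background and Purpose (2/3)]
     Example: symptom assessment items in the OPQRST method

     Item  Meaning                 Description
     O     Onset                   Mode and time of symptom onset
     P     Palliation/Provocation  Factors that worsen or relieve the symptom
     Q     Quality                 Character of the pain or symptom
     R     Region/Radiation        Site of the symptom and its radiation
     S     Severity                Strength of the symptom or degree of pain
     T     Timing                  Duration and frequency of the symptom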
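The OPQRST items above can be held in a small record so that missing items are easy to spot. The following is a minimal sketch, not the schema used in the study: the class name `OPQRST`, its field names, and the helper `missing_items` are all assumptions made for illustration. The example record mirrors the hearing-loss statement from the earlier slide, where only the severity is known.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class OPQRST:
    """Hypothetical container for the six OPQRST items of one symptom."""
    onset: Optional[str] = None
    palliation_provocation: Optional[str] = None
    quality: Optional[str] = None
    region_radiation: Optional[str] = None
    severity: Optional[str] = None
    timing: Optional[str] = None


def missing_items(record: OPQRST) -> List[str]:
    """Return the names of the OPQRST items that have not been filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# "I can't hear at all." gives the severity, but nothing about the onset.
hearing_loss = OPQRST(severity="cannot hear at all")
print(missing_items(hearing_loss))
```

A history-taking system could use the list of missing items to decide which follow-up question to ask next, such as the onset question shown on the earlier slide.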
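  5. Materials: History-of-present-illness text [Materials and Methods (1/6)]
     Avey Benchmark Vignette suite [1]
     • A dataset of case texts created for evaluating the performance of symptom checkers
     • Age, sex, chief complaint, and history of present illness were translated into Japanese for use
     [1] Hammoud, M., Douglas, S., Darmach, M., Alawneh, S., Sanyal, S., and Kanbour, Y.: Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study, JMIR AI, Vol. 3, p. e46875 (2024)

     Basic information: 40 years old, female
     Chief complaint: dizziness
     History of present illness: The patient has had recurrent attacks of dizziness for one year. During an attack she feels as if the room is spinning; attacks last from 20 minutes to several hours and are sometimes accompanied by nausea and vomiting. The attacks interfere with daily life, and dizziness and unsteadiness can persist for several days. She has never lost consciousness. The patient also reports a feeling of fullness in the ear, tinnitus…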
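  6. Experimental setup: Dataset construction [Materials and Methods (3/6)]
     • A physician and an engineer created annotations for the Avey Benchmark Vignette suite
     • Cases were split into a training set used for few-shot examples and a test set used for evaluation

     Item                    Training set  Test set
     Cases                   30            100
     Entity occurrences      329           950
     Unique entities         200           415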
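  7. Experimental setup: Comparison across prompt variations [Materials and Methods (5/6)]
     • Number of few-shot examples
       • 0, 10, 20, 30
     • With or without Chain-of-Thought
     • With or without Reason output
     • Language of the knowledge base
       • Japanese, English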
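The prompt factors listed above can be enumerated as a grid of conditions. The slide does not say whether every combination was actually run, so the full cross product below is an assumption, as are the dictionary keys and the `"ja"`/`"en"` language codes; this is only a sketch of how such a comparison could be organized.

```python
from itertools import product
from typing import Dict, List, Union

# Factor levels taken from the slide; the variable names are illustrative.
FEW_SHOT_COUNTS = [0, 10, 20, 30]
CHAIN_OF_THOUGHT = [False, True]
REASON_OUTPUT = [False, True]
KB_LANGUAGES = ["ja", "en"]

Condition = Dict[str, Union[int, bool, str]]


def enumerate_conditions() -> List[Condition]:
    """Build every combination of the four prompt factors (assumed full grid)."""
    return [
        {
            "few_shot": n,
            "chain_of_thought": cot,
            "reason_output": reason,
            "kb_language": lang,
        }
        for n, cot, reason, lang in product(
            FEW_SHOT_COUNTS, CHAIN_OF_THOUGHT, REASON_OUTPUT, KB_LANGUAGES
        )
    ]


conditions = enumerate_conditions()
print(len(conditions))  # 4 * 2 * 2 * 2 = 32 combinations
```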
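  8. Experimental setup: Model and evaluation metrics [Materials and Methods (6/6)]
     • LLM
       • Gemini 1.5 Pro 002
       • Chosen from the long-context models available as of January 2025
     • Evaluation metrics
       • Recall, Precision, F1, and a Recall that also requires the scale to match
       • For scale entities, agreement of the scale value is also taken into account
       • When an entity that has a parent is output, the parent is regarded as having been output as well
         • Example: if dry cough is output, cough is also counted as output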
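The parent rule and the set-based metrics above can be sketched as follows. This is an illustration under stated assumptions, not the study's evaluation code: the `PARENT` map, the entity names, and both function names are hypothetical, the parent hierarchy is assumed to be acyclic, and the slide does not say whether scores are computed per case or pooled over the test set. Scale-value matching is left out.

```python
from typing import Dict, Set, Tuple

# Hypothetical child -> parent map for knowledge-base entities (assumed acyclic).
PARENT: Dict[str, str] = {"dry cough": "cough", "productive cough": "cough"}


def expand_with_parents(entities: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Add every ancestor of each output entity, per the parent rule."""
    expanded = set(entities)
    for entity in entities:
        current = entity
        while current in parent:
            current = parent[current]
            expanded.add(current)
    return expanded


def precision_recall_f1(gold: Set[str], predicted: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1; an empty denominator yields 0.0."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = {"cough", "fever"}
predicted = expand_with_parents({"dry cough"}, PARENT)
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(sorted(predicted), precision, recall, f1)
```

In this toy case the model output only "dry cough", yet the gold entity "cough" is still counted as found because the parent is added before scoring. Whether the expanded child should count against precision is a design choice the slide does not specify; the sketch counts it.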
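  9. Error analysis: Examples of errors and correct answers (extraction results for the bold portions) [Error Analysis (1/3)]

     Text: "Six hours ago there was weakness in the right arm and right leg, which resolved within 30 minutes"
       Gold: transient hemiplegia
       Predicted: upper-limb monoplegia
     Text: "There is a chronic morning cough with white sputum"
       Gold: productive cough
       Predicted: cough
     Text: "Presented with low-grade fever, a red rash on the cheeks and nose, and pain in multiple joints"
       Gold: butterfly rash
       Predicted: skin redness
     Text: "After eating fried chicken for dinner, persistent severe pain in the right upper abdomen, nausea, and vomiting appeared."
       Gold: abdominal pain after eating oily food
       Predicted: none
     Text: "At 37 weeks of pregnancy, presented with severe headache and sudden abdominal pain. She had a routine prenatal checkup four days earlier, at which no particular symptoms or abnormalities were reported."
       Gold: not applicable
       Predicted: onset time of headache, onset time of abdominal pain
     Text: "Attacks in which the fingers and hands turn blue, then white, and finally red and painful occur especially when it is cold"
       Gold: Raynaud's phenomenon
       Predicted: Raynaud's phenomenon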
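  10. Error analysis: Error patterns [Error Analysis (2/3)]
     • Insufficient medical reasoning
       • Hemiplegia cannot be inferred from "weakness in the right hand and right leg"
     • Errors in recognizing the timeline
       • Past events are treated as recent events
     • Extraction of negated symptoms
       • A known problem of large language models
     • Errors stemming from ambiguity in the knowledge base itself
       • Example: entries for similar concepts such as malaise, fatigue, and easy fatigability coexist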
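  11. Error analysis: Additional experiments [Error Analysis (3/3)]
     • Goal: have the model output "productive cough" for a cough with sputum
     • Added a supplementary note to the symptom name → incorrect
     • Had the LLM verify its own output → could not correct it, still incorrect
     • Tested with more capable successor models
       • Gemini 2.0 flash → incorrect
       • Gemini 2.5 flash-0417 (Thinking Off) → incorrect
       • Gemini 2.5 flash-0417 (Thinking On) → correct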
  12. References
     1. Hammoud, M., Douglas, S., Darmach, M., Alawneh, S., Sanyal, S., and Kanbour, Y.: Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study, JMIR AI, Vol. 3, p. e46875 (2024)
     2. Ding, Y., Zeng, Q., and Weninger, T.: ChatEL: Entity Linking with Chatbots, arXiv [cs.CL] (2024)
     3. French, E. and McInnes, B. T.: An overview of biomedical entity linking throughout the years, J. Biomed. Inform., Vol. 137, No. 104252, p. 104252 (2023)
     4. Li, M. and Zhang, R.: How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain, arXiv [cs.CL] (2023)
     5. Miller, R. A., McNeil, M. A., Challinor, S. M., Masarie, F. E., Jr, and Myers, J. D.: The INTERNIST-1/QUICK MEDICAL REFERENCE project–status report, West. J. Med., Vol. 145, No. 6, pp. 816–822 (1986)
     6. Wang, S., Sun, X., Li, X., Ouyang, R., Wu, F., Zhang, T., Li, J., and Wang, G.: GPT-NER: Named Entity Recognition via Large Language Models, arXiv [cs.CL] (2023)
     7. Xu, L., Zhou, Q., Gong, K., Liang, X., Tang, J., and Lin, L.: End-to-End Knowledge-routed Relational Dialogue System for automatic diagnosis, Proc. Conf. AAAI Artif. Intell., Vol. 33, No. 01, pp. 7346–7353 (2019)
     8. Zou, X., He, W., Huang, Y., Ouyang, Y., Zhang, Z., Wu, Y., Wu, Y., Feng, L., Wu, S., Yang, M., Chen, X., Zheng, Y., Jiang, R., and Chen, T.: AI-driven diagnostic assistance in medical inquiry: Reinforcement learning algorithm development and validation, J. Med. Internet Res., Vol. 26, p. e54616 (2024)
     9. 西山智弘, 柴田大作, 宇野裕, 辻川剛範, 北出祐, 久保雅洋, 矢田竣太郎, 若宮翔子, 荒牧英治: Can Generative Models Be Used for Named Entity Recognition in Medical Text? (生成モデルは医療テキストの固有表現抽出に使えるか?), 言語処理学会年次大会発表論文集 (Web), Vol. 30th, pp. 11–11 (2024)