Using and Evaluating LLMs (Large Language Models) for Safe AI Use / japanr2025
Uryu Shinya
December 06, 2025
Science
Transcript
Using and Evaluating LLMs (Large Language Models) for Safe AI Use
Uryu Shinya (Design-Oriented AI Education and Research Center, Tokushima University)
JapanR @u_ribo
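Background: why LLMs need evaluation
Machine learning: data preparation → model training (learning) → evaluation on test data → compute metrics such as precision and recall → reproducible in code
LLMs: prompt design → inference with the LLM → scoring → reproducible in code?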
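In the machine-learning workflow above, metrics are computed deterministically from fixed predictions, and that is what makes the workflow reproducible in code. Below is a minimal Python sketch of that step. The labels and predictions are toy values invented for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the label treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy test labels and model predictions (illustrative only)
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Same inputs, same code, same numbers on every run: (0.75, 0.75)
print(precision_recall(y_true, y_pred))
```

The open question on the LLM line is whether its "scoring" step can be pinned down in the same way.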
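Evaluating machine learning models vs. evaluating LLMs
Key points for evaluation:
✓ Which model performed best?
✓ Why did it reach that result?
✓ Can the procedure and process be shown?
How LLMs are often evaluated instead:
• Trying ChatGPT a few times and concluding "seems good"
• Choosing a model only because of the reputation that "GPT is smart"
• Judging a model "excellent" from a prompt that happened to succeed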
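A confidence interval on accuracy is one way to quantify the "tried it a few times" problem. The sketch below uses the Wilson score interval. That choice belongs to this example, not to the talk. It shows that 3 successes in 3 tries is still consistent, at 95% confidence, with a true accuracy as low as about 44%.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot
    return max(0.0, center - half), min(1.0, center + half)


# "Tried it 3 times and all 3 looked right" vs. a systematic run of 300 questions
print(wilson_interval(3, 3))      # roughly (0.44, 1.0)
print(wilson_interval(270, 300))  # much narrower, around 0.9
```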
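Uryu S. Evaluating Large Language Models for IUCN Red List Species Information. arXiv. https://arxiv.org/abs/2510.02830
• A case study of IUCN threatened-species assessment
• LLMs are expected to be useful in biodiversity conservation, but their reliability for expert judgment is in doubt.
Two serious issues (shared by current LLMs):
• Gap between knowledge and reasoning → the model knows the facts but struggles with judgments that apply them
• Latent bias → strong on vertebrates (popular species), weak on invertebrates
Correct-answer rate: taxonomic knowledge 94.9%, reasoning about conservation status 27.2%
An objective and rigorous evaluation framework is indispensable.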
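Both issues on this slide only show up when scores are broken down by task and by species group, not reported as a single number. This is a hypothetical sketch. The records are toy data, not the paper's results, and the field names `task`, `group`, and `correct` are assumptions.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Group scored records by `key` and return {group: accuracy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        hits[group] += int(record["correct"])
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical scored outputs: toy values, NOT the paper's data
records = [
    {"task": "taxonomy", "group": "vertebrate", "correct": True},
    {"task": "taxonomy", "group": "invertebrate", "correct": True},
    {"task": "status", "group": "vertebrate", "correct": True},
    {"task": "status", "group": "invertebrate", "correct": False},
    {"task": "status", "group": "vertebrate", "correct": False},
    {"task": "taxonomy", "group": "invertebrate", "correct": False},
]

print(accuracy_by(records, "task"))   # knowledge vs. reasoning gap
print(accuracy_by(records, "group"))  # bias across species groups
```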
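Evaluation framework "Inspect AI", led by the UK AI Safety Institute
https://inspect.aisi.org.uk/
JJ Allaire (founder of RStudio) leads the project.
• Open source: a highly transparent implementation
• Safety-oriented: puts safety and reliability first
• Reproducibility: scientific verification that others can reproduce
• Flexibility and extensibility: OpenAI, Google, Anthropic, xAI, and local environments (Ollama); a wide range of models can be evaluated without vendor lock-in.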
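Avoiding vendor lock-in mostly means keeping the evaluation code independent of the model provider. Inspect AI identifies models with "provider/model" strings such as `openai/gpt-4o`. The sketch below imitates that idea with a hypothetical registry and fake backends that only echo the prompt. Real code would call a provider SDK or a local Ollama server in their place.

```python
from typing import Callable, Dict


# Hypothetical backends: (model name, prompt) -> reply. These fakes just echo
# the prompt; a real backend would call a provider SDK or a local Ollama server.
def fake_openai(model: str, prompt: str) -> str:
    return f"[openai:{model}] {prompt}"


def fake_ollama(model: str, prompt: str) -> str:
    return f"[ollama:{model}] {prompt}"


BACKENDS: Dict[str, Callable[[str, str], str]] = {
    "openai": fake_openai,
    "ollama": fake_ollama,
}


def get_model(spec: str) -> Callable[[str], str]:
    """Resolve a 'provider/model' string to a prompt -> reply callable."""
    provider, sep, model = spec.partition("/")
    if not sep or not model:
        raise ValueError(f"expected 'provider/model', got {spec!r}")
    if provider not in BACKENDS:
        raise KeyError(f"unknown provider: {provider!r}")
    backend = BACKENDS[provider]
    return lambda prompt: backend(model, prompt)


# The evaluation loop is identical for every provider; only the spec changes.
for spec in ["openai/gpt-4o", "ollama/gpt-oss:20b"]:
    print(get_model(spec)("ping"))
```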
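Modular evaluation: Task, Dataset, Solver, Scorer
Managing the evaluation logic systematically as code makes it reusable.
• Task (experiment plan): defines the overall evaluation workflow
• Dataset (input data): the set of inputs used for evaluation, together with their correct labels
• Solver (answering strategy): a strategy, such as prompt engineering, for eliciting answers from the model
• Scorer (evaluation criteria): the criteria for comparing the model's output with the correct answer and computing a score

    from inspect_ai import Task
    from inspect_ai.solver import chain

    Task(
        dataset=...,
        solver=chain(...),
        scorer=...,
    )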
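Here is a toy re-implementation of the four roles using only the standard library. The names `chain`, `system_message`, and `generate` echo the Inspect AI solvers used later in this talk, but these are simplified stand-ins, not Inspect AI's API. The stub model, which always answers "LC", is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    """Dataset row: an input and its correct label."""
    input: str
    target: str


@dataclass
class State:
    """What one sample carries through the solver chain."""
    messages: List[str] = field(default_factory=list)
    output: str = ""


Solver = Callable[[State], State]
Scorer = Callable[[State, Sample], bool]


def chain(*solvers: Solver) -> Solver:
    """Run solvers left to right, passing the state along."""
    def run(state: State) -> State:
        for solver in solvers:
            state = solver(state)
        return state
    return run


def system_message(text: str) -> Solver:
    """Prepend a system instruction to the conversation."""
    def run(state: State) -> State:
        state.messages.insert(0, f"system: {text}")
        return state
    return run


def generate(model: Callable[[List[str]], str]) -> Solver:
    """Ask the model for an answer and store it as the output."""
    def run(state: State) -> State:
        state.output = model(state.messages)
        return state
    return run


def exact_match(state: State, sample: Sample) -> bool:
    """Scorer: case-insensitive comparison with the correct label."""
    return state.output.strip().lower() == sample.target.strip().lower()


@dataclass
class Task:
    """Experiment plan: which data, which answering strategy, which criteria."""
    dataset: List[Sample]
    solver: Solver
    scorer: Scorer

    def run(self) -> float:
        """Solve and score every sample; return the accuracy."""
        scores = []
        for sample in self.dataset:
            state = self.solver(State(messages=[f"user: {sample.input}"]))
            scores.append(self.scorer(state, sample))
        return sum(scores) / len(scores)


def always_lc(messages: List[str]) -> str:
    """Hypothetical stub model that always answers "LC"."""
    return "LC"


task = Task(
    dataset=[Sample("Aquila chrysaetos", "LC"), Sample("hypothetical species", "EN")],
    solver=chain(system_message("Answer with one IUCN Red List category code."),
                 generate(always_lc)),
    scorer=exact_match,
)
print(task.run())  # 0.5: one of the two samples matches
```

Swapping the scorer or the solver chain leaves the dataset and the loop untouched. That separation is what makes the evaluation logic reusable.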
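Implementing IUCN evaluation tasks with Inspect AI (Uryu): applying the framework to four tasks
https://github.com/uribo/iucn-redlist-evals
Task: purpose; examples of the Solvers and Scorers used
• Taxonomic classification: have the model choose the correct taxonomic group; chain(), optimize_choices()*, system_message(), multiple_choice_with_cache()*, taxon_partial_scorer()*
• Red List category assessment: identify one of the eight categories; system_message(), generate(), match()
• Geographic distribution: generate a list of country names; system_message(), generate(), geo_distribution_scorer()*
• Threat identification: select several of the threat categories; system_message(), generate(), threat_assessment_scorer()*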
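For the Red List category task, the scorer has to pull a single category code out of free-text output. This is a hand-rolled sketch, not a reimplementation of Inspect AI's `match()`. It assumes the model writes the code in capital letters.

```python
import re
from typing import Optional

# The eight Red List categories listed on the example slide
CATEGORIES = ("EX", "EW", "CR", "EN", "VU", "NT", "LC", "DD")
# Whole-word, case-sensitive match so that e.g. "Endangered" is not read as "EN"
CATEGORY_RE = re.compile(r"\b(" + "|".join(CATEGORIES) + r")\b")


def extract_category(output: str) -> Optional[str]:
    """Return the first category code found in the model output, if any."""
    match = CATEGORY_RE.search(output)
    return match.group(1) if match else None


def score_category(output: str, target: str) -> str:
    """Label the answer with the same verdicts used on the example slide."""
    return "Correct" if extract_category(output) == target else "Incorrect"


print(score_category("The species is Least Concern (LC).", "LC"))  # Correct
print(score_category("Near Threatened: NT", "LC"))                 # Incorrect
```

Taking the first code found is a design choice. An answer that lists several categories is judged by whichever one comes first.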
Example: the four tasks applied to one species
Input: Aquila chrysaetos (golden eagle, イヌワシ)
Photo: Rocky, CC BY, via Wikimedia Commons
• Taxonomic classification. Choices: A. Animalia (Kingdom) > Chordata (Phylum) > Aves (Class) > Accipitriformes (Order) > Pandionidae (Family), B. … (Kingdom) > … (Phylum) > … (Class) > … (Order) > Accipitridae (Family), C. … (Kingdom) > … (Phylum) > … (Class) > … (Order) > Cathartidae (Family), D. … (Kingdom) > … (Phylum) > … (Class) > … (Order) > Sagittariidae (Family), E. … (Kingdom) > … (Phylum) > … (Class) > … (Order) > Elanidae (Family). Target: B. Answer: B. Evaluation: Correct
• Red List category. Choices: EX, EW, CR, EN, VU, NT, LC, DD. Target: LC. Answer: NT. Evaluation: Incorrect
• Geographic distribution (country list). Target: Montenegro; Italy; France; Albania etc. Answer: Montenegro; France; Iraq etc. Evaluation: Partial
• Threats (threats list). Target: Agriculture & aquaculture; Pollution; Energy production & mining; Transportation & service corridors etc. Answer: None (omission). Evaluation: Incorrect
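The "Partial" verdict for the country list suggests a scorer that gives credit for overlap. The repository's `geo_distribution_scorer()` may work differently. This hypothetical sketch labels a list Correct on an exact match, Partial on any overlap, and Incorrect otherwise, and it also reports an F1 score. It uses only the countries visible on the slide, because both lists there end in "etc.".

```python
def parse_countries(text):
    """Split a ';'-separated country list into a normalized set."""
    return {part.strip().lower() for part in text.split(";") if part.strip()}


def score_country_list(output, target):
    """Return (verdict, f1): Correct on an exact set match, Partial on overlap."""
    predicted, expected = parse_countries(output), parse_countries(target)
    hits = predicted & expected
    if not hits:
        return "Incorrect", 0.0
    precision = len(hits) / len(predicted)
    recall = len(hits) / len(expected)
    f1 = 2 * precision * recall / (precision + recall)
    verdict = "Correct" if predicted == expected else "Partial"
    return verdict, f1


# Countries visible on the slide (both lists continue with "etc.")
target = "Montenegro; Italy; France; Albania"
answer = "Montenegro; France; Iraq"
print(score_country_list(answer, target))  # verdict 'Partial', F1 = 4/7
```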
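There is an R version too! The vitals package
Interaction with the LLM goes through the ellmer package.
https://vitals.tidyverse.org/

    library(vitals)
    library(ellmer)

    # Dataset (input data): questions and their expected answers
    simple_qa <- tibble::tibble(
      input = c(
        "日本の初代総理大臣は誰か",  # "Who was Japan's first prime minister?"
        "Posit(旧RStudio)のチーフサイエンティストは誰か"  # "Who is the chief scientist at Posit (formerly RStudio)?"
      ),
      target = c("伊藤博文", "Hadley Wickham")  # 伊藤博文 = Ito Hirobumi
    )

    # Task (experiment plan)
    tsk <- Task$new(
      dataset = simple_qa,
      # Solver (answering strategy); chat_ollama() specifies the inference model
      solver = generate(chat_ollama(model = "gpt-oss:20b")),
      # Scorer (evaluation criteria)
      scorer = model_graded_fact()
    )
    tsk$eval()
    tsk$score()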
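`model_graded_fact()` hands scoring to a model that compares each answer with its target. The Python sketch below shows the general pattern: build a grading prompt, ask a grader, and parse its verdict. The prompt wording and the `GRADE: C` / `GRADE: I` reply format are assumptions of this sketch, not the template vitals uses. The grader is a stub that does a substring check instead of calling an LLM.

```python
import re

# Assumed reply format for this sketch: "GRADE: C" (correct) or "GRADE: I" (incorrect)
GRADE_RE = re.compile(r"GRADE\s*:\s*([CI])", re.IGNORECASE)


def build_grading_prompt(question, target, answer):
    """Hypothetical grading prompt; not the template used by vitals."""
    return (
        "You are grading a factual answer.\n"
        f"Question: {question}\n"
        f"Expert answer: {target}\n"
        f"Submitted answer: {answer}\n"
        "Reply with 'GRADE: C' if the submission contains the expert answer, "
        "otherwise 'GRADE: I'."
    )


def model_graded(question, target, answer, grader):
    """Ask `grader` (prompt -> reply text) for a verdict and parse it."""
    reply = grader(build_grading_prompt(question, target, answer))
    match = GRADE_RE.search(reply)
    if match is None:
        return "unparsed"  # surface grader failures instead of hiding them
    return match.group(1).upper()


def stub_grader(prompt):
    """Stand-in for an LLM grader: a plain substring check."""
    target = re.search(r"Expert answer: (.*)", prompt).group(1)
    answer = re.search(r"Submitted answer: (.*)", prompt).group(1)
    return "GRADE: C" if target in answer else "GRADE: I"


print(model_graded("Who was Japan's first prime minister?", "Ito Hirobumi",
                   "Japan's first prime minister was Ito Hirobumi.", stub_grader))  # C
```

Keeping a distinct "unparsed" outcome matters with real LLM graders, whose replies do not always follow the requested format.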
DEMO
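Evaluation as code: three benefits of describing and managing the evaluation process as code
✓ Immediate re-verification with new models
✓ Transparency and traceability of evaluation
  • It is clear which criteria each judgment was based on
  • The basis for each result can be traced
  • Results are recorded as logs
✓ Sharing and improvement in the community
  • Publish the evaluation code on GitHub
  • It functions as a benchmark
To maximize the benefits of AI and minimize its risks, we need a process that does not just "use" AI but "evaluates" it properly.
The End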
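The "evaluation as code" slide lists re-verification with new models and results recorded as logs among its benefits. Logging every sample is what makes those two points practical, because evaluating a new model then means adding one entry and re-running. This is a minimal sketch that writes one JSON line per sample. The log schema and the two stub models are hypothetical.

```python
import json
import os
import tempfile


def run_and_log(models, dataset, log_path):
    """Evaluate every model on the same dataset, appending one JSON line per sample."""
    summary = {}
    with open(log_path, "a", encoding="utf-8") as log:
        for name, model in models.items():
            correct = 0
            for sample in dataset:
                output = model(sample["input"])
                ok = output.strip() == sample["target"]
                correct += ok
                record = {
                    "model": name,
                    "input": sample["input"],
                    "target": sample["target"],
                    "output": output,
                    "scorer": "exact_match",  # which criterion decided the verdict
                    "correct": ok,
                }
                log.write(json.dumps(record, ensure_ascii=False) + "\n")
            summary[name] = correct / len(dataset)
    return summary


# Hypothetical dataset and stub models; a new model is just one more entry
dataset = [{"input": "Aquila chrysaetos", "target": "LC"}]
models = {"stub-a": lambda q: "LC", "stub-b": lambda q: "NT"}
log_path = os.path.join(tempfile.mkdtemp(), "eval_log.jsonl")

summary = run_and_log(models, dataset, log_path)
print(summary)  # {'stub-a': 1.0, 'stub-b': 0.0}
```

A log in this form can be published next to the evaluation code, so others can check exactly which outputs were judged and by what criterion.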