Evaluation Metrics for AI Agents and Agent GPA
tsho
February 26, 2026
Technology
A talk given at the 61st MLOps Study Group (MLOps 勉強会). It explains Agent GPA, an evaluation metric for AI agents.
Transcript
© 2026 Snowflake Inc. All Rights Reserved
Evaluation Methods for AI Agents and Agent GPA
Sho Tanaka, Feb 2026
tsho / 田中 翔 (Sho Tanaka)
Lead Developer Advocate @ Snowflake
Linkedin.com/in/tsho
- Responsible for talks and demo development in AI/ML and data
- ex-Google gTech Ads, ML/Data
- MLOps community organizer (2020~)
- Google Developer Expert, AI/ML
What is an AI Agent?
AI Agent use cases
- Mercari's data analytics AI agent "Socrates" and an ADK use case - Speaker Deck
- Kokuyo, JINS, and others build AI agents in-house; "Snowflake Intelligence" launches in Japan
Note: the term AIOps also has an existing definition, introduced by Gartner around 2016, so take care not to confuse the two.
Evaluation methods
Representative evaluation metrics for AI agents / LLMs
Example: evaluation metrics in ADK
Source: Why Evaluate Agents - Agent Development Kit (ADK)
- LLM-as-a-judge: final_response_match_v2, rubric_based_final_response_quality_v1, etc.
- Code-based / Deterministic (code- or rule-based, exact match): tool_trajectory_avg_score
- Traditional NLP Metrics: response_match_score
- Human Evaluation: there is no explicit "metric" provided as a feature, but the Web UI (Trace View) supports it
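To make the code-based and traditional NLP categories above concrete, here is a minimal sketch in plain Python. It is not ADK's implementation. The function names `tool_trajectory_score` and `unigram_f1` are hypothetical, and the scoring rules are assumptions chosen for illustration: a position-wise exact match of tool names, and a ROUGE-1-style unigram F1 over whitespace tokens.

```python
from collections import Counter


def tool_trajectory_score(actual: list[str], expected: list[str]) -> float:
    """Deterministic check: fraction of steps whose tool name matches.

    Hypothetical helper, not an ADK API. Extra or missing steps count
    as mismatches because the denominator is the longer trajectory.
    """
    if not actual and not expected:
        return 1.0
    steps = max(len(actual), len(expected))
    matches = sum(a == e for a, e in zip(actual, expected))
    return matches / steps


def unigram_f1(response: str, reference: str) -> float:
    """Traditional NLP-style check: unigram-overlap F1 (ROUGE-1-like).

    Hypothetical helper. Tokenization is a lowercase whitespace split.
    """
    resp = Counter(response.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((resp & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(resp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(tool_trajectory_score(["search", "lookup"], ["search", "summarize"]))
    print(unigram_f1("the cat", "the cat sat"))
```

Both functions are deterministic and cheap, which is why checks of this kind are often run first. An LLM-as-a-judge metric would instead send the response and a rubric to a model and parse its verdict.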
Agent GPA and TruLens
The Agent GPA paper
What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
Also available as OSS
https://github.com/truera/trulens
https://www.trulens.org/
https://www.trulens.org/getting_started/quickstarts/web-search-agent-evaluation/#10-add-evaluations
In closing
Available on Snowflake as a Private Preview
What's Your Agent's GPA? A Framework for Evaluating AI Agent Reliability
References
CS 329T: Trustworthy Machine Learning
https://learn.deeplearning.ai/
THANK YOU