[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Technology
Transcript
Evals for LLMApps/Agents
Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique) ai.mercari.com
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!