[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Transcript
Evals for LLMApps/Agents
Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique) ai.mercari.com
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!