[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Transcript
Evals for LLMApps/Agents. Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique)
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!