[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Technology
Transcript
Evals for LLMApps/Agents
Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique) ai.mercari.com
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!