Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
[mercari GEARS 2025] Evals for LLMApps/Agents
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
mercari
PRO
November 14, 2025
Technology
97
0
Share
[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
PRO
November 14, 2025
More Decks by mercari
See All by mercari
[DevDojo] Getting Started with BI: Looker Essentials - 2025
mercari
PRO
0
86
[DevDojo] Introduction to LLMs & AI Agents - 2025
mercari
PRO
0
120
[mercari GEARS 2025] Techniques for Reliable Code Generation Using AI Agents
mercari
PRO
0
240
[mercari GEARS 2025] Foundations of AI - The Invisible Forces Driving Product Innovation
mercari
PRO
0
220
[mercari GEARS 2025] Building Foundation for Mercari’s Global Expansion
mercari
PRO
1
380
[mercari GEARS 2025] The Past, Present, and Future of Anti-Phishing Measures at Mercari
mercari
PRO
0
110
[mercari GEARS 2025] Backend Standardization with MCP
mercari
PRO
0
170
[mercari GEARS 2025] Transforming customer engagement with Google Customer Engagement Suite
mercari
PRO
0
270
[mercari GEARS 2025] PJ Aurora’s Vision and Automated UI Quality Evaluation Agents
mercari
PRO
0
250
Other Decks in Technology
See All in Technology
国内外の生成AIセキュリティの最新動向 & AIガードレール製品「chakoshi」のご紹介 / Latest Trends in Generative AI Security (Domestic & International) & Introduction to AI Guardrail Product "chakoshi"
nttcom
2
730
Chasing Real-Time Observability for CRuby
whitegreen
0
120
Azure Static Web Apps の自動ビルドがタイムアウトしやすくなった状況に対応した件/global-azure2026
thara0402
0
410
インターネットの技術 / Internet technology
ks91
PRO
0
210
みんなの「データ活用」を支えるストレージ担当から持ち込むAWS活用/コミュニティー設計TIPS 10選~「作れる」より、「続けられる」設計へ~
yoshiki0705
0
250
ワールドカフェI /チューターを改良する / World Café I and Improving the Tutors
ks91
PRO
0
320
Do Vibe Coding ao LLM em Produção para Busca Agêntica - TDC 2026 - Summit IA - São Paulo
jpbonson
3
120
サイボウズ 開発本部採用ピッチ / Cybozu Engineer Recruit
cybozuinsideout
PRO
10
78k
クラウドネイティブな開発 ~ 認知負荷に立ち向かうためのコンテナ活用
literalice
0
130
昔はシンプルだった_AmazonS3
kawaji_scratch
0
330
コードや知識を組み込む / Incorporate Code and Knowledge
ks91
PRO
0
150
Standards et agents IA : un tour d’horizon de MCP, A2A, ADK et plus encore
glaforge
0
170
Featured
See All Featured
Claude Code のすすめ
schroneko
67
220k
Why Our Code Smells
bkeepers
PRO
340
58k
How to Ace a Technical Interview
jacobian
281
24k
The Hidden Cost of Media on the Web [PixelPalooza 2025]
tammyeverts
2
270
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
38
2.8k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.7k
Imperfection Machines: The Place of Print at Facebook
scottboms
270
14k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
21k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
3
110
The Cost Of JavaScript in 2023
addyosmani
55
9.8k
Joys of Absence: A Defence of Solitary Play
codingconduct
1
350
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Transcript
Evals for LLMApps/Agents Jehandad Kamal Mercari / Engineer @ AI/LLM
Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals ( by technique ) ai.mercari.com
Types of Evals (by perspective )
Improved SDLC for Agents
Thank You!