[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Technology
Transcript
Evals for LLMApps/Agents: Jehandad Kamal, Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique)
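The transcript preserves only the slide titles, so the specific eval techniques the talk covers are not recoverable here. As a minimal sketch of one widely used technique, code-based (deterministic) grading, the Python below runs an agent over a small set of test cases and checks each output with programmatic assertions. `EvalCase`, `is_valid_json_with_keys`, `contains_all`, `run_evals`, and `stub_agent` are hypothetical names introduced for illustration, and the stub stands in for a real LLM call; this is not Mercari's implementation.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """Hypothetical test case: an input prompt plus a grader for the agent's output."""
    name: str
    prompt: str
    grader: Callable[[str], bool]


def is_valid_json_with_keys(*keys: str) -> Callable[[str], bool]:
    """Code-based grader: output must parse as a JSON object containing all keys."""
    def check(output: str) -> bool:
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return False
        return isinstance(data, dict) and all(k in data for k in keys)
    return check


def contains_all(*words: str) -> Callable[[str], bool]:
    """Code-based grader: output must mention every expected word (case-insensitive)."""
    def check(output: str) -> bool:
        lowered = output.lower()
        return all(w.lower() in lowered for w in words)
    return check


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the agent and record pass/fail per case name."""
    return {case.name: case.grader(agent(case.prompt)) for case in cases}


def stub_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned outputs only."""
    if "json" in prompt.lower():
        return '{"title": "Vintage camera", "price": 12000}'
    return "A vintage film camera in good condition."


if __name__ == "__main__":
    cases = [
        EvalCase("structured_output", "Return the listing as JSON",
                 is_valid_json_with_keys("title", "price")),
        EvalCase("mentions_condition", "Describe the item",
                 contains_all("camera", "condition")),
    ]
    results = run_evals(stub_agent, cases)
    pass_rate = sum(results.values()) / len(results)
    print(results, f"pass rate: {pass_rate:.0%}")
```

Deterministic graders like these are cheap and reproducible, which makes them easy to rerun on every change; checks that need judgment about quality or tone usually call for other techniques, such as human review or a model acting as a grader.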
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!