Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
[mercari GEARS 2025] Evals for LLMApps/Agents
Search
mercari
PRO
November 14, 2025
Technology
100
0
Share
[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
PRO
November 14, 2025
More Decks by mercari
See All by mercari
[DevDojo] Getting Started with BI: Looker Essentials - 2025
mercari
PRO
0
96
[DevDojo] Introduction to LLMs & AI Agents - 2025
mercari
PRO
0
130
[mercari GEARS 2025] Techniques for Reliable Code Generation Using AI Agents
mercari
PRO
0
250
[mercari GEARS 2025] Foundations of AI - The Invisible Forces Driving Product Innovation
mercari
PRO
0
230
[mercari GEARS 2025] Building Foundation for Mercari’s Global Expansion
mercari
PRO
1
390
[mercari GEARS 2025] The Past, Present, and Future of Anti-Phishing Measures at Mercari
mercari
PRO
0
120
[mercari GEARS 2025] Backend Standardization with MCP
mercari
PRO
0
180
[mercari GEARS 2025] Transforming customer engagement with Google Customer Engagement Suite
mercari
PRO
0
290
[mercari GEARS 2025] PJ Aurora’s Vision and Automated UI Quality Evaluation Agents
mercari
PRO
0
270
Other Decks in Technology
See All in Technology
20260516_SecJAWS_Days
takuyay0ne
2
410
Vision Banana: Image Generators are Generalist Vision Learners
kzykmyzw
0
380
サンプリングは「作る」のか「使う」のか? 分散トレースのコストと運用を両立する実践的戦略 / Why you need the tail sampling and why you don't want it
ymotongpoo
4
180
Purview Endpoint DLP 動かしてみた
kozakigh
0
390
20260515 ログイン機能だけではないアカウント管理を全体で考える~サービス設計者向け~
oidfj
0
400
バイブコーディング、仕様駆動、その先へ - 「不確実性に対する検査‧適応のサイクル」を設計する
littlehands
1
160
パーソルキャリア IT/テクノロジー職向け 会社紹介資料|Company Introduction Deck
techtekt
PRO
0
120
続 運用改善、不都合な真実 〜 物理制約のない運用改善はほとんど無価値 / 20260518-ssmjp-kaizen-no-value-without-physical-constraints
opelab
2
180
データモデリング通り #5オンライン勉強会: AIに『ビジネスの文脈』を教え込むデータモデリング
datayokocho
0
270
How to learn AWS Well-Architected with AWS BuilderCards: Security Edition
coosuke
PRO
0
140
Redmine次期バージョン7.0の注目新機能解説 — UI/UX強化と連携強化を中心に
vividtone
1
110
"スキルファースト"で作る、AIの自走環境
subroh0508
0
120
Featured
See All Featured
Paper Plane (Part 1)
katiecoart
PRO
0
7.5k
Navigating Team Friction
lara
192
16k
Java REST API Framework Comparison - PWX 2021
mraible
34
9.3k
JAMstack: Web Apps at Ludicrous Speed - All Things Open 2022
reverentgeek
1
440
Breaking role norms: Why Content Design is so much more than writing copy - Taylor Woolridge
uxyall
0
280
The Illustrated Children's Guide to Kubernetes
chrisshort
51
52k
The Pragmatic Product Professional
lauravandoore
37
7.3k
Designing for humans not robots
tammielis
254
26k
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
0
310
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
150
The Organizational Zoo: Understanding Human Behavior Agility Through Metaphoric Constructive Conversations (based on the works of Arthur Shelley, Ph.D)
kimpetersen
PRO
0
320
Become a Pro
speakerdeck
PRO
31
5.9k
Transcript
Evals for LLMApps/Agents Jehandad Kamal Mercari / Engineer @ AI/LLM
Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals ( by technique ) ai.mercari.com
Types of Evals (by perspective )
Improved SDLC for Agents
Thank You!