[mercari GEARS 2025] Evals for LLMApps/Agents
mercari
November 14, 2025
Technology
Transcript
Evals for LLMApps/Agents Jehandad Kamal Mercari / Engineer @ AI/LLM Team
Mercari AI Listing Support https://about.mercari.com/en/press/news/articles/20240910_aisupport/
Agents
Agent Development Loop
First SDLC for Agents ai.mercari.com
Types of Evals (by technique) ai.mercari.com
Types of Evals (by perspective)
Improved SDLC for Agents
Thank You!