LINE Developers Taiwan
September 23, 2024
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
Event: iThome Hello World Dev Conference
Speaker: Dan Chen
Transcript
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
EC Data Dev / Data Scientists, Dan Chen
Dan, LINE Taiwan EC Dev, Data Scientist
Work Experience
Side Project
CONTENT
01 What is Trustworthy
02 Evaluation Framework
03 Offline & Online Evaluation
04 LLM on Recommendation
05 Q&A
01 What is Trustworthy: Why it's so important
Elements of Trustworthiness
Four Perspectives: Trustworthy Recommendation
Data Preparation, Data Representation, Recommendation Generation, Performance Evaluation
02 Evaluation Framework: How to Correctly Evaluate AI
Two-Stage Recommendation System
Brickmaster: Scalable, Scenario-wise, KPI-Oriented, Trustworthy
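The slide names a two-stage recommendation system, and the talk title refers to a two-tower model. As a minimal sketch of how those two ideas are commonly combined (not a description of Brickmaster itself), the code below retrieves candidates by dot product between a user embedding and item embeddings, then reorders them with a second score. The function names, the embeddings, and the ranking scores are all hypothetical and stand in for whatever models a real system would use.

```python
import numpy as np


def retrieve_candidates(user_vec: np.ndarray, item_vecs: np.ndarray, top_n: int) -> np.ndarray:
    """Stage 1: score every item by dot product with the user embedding, keep top_n."""
    scores = item_vecs @ user_vec
    return np.argsort(-scores)[:top_n]


def rerank(candidate_ids: np.ndarray, ranking_scores: np.ndarray, top_k: int) -> np.ndarray:
    """Stage 2: reorder the retrieved candidates by a separate ranking-model score."""
    order = np.argsort(-ranking_scores[candidate_ids])[:top_k]
    return candidate_ids[order]


# Hypothetical toy data: 4 items with 2-dimensional "item tower" embeddings.
item_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
user_vec = np.array([1.0, 0.0])                  # output of a "user tower" (assumed)
ranking_scores = np.array([0.1, 0.5, 0.9, 0.2])  # output of a ranking model (assumed)

candidates = retrieve_candidates(user_vec, item_vecs, top_n=2)
print(candidates.tolist())
print(rerank(candidates, ranking_scores, top_k=1).tolist())
```

The split keeps the first stage cheap enough to scan a large catalogue, while the second stage can afford a heavier model because it only sees a short candidate list.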
Evaluation Framework (1/2): How to truly and comprehensively understand performance
03 Offline & Online Evaluation: How to Correctly Evaluate AI
Offline Evaluation: Key points to show how your algorithms can contribute to your business
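The slide does not list which offline metrics the talk uses. As an illustration only, two metrics that are widely used for offline evaluation of recommenders are recall@k and NDCG@k; the sketch below implements both with binary relevance. The shop IDs and the "visited" set are made-up example data.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: rewards placing relevant items near the top."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: shops recommended to one user vs. shops they later visited.
ranked_shops = ["shop_a", "shop_b", "shop_c", "shop_d"]
visited_shops = {"shop_b", "shop_d"}
print(recall_at_k(ranked_shops, visited_shops, k=3))
print(ndcg_at_k(ranked_shops, visited_shops, k=3))
```

Recall@k only asks whether relevant items were retrieved, while NDCG@k also accounts for their position, so reporting both gives a fuller picture than either alone.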
Online Evaluation: Key points to show how your algorithms can contribute to your business
A/B test: Key points to show how your algorithms can contribute to your business
Avoid pitfalls in practice: What if the experiment isn't significant? Sample ratio mismatch? Novelty effect?
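Two of the pitfalls named above, sample ratio mismatch and non-significant results, can be checked with standard statistical tests. The sketch below is one common way to do so, not necessarily the procedure used in the talk: a chi-square goodness-of-fit test on the traffic split, and a two-sided two-proportion z-test on conversion rate. The function names and all counts in the example are hypothetical.

```python
import math


def srm_p_value(n_control: int, n_treatment: int, expected_ratio: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 degree of freedom) for sample ratio mismatch."""
    total = n_control + n_treatment
    exp_control = total * expected_ratio
    exp_treatment = total * (1.0 - expected_ratio)
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided z-test for a difference in conversion rate between two arms."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical experiment: a 50/50 split was intended.
print(srm_p_value(n_control=5000, n_treatment=5500))
print(two_proportion_p_value(conv_a=100, n_a=1000, conv_b=150, n_b=1000))
```

A very small SRM p-value suggests the assignment or logging is broken, in which case the conversion comparison should not be trusted until the cause is found. Novelty effects are not covered here; they are usually examined by looking at how the treatment effect changes over the days of the experiment.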
Case – EC Shop recommendation
04 LLM On Recommendation
Recommendation with LLM
- Feature Engineering: Text embedding generation
- How to evaluate embedding (probing): RankMe / α-ReQ Metrics
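Both metrics named on the slide score an embedding matrix from its spectrum, without needing labels. The sketch below is a simplified reading of them, assuming NumPy is available: RankMe is computed as the exponential of the entropy of the normalized singular values, and the α-ReQ style coefficient is approximated by a plain least-squares fit of the covariance eigenvalues to a power law. The helper names are hypothetical, the random matrix stands in for real LLM text embeddings, and the fitting details may differ from the published methods.

```python
import numpy as np


def rankme(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank: exp of the Shannon entropy of the normalized singular values."""
    singular_values = np.linalg.svd(embeddings, compute_uv=False)
    p = singular_values / singular_values.sum() + eps
    return float(np.exp(-np.sum(p * np.log(p))))


def covariance_spectrum(embeddings: np.ndarray) -> np.ndarray:
    """Eigenvalues of the empirical feature covariance matrix."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(embeddings) - 1)
    return np.linalg.eigvalsh(cov)


def power_law_alpha(eigenvalues: np.ndarray) -> float:
    """Negated least-squares slope of log(eigenvalue) against log(rank index)."""
    eigenvalues = np.sort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[eigenvalues > 0]  # log is undefined for non-positive values
    ranks = np.arange(1, len(eigenvalues) + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(eigenvalues), deg=1)
    return float(-slope)


# Stand-in data: 512 "texts" embedded in 32 dimensions (random, for illustration only).
rng = np.random.default_rng(0)
text_embeddings = rng.normal(size=(512, 32))
print(rankme(text_embeddings))
print(power_law_alpha(covariance_spectrum(text_embeddings)))
```

A RankMe value close to the embedding dimension means the embeddings use most of the available directions, while a value near 1 indicates collapse onto a single direction.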
05 Conclusion: Evaluate & Challenge
Conclusion: Business Value
OpenAI, Claude, Gemini
XGBoost or Open Source
Source: https://zh.wikipedia.org/zh-tw/%E7%BE%8E%E5%9C%8B%E9%9A%8A%E9%95%B72%EF%BC%9A%E9%85%B7%E5%AF%92%E6%88%B0%E5%A3%AB
Source: https://images.app.goo.gl/HCygtJVtoPaU2KgX6
Conclusion & Challenge
1. Data Quality
2. Multiple-Metrics evaluation
3. Conduct A/B test Experiment
4. Human Perception Evaluation Challenge
Q&A
Contact information (LinkedIn: Dan Chen)