LINE Developers Taiwan
September 23, 2024
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
Event: iThome Hello World Dev Conference
Speaker: Dan Chen
Transcript
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
EC Data Dev / Data Scientist: Dan Chen
Dan, LINE Taiwan EC Dev, Data Scientist
Work Experience
Side Project
CONTENT
01 What is Trustworthy
02 Evaluation Framework
03 Offline & Online Evaluation
04 LLM on Recommendation
05 Q&A
01 What is Trustworthy: Why it's so important
Elements of Trustworthy
Four Perspectives of Trustworthy Recommendation
Data Preparation, Data Representation, Recommendation Generation, Performance Evaluation
02 Evaluation Framework: How to Correctly Evaluate AI
Two-Stage Recommendation System: Brickmaster
Scalable, Scenario-wise, KPI-Oriented, Trustworthy
Evaluation Framework (1/2): How to understand performance truly comprehensively
03 Offline & Online Evaluation: How to Correctly Evaluate AI
Offline Evaluation: Key points for showing how your algorithms can contribute to your business
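The transcript does not say which offline metrics the talk uses. As a minimal sketch, assuming binary relevance and top-k ranking metrics (Recall@k and NDCG@k are common choices for recommenders, not confirmed by the slides), an offline evaluation of a single user's ranked list could look like this. The function names and the sample item IDs are hypothetical.

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)

def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG@k: discounted gain of the hits over the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical example: the model ranked four shops, the user clicked two.
ranked = ["shop_a", "shop_b", "shop_c", "shop_d"]
clicked = {"shop_a", "shop_c"}
print(recall_at_k(ranked, clicked, k=2), ndcg_at_k(ranked, clicked, k=2))
```

In practice these per-user values would be averaged over a held-out set and compared against the business KPI they are meant to predict.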
Online Evaluation: Key points for showing how your algorithms can contribute to your business
A/B test: Key points for showing how your algorithms can contribute to your business
Avoid pitfalls in practice: What if the experiment isn't significant? Sample ratio mismatch? Novelty effect?
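The slide names sample ratio mismatch (SRM) as a pitfall but the transcript does not show how it is checked. A common approach is a chi-square goodness-of-fit test on the observed group sizes against the planned split. The sketch below assumes a two-arm experiment; the function name, the 0.001 alert threshold, and the sample counts are illustrative assumptions, not values from the talk.

```python
import math

def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """P-value of a chi-square goodness-of-fit test for a two-arm traffic split."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)), so no external library is needed.
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts: a planned 50/50 split that drifted.
p = srm_p_value(50_000, 51_500)
if p < 0.001:  # assumed alert threshold
    print(f"Possible sample ratio mismatch (p = {p:.2e}); check assignment and logging")
```

A very small p-value suggests the assignment or logging pipeline is biased, in which case the experiment's metric comparison should not be trusted until the cause is found.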
Case: EC Shop recommendation
04 LLM On Recommendation
Recommendation with LLM
Feature engineering: text embedding generation
How to evaluate embeddings (probing): RankMe / α-ReQ metrics
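The transcript names RankMe and α-ReQ without showing how they are computed. As a rough sketch using NumPy: RankMe is the exponential of the entropy of the normalized singular values of the embedding matrix (an effective rank), and α-ReQ estimates the power-law decay rate α of the covariance eigenspectrum. The version below fits α over the whole spectrum with a simple log-log regression, which is a simplification; the function names and the random input are assumptions for illustration.

```python
import numpy as np

def rankme(embeddings, eps=1e-7):
    """Effective rank: exp of the entropy of the normalized singular values."""
    singular_values = np.linalg.svd(embeddings, compute_uv=False)
    p = singular_values / singular_values.sum() + eps
    return float(np.exp(-np.sum(p * np.log(p))))

def alpha_req(embeddings):
    """Decay rate alpha of the covariance eigenspectrum (lambda_i ~ i^-alpha)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (embeddings.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # sort descending
    eigvals = eigvals[eigvals > 1e-12]       # drop numerically zero directions
    ranks = np.arange(1, len(eigvals) + 1)
    # Slope of log(eigenvalue) against log(rank); alpha is its negation.
    slope, _ = np.polyfit(np.log(ranks), np.log(eigvals), deg=1)
    return float(-slope)

# Hypothetical input: 500 item embeddings of dimension 16.
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(500, 16))
print(rankme(item_embeddings), alpha_req(item_embeddings))
```

A RankMe value close to the embedding dimension indicates the embeddings use most of their dimensions, while a much lower value points to dimensional collapse. Both metrics need only the embeddings themselves, so they can be computed before any downstream model is trained.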
05 Conclusion: Evaluate & Challenge
Conclusion: Business Value
OpenAI, Claude, Gemini; XGBoost or open source
Source: https://zh.wikipedia.org/zh-tw/%E7%BE%8E%E5%9C%8B%E9%9A%8A%E9%95%B72%EF%BC%9A%E9%85%B7%E5%AF%92%E6%88%B0%E5%A3%AB
Source: https://images.app.goo.gl/HCygtJVtoPaU2KgX6
Conclusion & Challenge
1. Data quality
2. Multi-metric evaluation
3. Conduct A/B test experiments
4. Human perception evaluation
Q&A
Contact information (LinkedIn: Dan Chen)