Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Enhanced EC Recommendations: Trustworthy Valida...
Search
LINE Developers Taiwan
PRO
September 23, 2024
Technology
0
48
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
Event: iThome Hello World Dev Conference
Speaker: Dan Chen
LINE Developers Taiwan
PRO
September 23, 2024
Tweet
Share
More Decks by LINE Developers Taiwan
See All by LINE Developers Taiwan
#Rookie’s Adventure: A 0 to 1 Dev Journey
line_developers_tw
PRO
0
20
LINE 購物幕後推手
line_developers_tw
PRO
0
610
從校園到職場 我的實習旅程
line_developers_tw
PRO
0
110
探索數據未來
line_developers_tw
PRO
0
16
MLE 的修煉之路
line_developers_tw
PRO
0
88
LINE 實習分享 & 國際黑客松參賽分享
line_developers_tw
PRO
0
47
在 GCP 運用 Parse 全家餐管理那堆 AI 應用的資料
line_developers_tw
PRO
0
41
40歲的我會給20歲的自己,關於軟體開發的7個建議
line_developers_tw
PRO
0
9.7k
從零到一:轉碼仔的實習攻略
line_developers_tw
PRO
0
71
Other Decks in Technology
See All in Technology
勘違いから始まったProxmox on ProxmoxでGPUパススルー【JPmoxs勉強会#7】/JPmoxs7_GPU_Passthrough_on_Proxmox_on_Proxmox-A_Journey_That_Started_with_a_Misunderstanding
tsukimi_site
0
130
スイッチのBMC、つかってますか?
sonic
0
510
encoding/json v2を予習しよう!
yuyu_hf
PRO
1
230
事業と組織から目を逸らずに技術でリードする
ogugu9
19
5.6k
KubeCon EU 2025 Recap - Kubernetes CRD Design for the Long Haul: Tips, Tricks, and Lessons Learned / Kubernetes Meetup Tokyo #70 / k8sjp70-crd-long-haul-recap
everpeace
0
110
技術的負債を「戦略的投資」にするためのPdMとエンジニアの連携と実践
satomino
4
860
エンジニアのための 法規制への取り組み方 #healthtechmeetup
77web
0
280
OCI Full Stack Disaster Recovery サービス概要
oracle4engineer
PRO
1
140
Azure の裏側を支える SRE の世界
tsubasaxzzz
2
370
AWS LambdaをTypeScriptで動かして分かった、Node.jsのTypeScriptサポートの利点と課題
smt7174
1
1.8k
変化に強いテーブル設計の勘所 / Table design that is resistant to changes
soudai
18
5k
本番環境への影響リスクが低い Real Application Testing (SQL Performance Analyzer) の実施方法の検討と実践
jri_narita
0
230
Featured
See All Featured
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.2k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
720
KATA
mclloyd
29
14k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
5
590
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
160
15k
Site-Speed That Sticks
csswizardry
6
570
A Modern Web Designer's Workflow
chriscoyier
693
190k
Become a Pro
speakerdeck
PRO
28
5.3k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
21k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
12k
Building Flexible Design Systems
yeseniaperezcruz
329
39k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
Transcript
None
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for
Two-Tower Model EC Data Dev / Data Scientists Dan Chen
Dan LINE Taiwan EC Dev - Data Scientis Work Experience
Side Project
01 02 03 04 Evaluation Framework Offline & Online Evaluation
LLM on Recommendation What is Trustworthy 05 Q&A CONTENT
Why it’s so important 01 What is Trustworthy
Element of trustworthy 特點項目文字 特點項目 Trustworthy 特點項目文字 特點項目 特點項目文字 特點項目
Four Perspective 特點項目文字 特點項目 Trustworthy Recommendation 特點項目文字 特點項目 特點項目文字 特點項目
Data Preparation Data Representation Recommendation Generation Performance Evaluation
How to Correctly Evaluate AI 02 Evaluation Framework
Two - Stage Recommendation system Brickmaster Scalable Scenario-wise KPI -
Oriented Trustworthy
How to truly comprehensive understand performance Evaluation Framework (1/2)
How to truly comprehensive understand performance Evaluation Framework (1/2)
How to Correctly Evaluate AI 03 Offline & Online Evaluation
Key point to show how your algorithms can contribute to
your business Offline Evaluation
Key point to show how your algorithms can contribute to
your business Online Evaluation
Avoid pitfalls In Practice If experiment isn’t’ significant ?? Sample
ratio mismatch ?? Novelty effect ?? Key point to show how your algorithms can contribute to your business A/B test
Case – EC Shop recommendation
04 LLM On Recommendation
Recommendation with LLM - Feature Engineering: Text embedding generation -
How to evaluate embedding (probing): RankMe / α-ReQ Metrincs
Recommendation with LLM - Feature Engineering: Text embedding generation -
How to evaluate embedding (probing): RankMe / α-ReQ Metrincs
Evaluate & Challenge 05 Conclusion
Conclusion Business Value OpenAI, Claude, Gemini XGBoost or OpenSource 來源:https://zh.wikipedia.org/zh-
tw/%E7%BE%8E%E5%9C%8B%E9%9A%8A%E9%95%B72%EF%BC%9A%E9%85%B7%E5%AF%9 2%E6%88%B0%E5%A3%AB 來源:https://images.app.goo.gl/HCygtJVtoPaU2KgX6
Conclusion & Challenge 1. Data Quality 2. Multiple – Metrics
evaluation 3. Conduct A/B test Experiment 4. Human Perception Evaluation Challenge
Q&A 聯絡資訊 (Linkedin – Dan Chen)
None
None