Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Enhanced EC Recommendations: Trustworthy Valida...
Search
LINE Developers Taiwan
PRO
September 23, 2024
Technology
0
67
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for Two-Tower Model
Event: iThome Hello World Dev Conference
Speaker: Dan Chen
LINE Developers Taiwan
PRO
September 23, 2024
Tweet
Share
More Decks by LINE Developers Taiwan
See All by LINE Developers Taiwan
清大企業參訪- Ben
line_developers_tw
PRO
0
1
LLM 商品規格萃取大冒險- Vila
line_developers_tw
PRO
0
10
Playwright/MCP/AI -Winter
line_developers_tw
PRO
0
10
LINE EC Product Catalog Development- Rei
line_developers_tw
PRO
0
10
LINE 與 AI 機器人技術應用現況
line_developers_tw
PRO
0
7
QA Testing
line_developers_tw
PRO
0
4
jcconf_datadev_prod
line_developers_tw
PRO
0
7
jcconf_SPM_prod
line_developers_tw
PRO
0
5
jcconf_LINEPay_prod
line_developers_tw
PRO
0
4
Other Decks in Technology
See All in Technology
ソースを読む時の思考プロセスの例-MkDocs
sat
PRO
1
330
Okta Identity Governanceで実現する最小権限の原則
demaecan
0
200
serverless team topology
_kensh
3
240
ゼロコード計装導入後のカスタム計装でさらに可観測性を高めよう
sansantech
PRO
1
550
AI駆動で進める依存ライブラリ更新 ─ Vue プロジェクトの品質向上と開発スピード改善の実践録
sayn0
1
340
OTEPsで知るOpenTelemetryの未来 / Observability Conference Tokyo 2025
arthur1
0
330
実践マルチモーダル検索!
shibuiwilliam
1
410
AWS DMS で SQL Server を移行してみた/aws-dms-sql-server-migration
emiki
0
260
AIの個性を理解し、指揮する
shoota
3
470
データとAIで明らかになる、私たちの課題 ~Snowflake MCP,Salesforce MCPに触れて~ / Data and AI Insights
kaonavi
0
150
DMMの検索システムをSolrからElasticCloudに移行した話
hmaa_ryo
0
260
進化する大規模言語モデル評価: Swallowプロジェクトにおける実践と知見
chokkan
PRO
0
170
Featured
See All Featured
Scaling GitHub
holman
463
140k
Building Applications with DynamoDB
mza
96
6.7k
Rebuilding a faster, lazier Slack
samanthasiow
84
9.2k
Stop Working from a Prison Cell
hatefulcrawdad
272
21k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.7k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
230
22k
The Cult of Friendly URLs
andyhume
79
6.6k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
37
2.6k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.1k
A Modern Web Designer's Workflow
chriscoyier
697
190k
Site-Speed That Sticks
csswizardry
13
930
Transcript
None
Enhanced EC Recommendations: Trustworthy Validation with Large Language Models for
Two-Tower Model EC Data Dev / Data Scientists Dan Chen
Dan LINE Taiwan EC Dev - Data Scientis Work Experience
Side Project
01 02 03 04 Evaluation Framework Offline & Online Evaluation
LLM on Recommendation What is Trustworthy 05 Q&A CONTENT
Why it’s so important 01 What is Trustworthy
Element of trustworthy 特點項目文字 特點項目 Trustworthy 特點項目文字 特點項目 特點項目文字 特點項目
Four Perspective 特點項目文字 特點項目 Trustworthy Recommendation 特點項目文字 特點項目 特點項目文字 特點項目
Data Preparation Data Representation Recommendation Generation Performance Evaluation
How to Correctly Evaluate AI 02 Evaluation Framework
Two - Stage Recommendation system Brickmaster Scalable Scenario-wise KPI -
Oriented Trustworthy
How to truly comprehensive understand performance Evaluation Framework (1/2)
How to truly comprehensive understand performance Evaluation Framework (1/2)
How to Correctly Evaluate AI 03 Offline & Online Evaluation
Key point to show how your algorithms can contribute to
your business Offline Evaluation
Key point to show how your algorithms can contribute to
your business Online Evaluation
Avoid pitfalls In Practice If experiment isn’t’ significant ?? Sample
ratio mismatch ?? Novelty effect ?? Key point to show how your algorithms can contribute to your business A/B test
Case – EC Shop recommendation
04 LLM On Recommendation
Recommendation with LLM - Feature Engineering: Text embedding generation -
How to evaluate embedding (probing): RankMe / α-ReQ Metrincs
Recommendation with LLM - Feature Engineering: Text embedding generation -
How to evaluate embedding (probing): RankMe / α-ReQ Metrincs
Evaluate & Challenge 05 Conclusion
Conclusion Business Value OpenAI, Claude, Gemini XGBoost or OpenSource 來源:https://zh.wikipedia.org/zh-
tw/%E7%BE%8E%E5%9C%8B%E9%9A%8A%E9%95%B72%EF%BC%9A%E9%85%B7%E5%AF%9 2%E6%88%B0%E5%A3%AB 來源:https://images.app.goo.gl/HCygtJVtoPaU2KgX6
Conclusion & Challenge 1. Data Quality 2. Multiple – Metrics
evaluation 3. Conduct A/B test Experiment 4. Human Perception Evaluation Challenge
Q&A 聯絡資訊 (Linkedin – Dan Chen)
None
None