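• Given the huge volume of data, how does LINE typically put it to use?
• What is the Data Dev department's development strategy for data analysis?
• Which areas is the team currently focusing its analysis on?
• What traits or skills do you need to join LINE Taiwan? Are they tied to the organization's goals in Taiwan?
• What advice would you give university students who want to join LINE Taiwan (mindset, soft and hard skills)?
• Having worked there, how would you describe LINE Taiwan's company culture?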
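LINE TODAY, LINE SHOPPING, LINE SPOT, LINE MUSIC, LINE STICKER, LINE VOOM, LINE Reward, Official Account, Fact Checker, LINE HELP TW, LINE TRAVEL, Ads
An independent data engineering department that provides data science solutions
An in-house vendor: we take requests from every service and also build our own products!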
Optimizing services with NLP and MarTech applications
• NLP: NER, Classifier, Duplication Detector, Auto completion, Keyword Extraction, Related Search, Text Generation
• Knowledge Graph
• MarTech: User Tagging, Data Analytics, Recommendation, RFM (Recency, Frequency, Monetary), CLV (customer lifetime value), Uplift Modeling
• Use cases: Ads / Recommendation / Search
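The slide lists RFM among the MarTech building blocks. The sketch below is minimal and assumes order records with a user ID, a date, and an amount. The tier cut points are hand-picked and hypothetical, and the slides do not say how LINE actually computes RFM. It derives per-user RFM values and then maps each one to a tier score:

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Sequence, Tuple

@dataclass
class Order:  # hypothetical record shape
    user_id: str
    day: date
    amount: float

def rfm_table(orders: List[Order], as_of: date) -> Dict[str, Tuple[int, int, float]]:
    """Return {user_id: (recency_days, frequency, monetary)} as of a reference date."""
    stats: Dict[str, Tuple[date, int, float]] = {}
    for o in orders:
        if o.user_id in stats:
            last, freq, total = stats[o.user_id]
            stats[o.user_id] = (max(last, o.day), freq + 1, total + o.amount)
        else:
            stats[o.user_id] = (o.day, 1, o.amount)
    return {u: ((as_of - last).days, f, m) for u, (last, f, m) in stats.items()}

def tier(value: float, cuts: Sequence[float], higher_is_better: bool = True) -> int:
    """Map a raw value to a tier 1..len(cuts)+1 using ascending (hypothetical) cut points."""
    t = 1 + sum(value > c for c in cuts)
    return t if higher_is_better else len(cuts) + 2 - t

def rfm_scores(table, r_cuts, f_cuts, m_cuts) -> Dict[str, Tuple[int, int, int]]:
    """Score each user; fewer days since the last order gives a higher recency score."""
    return {
        u: (tier(r, r_cuts, higher_is_better=False), tier(f, f_cuts), tier(m, m_cuts))
        for u, (r, f, m) in table.items()
    }
```

The scores can then feed user tagging, for example by flagging users scored (1, 1, 1) as lapsed low-value customers.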
Data Engineer (Pipeline)
• Responsibility: … architecture; assemble large, complex data sets that meet requirements
• Skill: Big data infra, SQL, ETL, message queuing
Data Analyst (Biz)
• Responsibility: interpret data and analyze results using statistical techniques; identify, analyze, and interpret trends or patterns in complex data sets
• Skill: Statistics, Data Visualization, Business Knowledge
Data Scientist (Model)
• Responsibility: select appropriate datasets and data representation methods; research and implement appropriate ML algorithms
• Skill: Machine learning, deep learning, CV, NLP, Speech
ML Svc Engineer (Service)
• Responsibility: build and scale machine learning infrastructure; monitor model performance
• Skill: System infrastructure design, DevOps
ML Workflow: once you enter the workplace
Biz problem → ?
Roles: DE (Data Engineer), DA (Data Analyst), MSE (ML Svc Engineer)
Stages: Biz analysis; Data, Label, Feature; EDA; Feature Engineering; Model build; Hyper-parameter tuning; Evaluation; Error analysis; Scaling, Performance, Reliability; Model decay, Data drift
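Data drift appears near the end of the workflow. The slides do not name a way to measure it. One common choice, used here as our own assumption, is the Population Stability Index (PSI). PSI compares the binned distribution of a feature in production against a training-time baseline. Below is a minimal sketch with hypothetical bin edges:

```python
import math
from bisect import bisect_right
from typing import List, Sequence

def bin_shares(values: Sequence[float], edges: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Share of values per bin; bins are (-inf, e0), [e0, e1), ..., [e_last, inf)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    n = len(values)
    # Clip empty bins to eps so the log term below stays finite.
    return [max(c / n, eps) for c in counts]

def psi(baseline: Sequence[float], current: Sequence[float], edges: Sequence[float]) -> float:
    """Population Stability Index of `current` against `baseline` over shared bin edges."""
    expected = bin_shares(baseline, edges)
    actual = bin_shares(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift worth investigating, for example before retraining to counter model decay. Thresholds should still be tuned per feature.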
NLP application project, using LINE Shopping related search as an example
Data, Label:
• Data research and selection
• Data volume and performance
• Data processing
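The example below is only a simple baseline. It is not the NLP model the talk describes. It assumes the input is search logs already grouped into sessions, and it mines related queries by counting which queries co-occur in the same session. The session grouping, lowercase normalization, `top_k`, and `min_count` are all assumptions:

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Sequence

def build_related(sessions: Sequence[Sequence[str]], top_k: int = 3, min_count: int = 1) -> Dict[str, List[str]]:
    """Map each query to the queries it most often shares a session with."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for queries in sessions:
        # Normalize and deduplicate so a repeated query counts once per session.
        unique = sorted({q.strip().lower() for q in queries if q.strip()})
        for a, b in combinations(unique, 2):
            co[a][b] += 1
            co[b][a] += 1
    related = {}
    for q, counter in co.items():
        # Rank by co-occurrence count (descending), breaking ties alphabetically.
        ranked = sorted(
            (item for item in counter.items() if item[1] >= min_count),
            key=lambda kv: (-kv[1], kv[0]),
        )
        related[q] = [other for other, _ in ranked[:top_k]]
    return related
```

Counting per session keeps this cheap to batch over large logs. An NLP model would be needed to handle rare or unseen queries, which co-occurrence alone cannot cover.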
Evaluation:
• Offline and online testing
• Look beyond statistical metrics to business metrics
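This step calls for online testing. A common way to check whether a CTR change between a control bucket and a treatment bucket is more than noise is a two-proportion z-test. The talk does not say which test LINE uses, so treat this as an assumption:

```python
import math
from typing import Tuple

def ctr_ab_test(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> Tuple[float, float, float]:
    """Two-proportion z-test on CTR; returns (absolute lift of B over A, z, two-sided p-value)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value
```

A small p-value only says the CTR difference is unlikely to be chance. Whether a lift of that size is worth shipping is the business-metric judgment this step refers to.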
Scaling, Performance:
• Batch prediction or real-time prediction
• Update frequency
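The following hypothetical hybrid sketch shows how the two choices trade off. The class name, refresh policy, and fallback are all our assumptions and are not described in the talk. A batch job precomputes suggestions for known queries and is rebuilt every `refresh_seconds`, which acts as the update frequency. Unseen queries fall back to calling the model at request time:

```python
import time
from typing import Callable, Dict, List, Optional

class SuggestionService:
    """Hypothetical: serve batch-precomputed suggestions, falling back to real-time inference."""

    def __init__(self, batch_job: Callable[[], Dict[str, List[str]]],
                 realtime_model: Callable[[str], List[str]],
                 refresh_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._batch_job = batch_job
        self._realtime_model = realtime_model
        self._refresh_seconds = refresh_seconds  # the "update frequency" knob
        self._clock = clock
        self._table: Dict[str, List[str]] = {}
        self._built_at: Optional[float] = None

    def _maybe_refresh(self) -> None:
        now = self._clock()
        if self._built_at is None or now - self._built_at >= self._refresh_seconds:
            self._table = self._batch_job()  # rebuild the whole lookup table
            self._built_at = now

    def suggest(self, query: str) -> List[str]:
        self._maybe_refresh()
        if query in self._table:
            return self._table[query]  # cheap lookup, up to refresh_seconds stale
        return self._realtime_model(query)  # fresh, but pays model latency per request
```

In production the batch rebuild would normally run as a separate scheduled job rather than inside the request path. It is inlined here only to keep the sketch self-contained.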
• Business metrics: CTR/CVR
• Service health metrics: SLO
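The two metric families can be tracked side by side over a monitoring window. This small sketch makes several assumptions that do not come from the talk. CVR is measured as conversions per click. The SLO is hypothetical: 99% of requests within 200 ms and 99.9% availability. The `Window` record shape is also hypothetical:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Window:  # hypothetical per-window aggregates
    impressions: int
    clicks: int
    conversions: int
    latencies_ms: List[float]  # one entry per served request
    errors: int  # failed requests in the window

def biz_metrics(w: Window) -> Tuple[float, float]:
    ctr = w.clicks / w.impressions if w.impressions else 0.0
    cvr = w.conversions / w.clicks if w.clicks else 0.0  # assumed: conversions per click
    return ctr, cvr

def meets_slo(w: Window, latency_ms: float = 200.0,
              latency_target: float = 0.99, availability_target: float = 0.999) -> bool:
    n = len(w.latencies_ms)
    if n == 0:
        return True  # no traffic, nothing to violate
    fast_share = sum(ms <= latency_ms for ms in w.latencies_ms) / n
    availability = 1 - w.errors / n
    return fast_share >= latency_target and availability >= availability_target
```

Keeping the two checks separate means a model can win on CTR/CVR and still be rolled back for breaking the latency or availability budget.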
• Biz problem: optimize keyword recommendations
• Data: LINE Shopping search history
• Model: NLP model design
• Key metrics: Model: hit rate; Biz: CTR/CVR/Steps
• How to use: Model API
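This slide pairs an offline model metric (hit rate) with online business metrics. The talk does not define the evaluation protocol, so the following is our assumption. Hit rate is read as hit rate@k. The evaluation set is (query, next query) pairs taken from held-out search history. The `recommend` callable is a hypothetical stand-in for the Model API:

```python
from typing import Callable, List, Sequence, Tuple

def hit_rate_at_k(cases: Sequence[Tuple[str, str]],
                  recommend: Callable[[str], List[str]], k: int = 5) -> float:
    """Fraction of (query, next_query) pairs whose next_query is in the top-k recommendations."""
    if not cases:
        return 0.0
    hits = sum(target in recommend(query)[:k] for query, target in cases)
    return hits / len(cases)
```

A model that raises hit rate offline still has to show CTR/CVR gains online before it counts as a success.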