How Does LINE Implement Cross-Service Data Utilization?

LINE DevDay 2020

November 27, 2020

Transcript

  1. Data Science & Engineering Center Data Science & Engineering Data

    Management Data Platform Data Labs Engineering Infrastructure Data Governance Data Strategy Inquiry Management Business Consulting Data Product Management Data ETL Data Engineering IU Dev Data Solutions Cloudera PS/PSE Data Science 1-4 Machine Learning 1-2 DSP ML OCR Voice Speech NLP Speech & Voice Planning SET Delivery Infra Observability Infra
  2. Information Universe (IU): The data platform at LINE
    Diagram labels: HDFS://, s3a://, POSIX filesystem; YARN Container, Docker Container; Distributed system
  3. Information Universe (IU): The data platform at LINE
    Diagram labels: HDFS://, s3a://, POSIX filesystem; YARN Container, Docker Container; Distributed system; Execution engine; Read data; Write data
  4. Information Universe (IU): The data platform at LINE
    Diagram labels: HDFS://, s3a://, POSIX filesystem; YARN Container, Docker Container; Distributed system; Execution engine; Read data; Write data; External data source; Export to; Collect data
  5. Information Universe (IU): The data platform at LINE
    Diagram labels: HDFS://, s3a://, POSIX filesystem; YARN Container, Docker Container; Distributed system; Execution engine; Read data; Write data; External data source; Export to; Collect data; Business intelligence
  6. Cross-Domain Recommendation
    › Timeline Discover
      › Use various features obtained from other LINE Family services (News, Live, etc.)
    › LINE Theme Recommendation
      › Utilize sticker purchase log
    › Smart Channel
      › Leverage feedback from multiple domains to improve recommendation performance
    Figure labels: Timeline Discover; Theme Recommendation; Smart Channel
  7. Smart Channel
    › Display recommended content of various services and advertisements
    › Weather › Fortune › News › Sticker › Theme › Manga › Music › Point › Search › Local Safety › Train Delay › Lottery
  8. Where does this content come from?
    Diagram labels: Smart Channel; Service A, Service B, Service C; First-stage Recommendation; Recommendation for User A; News articles, Sticker, Fortune
  9. Where does this content come from?
    Diagram labels: Smart Channel; Service A, Service B, Service C; First-stage Recommendation; Recommendation for User A; News articles, Sticker, Fortune; CRS Engine; Second-stage Cross-Domain Recommendation (targeting, scoring, filtering); Only a subset of items passes; User A (35-39, male); Feedback
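The second stage described on this slide (targeting, scoring, filtering, with only a subset of items passing) can be sketched as below. This is an illustration only, not the CRS Engine's code: the `Candidate` class, its `target_segments` field, the `second_stage` function, and the `min_score` / `top_k` parameters are all hypothetical names, and the scoring function is supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One item proposed by a service's first-stage recommender (hypothetical shape)."""
    service: str                 # originating service, e.g. "News"
    item_id: str
    target_segments: frozenset   # user segments the item may be shown to (assumed field)


def second_stage(candidates, user_segments, score_fn, min_score, top_k):
    """Apply targeting, then scoring, then filtering to first-stage candidates."""
    # Targeting: keep items whose target segments overlap the user's segments.
    targeted = [c for c in candidates if c.target_segments & user_segments]
    # Scoring: attach a score to every targeted item.
    scored = [(score_fn(c), c) for c in targeted]
    # Filtering: drop low scores, then keep the top_k highest-scoring items.
    passed = [(s, c) for s, c in scored if s >= min_score]
    passed.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in passed[:top_k]]
```

For a user in the segments `{"35-39", "male"}`, an item targeted only at other segments is dropped before scoring, and a targeted item with a score below `min_score` is dropped afterwards, so only a subset of the first-stage items reaches Smart Channel.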
  10. CRS Engine: Available Features
    › User Segment / Preference: estimated from z-features (user features)
    › Contextual Bandits: an algorithm to maximize the rewards. As contexts change, the model should adapt its bandit choice.
    › Cross-domain User / Item Embedding: learn node embeddings in an online manner.
    Diagram labels: 35-39, male, fond of music; User, Manga, News, Sticker, Music
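The slide does not say which contextual bandit algorithm the CRS Engine uses. As a minimal sketch of the general idea, the example below uses epsilon-greedy with a separate reward estimate per (context, arm) pair, which is one of the simplest contextual bandits and is swapped in here purely for illustration. The class name, the use of a user-segment string as the context, and the content types as arms are assumptions.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Per-context epsilon-greedy bandit (illustrative stand-in, not LINE's algorithm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)     # (context, arm) -> number of updates
        self.values = defaultdict(float)   # (context, arm) -> running mean reward

    def best_arm(self, context):
        """Arm with the highest estimated reward for this context (ties: first arm)."""
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def select(self, context):
        """Explore a random arm with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return self.best_arm(context)

    def update(self, context, arm, reward):
        """Fold one observed reward into the running mean for (context, arm)."""
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

Because estimates are kept per context, feedback from a "35-39 male" user only shifts the choice for that context, and continued updates let the choice change as rewards for a context change.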
  14. Case: Free Stickers
    › 1st Trial: Do not use cross-domain user / item embeddings
    › 2nd Trial: Use cross-domain user / item embeddings
    › Notify all JP users of free stickers
  15. Case: Free Stickers: Results
    Impression ×36, Score ×13, CTR +40% (Score = click / mute)
    › Note that a low score brings fewer impressions because other content is more likely to be chosen by the bandit algorithm.
  16. Auto Targeting
    Diagram labels: Smart Channel; Service A, Service B, Service C, Service D; First-stage Recommendation; Recommendation for User A; News articles, Sticker, Fortune; CRS Engine; Second-stage Cross-Domain Recommendation (targeting, scoring, filtering); Only a subset of items passes; User A (35-39, male); Feedback; Upload Content
    First-stage recommendation is not mandatory
  17. Data Science efforts
    Data Science Teams: Data Science Team 1, Data Science Team 2, Data Science Team 3, Data Science Team 4
  18. Data Analysis Examples
    › Chat Menu Renewal
      › Define KPIs in the order of priority
      › Estimate effects of new UI bias
    › Open Score for OA
      › Users tend to open messages less when receiving them more
      › Predict the 'open rate' and control the volume of message delivery
  19. OA Targeting for Fintech Services: Improvement with Lookalike
    › Past: Manual targeting
    › Present: Lookalike targeting
    Diagram labels: Fintech Services; Send OA message; Text message; Rich message; Sent March 18, 2020
  20. Lookalike Audience Targeting
    › The lookalike engine takes a seed user set as input and outputs a set of similar users
    Diagram labels: All Users; Seed Users; Similar Users; z-features; Lookalike Engine
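The slide gives only the engine's interface (seed users in, similar users out, using z-features). One common way to realize that interface is to rank non-seed users by cosine similarity to the centroid of the seed users' feature vectors. The sketch below assumes exactly that; the similarity measure, the `lookalike` function, and the dict-of-vectors layout are assumptions, not the engine's documented design.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lookalike(features, seed_ids, top_k):
    """Return the top_k non-seed user ids most similar to the seed centroid.

    features: dict mapping user id -> feature vector (stands in for z-features).
    seed_ids: set of user ids forming the seed audience.
    """
    seeds = [features[uid] for uid in seed_ids]
    dim = len(seeds[0])
    centroid = [sum(vec[i] for vec in seeds) / len(seeds) for i in range(dim)]
    ranked = [
        (cosine(vec, centroid), uid)
        for uid, vec in features.items()
        if uid not in seed_ids
    ]
    # Highest similarity first; ties broken by user id for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [uid for _, uid in ranked[:top_k]]
```

A campaign would then target the seed users plus the returned similar users instead of a manually chosen audience.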
  21. Experiments: Manual Targeting vs Lookalike Targeting (2019-12 - 2020-02)
    › CTR +164%, CVR +159%
    › CTR +117%, CVR +53%
    › CTR +67%, CVR +12%
    › CTR +200%, CVR +814%
    Note that these campaigns have already ended
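To make a figure such as "CTR +164%" concrete, the sketch below computes a relative lift from raw counts. The counts are invented for illustration and are not taken from the talk, and the definition of CTR as clicks divided by delivered messages is an assumption, since the slide does not define it.

```python
def rate(events, base):
    """Events per base unit, e.g. clicks per delivered message."""
    return events / base


def relative_lift(baseline, treatment):
    """Relative change of treatment over baseline, e.g. 1.64 means +164%."""
    return (treatment - baseline) / baseline


# Hypothetical counts: same delivery volume under both targeting methods.
ctr_manual = rate(100, 10_000)      # 1.00% CTR with manual targeting
ctr_lookalike = rate(264, 10_000)   # 2.64% CTR with lookalike targeting
lift_text = format(relative_lift(ctr_manual, ctr_lookalike), "+.0%")
print(lift_text)  # +164%
```

Read this way, +164% means the lookalike CTR is 2.64 times the manual CTR, not 1.64 times.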
  22. Data Management
    Data Catalog
    Data Governance
      › Information security
      › Data owner approval
      › Data Open guidance
    Security
      › Authentication: LDAP + Kerberos
      › Authorization: Apache Ranger
      › Auditing: Apache Ranger + native audit log for each component
  23. Data Governance Communication
    Data management is a hub for inquiries and assists with utilizing data
    Diagram labels: Planner/Engineer; Data Management; Security; Privacy; Legal; Data Scientist / ML; Inquiry
  24. Masala Library for Distributed ML on Kubernetes
    › ZeroMQ
      › Fast and stable
      › asyncio with aiozmq library
    › Transfer Manager
      › Manage push/pull sockets lifecycle
    › MPI
      › State Synchronization
      › Distributed Training (e.g. Horovod)
    Diagram labels: Kubernetes; mpi run; CPU Pods (processes push); GPU Pods (processes pull); Transfer Manager
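The push/pull flow in the diagram (processes on CPU pods push, processes on GPU pods pull) can be illustrated with the standard library alone. In the sketch below an `asyncio.Queue` stands in for the ZeroMQ push/pull sockets, so it shows the data-flow pattern only, not Masala's API or aiozmq usage. All function names and parameters are hypothetical.

```python
import asyncio


async def pusher(queue, worker_id, n_batches):
    """Stand-in for a CPU-pod process: pushes prepared batches downstream."""
    for batch_no in range(n_batches):
        await queue.put((worker_id, batch_no))  # blocks when the queue is full


async def puller(queue, received):
    """Stand-in for a GPU-pod process: pulls batches until it sees a sentinel."""
    while True:
        item = await queue.get()
        if item is None:      # sentinel: no more batches for this puller
            break
        received.append(item)


async def run(n_pushers=2, n_pullers=2, n_batches=3):
    """Wire pushers to pullers through one bounded queue and collect the results."""
    queue = asyncio.Queue(maxsize=4)   # bounded, so pushers feel back-pressure
    received = []
    pullers = [asyncio.create_task(puller(queue, received)) for _ in range(n_pullers)]
    await asyncio.gather(*(pusher(queue, w, n_batches) for w in range(n_pushers)))
    for _ in range(n_pullers):         # one sentinel per puller shuts them all down
        await queue.put(None)
    await asyncio.gather(*pullers)
    return received
```

Calling `asyncio.run(run())` returns every pushed batch exactly once, in an order that depends on scheduling. A real deployment would replace the queue with sockets whose lifecycle something like the Transfer Manager owns.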