Recent Advances in Candidate Matching

Chenliang LI, Wuhan University, 2023.03.02

Introduction (Part 1)
We are now living in an age of INFORMATION EXPLOSION.
• Search
• Recommendation
• Chatbot
Information seeking happens every day, for everyone, almost everywhere.

Introduction (Part 1)
The two-stage pipeline is widely deployed in real-world systems.
• Data sources: all items (millions), user history and contexts, all other side info
• Phase 1, candidate matching (matching models): millions of items are narrowed to hundreds
• Phase 2, ranking (ranking models): hundreds of candidates are narrowed to tens
Candidate matching, retrieval, and generation are equivalent terms.

Matching models must run at LOW LATENCY.
[Figure: a query encoder (example query: "starbucks drinks") and an item encoder, with a score S between their outputs]

How to use user history, contexts, and all other side info under the LOW LATENCY constraint?
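
The query encoder / item encoder setup on this slide can be illustrated with a minimal sketch. It assumes the score S is an inner product between the two encoder outputs, which the slide does not state; the embeddings, item names, and the helper `retrieve_top_k` are hypothetical. Real systems usually replace the brute-force scan with an approximate nearest-neighbor index to meet the latency budget.

```python
import heapq

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_k(query_vec, item_vecs, k):
    """Return the k item ids with the highest inner-product score."""
    scored = ((dot(query_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]

# Hypothetical 2-d embeddings, as if produced offline by an item encoder.
item_vecs = {
    "latte": [0.9, 0.1],
    "espresso": [0.8, 0.3],
    "sneakers": [0.0, 1.0],
}
# Hypothetical output of a query encoder for "starbucks drinks".
query_vec = [1.0, 0.0]

top2 = retrieve_top_k(query_vec, item_vecs, k=2)
print(top2)
```

Because the item vectors do not depend on the query, they can be computed ahead of time, which is what keeps this stage cheap.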

Introduction (Part 1)
The HISTORY of candidate matching, from before 2013 to now:
• Item-based CF: Pearson-based CF, SLIM (i.e., MF)
• Representation learning: DSSM, YoutubeDNN, BST, DPR, ESAM, Condenser, MADR, MIND, KEMI
• Interaction-based learning: UMI, PDN, AGREE

Introduction (Part 1)
The HISTORY of candidate matching: item-based CF
• Item-based filtering (Amazon, 2001)
• Pearson-based CF: Pearson coefficient
• SLIM (i.e., MF): latent vector
[Figure: items A, B, and C linked by high correlation]
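
As a small illustration of the Pearson coefficient used in item-based CF, the sketch below computes item-item correlations from rating lists. The ratings are made up, and the sketch assumes every user rated every item; practical systems compute the coefficient over co-rating users only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant ratings; treated here as no signal
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: each item maps to its ratings from the same four users.
ratings = {
    "A": [5, 4, 1, 2],
    "B": [4, 5, 2, 1],
    "C": [1, 2, 5, 4],
}

sim_ab = pearson(ratings["A"], ratings["B"])  # users who like A also like B
sim_ac = pearson(ratings["A"], ratings["C"])  # users who like A dislike C
print(sim_ab, sim_ac)
```

Items with a high coefficient relative to something the user already liked become candidates; here B would be retrieved for a fan of A, and C would not.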

Introduction (Part 1)
The HISTORY of candidate matching: representation learning
• Models: DSSM, YoutubeDNN, BST, DPR, ESAM, Condenser, MADR, MIND, KEMI
• Directions: DNN/Transformer, multi-interest learning

Introduction (Part 1)
The HISTORY of candidate matching: interaction-based learning
• Models: UMI, PDN, AGREE
• How to enable efficient interaction-based feature learning?

Overview - Representation Learning – Deep DNN (Part 2)
Computation parallelization and speedup are a must for dense retrieval.
Deep Neural Networks for YouTube Recommendations, RecSys 2016
Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, CIKM 2013

Overview - Representation Learning - Multi-Interest Learning (Part 2)
To model the multi-aspect / multi-interest nature of the world.
Controllable Multi-Interest Framework for Recommendation, KDD 2020
Multi-Aspect Dense Retrieval, KDD 2022
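
One common way to use several interest vectors at matching time is to score an item by its best-matching interest. The sketch below illustrates that idea only; it is an assumption for illustration, not the specific mechanism of either cited paper, and all vectors are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def multi_interest_score(interest_vecs, item_vec):
    """Score an item by its best-matching interest vector."""
    return max(dot(interest, item_vec) for interest in interest_vecs)

# Hypothetical user with two distinct interests (say, coffee and running).
interests = [[1.0, 0.0], [0.0, 1.0]]
# A single averaged user vector blurs the two interests together.
averaged = [0.5, 0.5]

coffee_item = [0.9, 0.1]
multi_score = multi_interest_score(interests, coffee_item)
single_score = dot(averaged, coffee_item)
print(multi_score, single_score)
```

The coffee item matches one interest strongly, so the multi-interest score stays high, while the single averaged vector dilutes it.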

Overview - Representation Learning - GNN (Part 2)
To exploit high-order semantics and correlations.
Neural Graph Collaborative Filtering, SIGIR 2019
LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation, SIGIR 2020
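
To make "high-order" concrete, here is a minimal sketch of LightGCN-style propagation, where each node becomes a symmetrically normalized sum of its neighbors with no feature transform or nonlinearity. The tiny graph, the embeddings, and the function name `propagate` are hypothetical.

```python
import math

def propagate(embeddings, neighbors):
    """One propagation layer: each node becomes a normalized sum of its neighbors."""
    new = {}
    for node, nbrs in neighbors.items():
        dim = len(embeddings[node])
        acc = [0.0] * dim
        for nbr in nbrs:
            # Symmetric normalization: 1 / sqrt(deg(node) * deg(neighbor)).
            norm = 1.0 / math.sqrt(len(nbrs) * len(neighbors[nbr]))
            for d in range(dim):
                acc[d] += norm * embeddings[nbr][d]
        new[node] = acc
    return new

# Hypothetical user-item graph: u1 clicked i1 and i2, u2 clicked only i2.
neighbors = {
    "u1": ["i1", "i2"],
    "u2": ["i2"],
    "i1": ["u1"],
    "i2": ["u1", "u2"],
}
layer0 = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "i1": [1.0, 1.0], "i2": [0.0, 2.0]}

layer1 = propagate(layer0, neighbors)
layer2 = propagate(layer1, neighbors)
print(layer1["u2"], layer2["u2"])
```

After one layer u2 only sees its own item i2; after two layers it also picks up signal from u1 through the shared item, which is the high-order correlation the slide refers to. LightGCN additionally combines the embeddings of all layers into the final representation, which this sketch omits.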

Overview - Representation Learning – GNN + Multi-Interest (Part 2)
To exploit the benefits of both GNN and multi-interest learning:
1. Graph convolution aggregation: refine user preferences based on multi-level correlations between historical items.
2. Multi-interest learning: focus on extracting different interests by performing historical item clustering.
When Multi-Level Meets Multi-Interest: A Multi-Grained Neural Model for Sequential Recommendation, SIGIR 2022

Overview - Representation Learning – Long-tail Problem (Part 2)
Non-displayed items cause exposure bias:
• Trained only with displayed items
• Retrieve items in the entire space
[Figure: exposure bias; displayed items carry labels, non-displayed items do not]
ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance, SIGIR 2021

Why is long-tail performance poor?
• Cause: domain shift; representation learning is not robust and consistent
• How to solve: unsupervised domain adaptation (reduce domain shift)

ESAM
Motivation:
• Architectural solutions may not generalize well.
• Highlight the importance of learning good feature representations for non-displayed items.
Ranking model backbone:
• Unsupervised domain adaptation to reduce domain shift
Formula definition
[Figure: displayed items and non-displayed items]

Domain shift:
• Attribute correlation alignment, L_DA
• Source domain: displayed items (with labels); target domain: non-displayed items
• High-level attribute distribution definition
• Distribution verification
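
The idea of aligning attribute correlations between displayed and non-displayed items can be sketched as follows. This assumes a CORAL-style formulation, the mean squared difference between the two domains' covariance matrices; the slide's own formula is not preserved in the transcript, so the exact form of L_DA may differ. Function names and data are hypothetical.

```python
def covariance(rows):
    """Covariance matrix (population form) of a batch of feature vectors."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    return [
        [sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / n for k in range(dim)]
        for j in range(dim)
    ]

def correlation_alignment_loss(source, target):
    """Mean squared difference between source and target covariance entries."""
    cov_s, cov_t = covariance(source), covariance(target)
    dim = len(cov_s)
    return sum(
        (cov_s[j][k] - cov_t[j][k]) ** 2 for j in range(dim) for k in range(dim)
    ) / dim ** 2

# Hypothetical 2-d item representations.
displayed = [[1.0, 1.0], [-1.0, -1.0]]      # source: the two attributes move together
non_displayed = [[1.0, -1.0], [-1.0, 1.0]]  # target: the two attributes move oppositely

loss_shifted = correlation_alignment_loss(displayed, non_displayed)
loss_aligned = correlation_alignment_loss(displayed, displayed)
print(loss_shifted, loss_aligned)
```

The loss is zero when both domains show the same attribute correlations and grows as they diverge, so minimizing it pulls the two representation spaces together without needing labels on the target side.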

Center-wise clustering for the source domain, L_DC^c:
• Attribute correlation alignment
• L_DC^c makes similar items cohere together while dissimilar items separate from each other
• Items with the same feedback (click) are similar
[Figure: items marked by click feedback, ✔ or ×]

Self-training for target clustering, L_DC^p:
• Goal: suppress negative transfer, with an easy-to-hard strategy
• Why: target label information is ignored when aligning, which can lead to negative transfer
• How: entropy regularization
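
Entropy regularization can be illustrated with a short sketch: the average entropy of the predicted click probabilities on unlabeled items is used as a penalty, so minimizing it pushes predictions toward confident values. This shows the generic regularizer only; how ESAM combines it with its self-training strategy is not recoverable from the slide. The probabilities and function names are hypothetical.

```python
import math

def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli prediction with click probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def entropy_regularizer(click_probs):
    """Average prediction entropy over a batch of unlabeled items."""
    return sum(binary_entropy(p) for p in click_probs) / len(click_probs)

# Hypothetical predicted click probabilities for non-displayed items.
uncertain = [0.5, 0.5, 0.5]
confident = [0.99, 0.01, 0.98]

reg_uncertain = entropy_regularizer(uncertain)
reg_confident = entropy_regularizer(confident)
print(reg_uncertain, reg_confident)
```

Predictions stuck at 0.5 receive the largest penalty, while confident predictions are barely penalized.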

Overview – Representation Learning - Auxiliary Knowledge (Part 2)
Linking items with their attributes:
• Knowledge-aware attention for information propagation
KGAT: Knowledge Graph Attention Network for Recommendation, KDD 2019

Overview - Representation Learning – Auxiliary Knowledge - MMoE (Part 2)
Sharing App in Taobao:
• UV ~ 19M, GMV ~ 400M
Target:
• More sharing between users on the platform
• More interactions and more friends
Features:
• Many different sharing scenarios
• Correlations and discrepancies
Questions:
1. How to exploit scenario-dependent knowledge?
2. How to handle the low-resource nature of long-tail scenarios?
3. How to accommodate social relations between users?
Heterogeneous Graph Augmented Multi-Scenario Sharing Recommendation with Tree-Guided Expert Networks, WSDM 2021

TreeMMoE:
• We can build a tree structure to model the hierarchical relations between different scenarios, e.g. "C2C→Sharing→Entity→Product→Makeups"
• Some fine-grained scenarios share common ancestor scenarios
• Knowledge can be transferred

TreeMMoE: each layer of the tree has a gate for the corresponding expert network.
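
The role of a gate over expert networks can be shown with a minimal sketch of a single MMoE-style gate: softmax weights mix the outputs of several experts. This illustrates one gate only; how TreeMMoE wires one gate per tree level is not shown on the slide and is not modeled here. The expert outputs and gate logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gated_mixture(expert_outputs, gate_logits):
    """Combine expert output vectors using softmax gate weights."""
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, expert_outputs)) for d in range(dim)
    ]

# Hypothetical outputs of two expert networks for one input.
expert_outputs = [[1.0, 0.0], [0.0, 1.0]]

# Equal gate logits weight both experts equally.
balanced = gated_mixture(expert_outputs, gate_logits=[0.0, 0.0])
# A large logit gap makes the gate favor the first expert.
skewed = gated_mixture(expert_outputs, gate_logits=[5.0, 0.0])
print(balanced, skewed)
```

In a multi-scenario setting the gate logits would depend on the scenario, so related scenarios can lean on the same experts and share knowledge.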

Summary of representation learning:
• Two-Tower DNN
• Multi-Interest Learning
• Long-tail Problem
• Auxiliary Knowledge

Overview - Interaction-based Learning – Attention Mechanism (Part 2)
Attention mechanism (target attention) for highlighting relevant features.
Deep Interest Network for Click-Through Rate Prediction, KDD 2018
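
Target attention can be sketched as pooling the user's behavior vectors with weights that depend on the candidate item. The cited paper learns the weighting with a small network; the sketch below swaps in a plain dot product followed by softmax to stay short, so treat it as an illustration of the idea only. All embeddings are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def target_attention(history, target):
    """Pool behavior vectors, weighted by softmax relevance to the target item."""
    scores = [dot(h, target) for h in history]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(target)
    pooled = [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]
    return pooled, weights

# Hypothetical behavior embeddings: two coffee purchases and one shoe purchase.
history = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
coffee_target = [4.0, 0.0]
shoe_target = [0.0, 4.0]

_, coffee_weights = target_attention(history, coffee_target)
_, shoe_weights = target_attention(history, shoe_target)
print(coffee_weights, shoe_weights)
```

The same history yields different weights for different targets. That is also why target attention is expensive for matching: the user representation cannot be computed once and reused across all candidate items.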

Overview - Interaction-based Learning – User Side (Part 2)
Identify important user features / behaviors for better representation.
User-Aware Multi-Interest Learning for Candidate Matching in Recommenders, SIGIR 2022

Overview - Interaction-based Learning – Item Side (Part 2)
Identify important item features / behaviors for precise item-item relevance.
Path-based Deep Network for Candidate Item Matching in Recommenders, SIGIR 2021

Overview - Interaction-based Learning – Optimization (Part 2)
Enable target attention in an efficient way.
Fast Semantic Matching via Flexible Contextualized Interaction, WSDM 2022

Summary of interaction-based learning:
• User Side
• Item Side
• Reduce Computation Cost

Future Trends – One Model Serves ALL (Part 3)
Automatic network sharing and optimization
[Figure: search scenarios in the Alipay app
• Scenarios: Category Search, Insurance Search, Hot Trends, Item Search, Live Commerce, Section/Concept Search, Fund Search (Price), QA
• Content matrices: 5 million (live streaming / video)
• Complicated entity relations: X00+ insurances, 8k+ funds, 80K+ stocks (US / Hong Kong / A-shares), 4k+ agents, 100+ financial organizations, XXX sections
• Diverse queries & intents: baijiu / defense industry / materials; 富 / 广发 (GF) / 鹏华 (Penghua); life insurance / auto insurance, property insurance / health insurance; newly launched / popular / 金选 (gold picks)]
Automatic Expert Selection for Multi-Scenario and Multi-Task Search, SIGIR 2022

One Model Serves ALL – automatic network sharing and optimization:
• Personalization for each instance
• Modeling complex scenarios & tasks
• Flexible & scalable
• End-to-end, low cost

Future Trends – Go Beyond Two-Tower (Part 3)
REAL interaction-based learning – coupling and decoupling
• REAL interaction-based learning is expensive but effective.
[Figure: the query "starbucks drinks" interacting with items, with score S]
Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval, WWW 2023

Can we transfer the capacity of interaction-based learning to the inference phase?

Coupling for training, decoupling for inference: cheap yet effective.
[Figure: the query "starbucks drinks" and items, coupled during training and decoupled at inference, with score S]

How can we design an appropriate coupling mechanism that supports effective representation learning and is easy to decouple afterwards for the inference phase?

[Figure: attribute fusion layer and attribute-aware learning]

The most relevant attributes are more diverse for AGREE than for a vanilla two-tower solution.
[Figure: comparison with a model without attribute-aware learning]

More sophisticated coupling and decoupling mechanisms deserve investigation.

We need YOU (Part 4)
The Most Beautiful Campus in China

Faculty positions at every level are available!
• National Excellent Young Scientists Fund (Overseas) positions
• Tenure-track Professor
• Tenure-track Associate Professor
• Distinguished Research Fellow
• Distinguished Associate Research Fellow

The End
• A SIMPLE overview of the current progress on candidate matching
• LOW LATENCY is a MUST (we need ARTs for both effectiveness and efficiency); we human beings are GREEDY
• Some insights into future trends
• Hope some of you can join us!
Let us QA!
[email protected]
http://lichenliang.net/
