High-Quality Diversification for Task-Oriented Dialogue Systems

Many task-oriented dialogue systems use deep reinforcement learning (DRL) to learn policies that respond to the user appropriately and complete the tasks successfully. Training DRL agents with diverse dialogue trajectories prepares them well for rare user requests and unseen situations. One effective diversification method is to let the agent interact with a diverse set of learned user models. However, trajectories created by these artificial user models may contain generation errors, which can quickly propagate into the agent’s policy. It is thus important to control the quality of the diversification and resist the noise. In this paper, we propose a novel dialogue diversification method for task-oriented dialogue systems trained in simulators. Our method, Intermittent Short Extension Ensemble (I-SEE), constrains the intensity of interaction with an ensemble of diverse user models and effectively controls the quality of the diversification. Evaluations show that I-SEE successfully boosts the performance of several DRL dialogue agents.

Speaker's bio: Dr. Grace Hui Yang is an Associate Professor in the Department of Computer Science at Georgetown University. Dr. Yang is leading the InfoSense (Information Retrieval and Sense-Making) group at Georgetown University, Washington D.C. Dr. Yang obtained her Ph.D. from Carnegie Mellon University in 2011. Her current research interests include deep reinforcement learning, interactive agents, and human-centered AI. Prior to this, she conducted research on question answering, automatic ontology construction, near-duplicate detection, multimedia information retrieval, and opinion and sentiment detection. Dr. Yang's research has been supported by the Defense Advanced Research Projects Agency (DARPA) and the National Science Foundation (NSF). Dr. Yang co-organized the Text Retrieval Conference (TREC) Dynamic Domain Track from 2015 to 2017 and led the effort for SIGIR privacy-preserving information retrieval workshops from 2014 to 2016. Dr. Yang has served on the editorial boards of ACM TOIS and Information Retrieval Journal (from 2014 to 2017) and has actively served as an organizing or program committee member in many conferences such as SIGIR, ECIR, ACL, AAAI, ICTIR, CIKM, WSDM, and WWW. She is a recipient of the NSF Faculty Early Career Development Program (CAREER) Award.

wing.nus

March 17, 2023

Transcript

  1. High-Quality Diversification for Task-Oriented Dialogue Systems
     Grace Hui Yang
     Joint work with my students Zhiwen Tang and Hrishikesh Kulkarni
     March 2, 2023 @ National University of Singapore

  2. Task-Oriented Dialogue Systems
     • Training the DRL agent with user simulators
       ◦ Training with real human users is expensive
     • Most user simulators are rule-based
       ◦ Designed by domain experts
       ◦ Efficient in routine task scenarios
     • But they are unable to generate spontaneous responses like real humans

  3. (figure slide)

  4. Increase dialogue diversity
     • Long-lasting research interest motivated by different needs:
       ◦ Avoid dull responses
       ◦ Obtain robust agents
     • Ideas to improve diversification:
       ◦ Enforce diversity in objective functions (Li et al., 2016a; Baheti et al., 2018)
       ◦ Perturb language rules (Niu and Bansal, 2019) or environment parameters (Tobin et al., 2017; Ruiz et al., 2019)
       ◦ Randomize trajectory synthesis (Andrychowicz et al., 2017; Lu et al., 2019)
       ◦ Select more diverse data contributors (Stasaski et al., 2020)
       ◦ Sample training trajectories from a diverse set of environments (Chua et al., 2018; Janner et al., 2019)

  5. Training with Diversified User Models
     • An ensemble of diversified user models
     • One issue is error propagation
       ◦ Errors propagate from (user) model learning to policy learning

  6. (figure slide)

  7. We propose to
     • Control the level of diversification
       ◦ Control its intensity, frequency, and amount
       ◦ Reach a balance between exploration/diversity and accuracy
     • A diversification method for reinforcement-learning-supported dialogue agents
       ◦ Intermittent Short Extension Ensemble (I-SEE)

  8. Intermittent Short Extension Ensemble (I-SEE)
     • Generate a base trajectory by interacting with an expert user simulator
     • Intermittently branch the interaction trajectory
     • Extend each branch for a short horizon by interacting with a diversified user model
     • Both the base trajectories and the diversified trajectories are used for training (see the rollout sketch below)
     [Figure: base trajectory 𝛕₀ … 𝛕₅ with short diversified branches]
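
A minimal sketch of the branching loop described on slide 8, assuming hypothetical `agent`, `expert_sim`, and `dume` objects with simple `act`/`step`/`respond` interfaces; the names, the interaction API, and the default values are illustrative, not the authors' implementation:

```python
import random

def isee_rollout(agent, expert_sim, dume, max_turns=20,
                 branch_ratio=0.3, branch_horizon=3):
    """Collect one base trajectory plus short diversified branches (I-SEE sketch).

    dume           -- list of learned user models (the ensemble)
    branch_ratio   -- how often a branch is started (the intensity knob)
    branch_horizon -- how many turns each diversified branch lasts (kept short)
    """
    base, branches = [], []
    user_utt = expert_sim.reset()               # expert simulator opens the dialogue
    for _ in range(max_turns):
        sys_act = agent.act(user_utt)           # agent responds to the current user turn
        next_utt, done = expert_sim.step(sys_act)
        base.append((user_utt, sys_act))        # base trajectory from the expert simulator

        # Intermittently branch: roll a short extension with a sampled user model.
        if random.random() < branch_ratio:
            user_model = random.choice(dume)
            branch, b_utt = [], next_utt
            for _ in range(branch_horizon):     # keep the noisy, diversified branch short
                b_act = agent.act(b_utt)        # agent responds to the user-model turn
                branch.append((b_utt, b_act))
                b_utt = user_model.respond(b_act)
            branches.append(branch)

        if done:
            break
        user_utt = next_utt
    return base, branches                       # both feed into policy learning
```
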
  9. (figure slide)

  10. Diversified User Model Ensemble (DUME)
      • Create DUME with imitation learning (e.g., behavior cloning)
      • User models are neural networks with the same architecture but different initialization parameters (a sketch follows below)
      [Figure: Dialogue Agent, User Simulator, and the Diversified User Model Ensemble (DUME)]
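
A minimal behavior-cloning sketch of how such an ensemble could be built, in the spirit of DUME; the network shape, data format, dimensions, and training loop are assumptions for illustration, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

def build_dume(states, actions, ensemble_size=4, epochs=50, lr=1e-3):
    """Behavior-clone an ensemble of user models from expert-simulator transitions.

    states  -- tensor of dialogue-state vectors, shape (N, state_dim)
    actions -- tensor of multi-hot user dialogue acts, shape (N, action_dim)
    Every member shares the architecture; diversity comes only from random initialization.
    """
    state_dim, action_dim = states.shape[1], actions.shape[1]
    ensemble = []
    for _ in range(ensemble_size):
        # Same architecture, different random initialization per member.
        model = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.BCEWithLogitsLoss()        # multi-label dialogue acts
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(states), actions.float())
            loss.backward()
            opt.step()
        ensemble.append(model)
    return ensemble
```
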
  11. Control the Quality of Diversification
      • How big is the ensemble (DUME) size?
      • How long is the branching horizon?
      • How much is the branching intensity?
      [Figure: base trajectory 𝛕₀ … 𝛕₅ with short diversified branches]

  12. Policy Learning with I-SEE
      • The dialogue agent starts by interacting with the expert simulator from t=0
      • At t=p, the agent switches to interacting with a user model sampled from DUME
      • We balance between diversity and quality by controlling (a configuration sketch follows below):
        ◦ The branching intensity
        ◦ The branching horizon
        ◦ The ensemble size
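
To make the three knobs concrete, a hypothetical configuration that reuses the `build_dume` and `isee_rollout` sketches above, assuming `states`, `actions`, `agent`, and `expert_sim` are already available; the values are illustrative, not the paper's tuned settings:

```python
# Hypothetical I-SEE knobs; the paper reports the actual tuned values.
isee_config = {
    "ensemble_size": 4,     # number of user models in DUME
    "branch_horizon": 3,    # turns per diversified branch (kept short)
    "branch_ratio": 0.3,    # how often a branch is started (the intensity knob)
}

dume = build_dume(states, actions, ensemble_size=isee_config["ensemble_size"])
base, branches = isee_rollout(agent, expert_sim, dume,
                              branch_ratio=isee_config["branch_ratio"],
                              branch_horizon=isee_config["branch_horizon"])
```
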
  13. (figure slide)

  14. (figure slide)

  15. Experiments
      • MultiWOZ (Budzianowski et al., 2018)
        ◦ 7 domains: train, restaurant, taxi, ...
        ◦ A user goal may involve multiple domains
        ◦ 8,438 dialogues
      • Evaluation Metrics (sketched below)
        ◦ Success rate
        ◦ Inform F1
        ◦ Match
        ◦ #Turns
      • Baselines
        ◦ PPO (Schulman et al., 2017)
        ◦ DQN (Mnih et al., 2015)
        ◦ DDQ (DQN + unconstrained diversification) (Peng et al., 2018)
        ◦ GDPL (PPO + IRL, leading performer on MultiWOZ) (Takanobu et al., 2019)
        ◦ MADPL (MARL, leading performer on MultiWOZ) (Takanobu et al., 2020)
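
A rough sketch of how the two headline metrics can be computed on MultiWOZ-style dialogues; the exact definitions in the paper follow the standard MultiWOZ evaluation protocol, and the field names below are illustrative:

```python
def inform_f1(requested_slots, informed_slots):
    """F1 between the slots the user requested and the slots the system informed."""
    requested, informed = set(requested_slots), set(informed_slots)
    tp = len(requested & informed)
    precision = tp / len(informed) if informed else 0.0
    recall = tp / len(requested) if requested else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def success_rate(dialogues):
    """Fraction of dialogues where every request was answered and the booked entity matched the goal."""
    wins = sum(1 for d in dialogues if d["all_requests_answered"] and d["entity_matched"])
    return wins / len(dialogues)
```
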
  16. Experiment Results
      Three settings:
      • X: the algorithm without diversification
      • X + Dvs: full and uncontrolled diversification
      • X + I-SEE: the proposed diversification method

  17. Analysis of I-SEE
      [Figure panels: (a) Ensemble size (E), (b) Diversification horizon (H), (c) Diversification ratio (𝜂)]

  18. Summary
      • I-SEE builds a diversified user model ensemble with imitation learning
      • It randomizes the initialization parameters of the neural networks
      • Errors in user model learning may quickly propagate into policy learning
      • To control the degree of noise:
        ◦ The agent intermittently interacts with trainable user models
        ◦ The diversified trajectories are of short horizon

  19. Conclusion
      • Diversification works
      • However, it is not the case that the more diverse, the better
        ◦ Effectiveness peaks when we use only intermittent, short, sampled diversified trajectories
        ◦ Full diversification hurts performance
      • Fully automatic training of dialogue agents with simulators can be effective