High-Quality Diversification for Task-Oriented Dialogue Systems

Many task-oriented dialogue systems use deep reinforcement learning (DRL) to learn policies that respond to the user appropriately and complete the tasks successfully. Training DRL agents with diverse dialogue trajectories prepares them well for rare user requests and unseen situations. One effective diversification method is to let the agent interact with a diverse set of learned user models. However, trajectories created by these artificial user models may contain generation errors, which can quickly propagate into the agent’s policy. It is thus important to control the quality of the diversification and resist the noise. In this paper, we propose a novel dialogue diversification method for task-oriented dialogue systems trained in simulators. Our method, Intermittent Short Extension Ensemble (I-SEE), constrains the intensity of interaction with an ensemble of diverse user models and effectively controls the quality of the diversification. Evaluations show that I-SEE successfully boosts the performance of several DRL dialogue agents.

Speaker's bio: Dr. Grace Hui Yang is an Associate Professor in the Department of Computer Science at Georgetown University. Dr. Yang is leading the InfoSense (Information Retrieval and Sense-Making) group at Georgetown University, Washington D.C. Dr. Yang obtained her Ph.D. from Carnegie Mellon University in 2011. Her current research interests include deep reinforcement learning, interactive agents, and human-centered AI. Prior to this, she conducted research on question answering, automatic ontology construction, near-duplicate detection, multimedia information retrieval, and opinion and sentiment detection. Dr. Yang's research has been supported by the Defense Advanced Research Projects Agency (DARPA) and the National Science Foundation (NSF). Dr. Yang co-organized the Text Retrieval Conference (TREC) Dynamic Domain Track from 2015 to 2017 and led the effort for SIGIR privacy-preserving information retrieval workshops from 2014 to 2016. Dr. Yang has served on the editorial boards of ACM TOIS and Information Retrieval Journal (from 2014 to 2017) and has actively served as an organizing or program committee member in many conferences such as SIGIR, ECIR, ACL, AAAI, ICTIR, CIKM, WSDM, and WWW. She is a recipient of the NSF Faculty Early Career Development Program (CAREER) Award.

wing.nus

March 17, 2023

Transcript

  1. High-Quality Diversification for Task-Oriented Dialogue Systems
     Grace Hui Yang
     Joint work with my students Zhiwen Tang and Hrishikesh Kulkarni
     March 2, 2023 @ National University of Singapore

  2. Task-Oriented Dialogue Systems
     • Training the DRL agent with user simulators
       ◦ Training with real human users is expensive
     • Most user simulators are rule-based
       ◦ Designed by domain experts
       ◦ Efficient in routine task scenarios
     • But they are unable to generate spontaneous responses like real humans

  3. (figure slide)

  4. Increase dialogue diversity
     • Long-lasting research interest motivated by different needs:
       ◦ Avoid dull responses
       ◦ Obtain robust agents
     • Ideas to improve diversification:
       ◦ Enforce diversity in objective functions (Li et al., 2016a; Baheti et al., 2018)
       ◦ Perturb language rules (Niu and Bansal, 2019) or environment parameters (Tobin et al., 2017; Ruiz et al., 2019)
       ◦ Randomize trajectory synthesis (Andrychowicz et al., 2017; Lu et al., 2019)
       ◦ Select more diverse data contributors (Stasaski et al., 2020)
       ◦ Sample training trajectories from a diverse set of environments (Chua et al., 2018; Janner et al., 2019)

  5. Training with Diversified User Models
     • An ensemble of diversified user models
     • One issue is error propagation
       ◦ Errors propagate from (user) model learning to policy learning

  6. (figure slide)

  7. We propose to
     • Control the level of diversification
       ◦ Control its intensity, frequency, and amount
       ◦ Reach a balance between exploration/diversity and accuracy
     • A diversification method for reinforcement-learning-supported dialogue agents
       ◦ Intermittent Short Extension Ensemble (I-SEE)

  8. Intermittent Short Extension Ensemble (I-SEE)
     • Generate a base trajectory by interacting with an expert user simulator
     • Intermittently branch the interaction trajectory
     • Extend each branch for a short horizon by interacting with a diversified user model
     • Both the base trajectories and the diversified trajectories are used for training (see the rollout sketch below)
     [Figure: base trajectory 𝛕₀ … 𝛕₅ with short diversified branches]
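
A minimal sketch of the branching loop described on slide 8, assuming hypothetical `agent`, `expert_sim`, and `dume` objects with simple `act`/`step`/`respond` interfaces; the names, the interaction API, and the default values are illustrative, not the authors' implementation:

```python
import random

def isee_rollout(agent, expert_sim, dume, max_turns=20,
                 branch_ratio=0.3, branch_horizon=3):
    """Collect one base trajectory plus short diversified branches (I-SEE sketch).

    dume           -- list of learned user models (the ensemble)
    branch_ratio   -- how often a branch is started (the intensity knob)
    branch_horizon -- how many turns each diversified branch lasts (kept short)
    """
    base, branches = [], []
    user_utt = expert_sim.reset()               # expert simulator opens the dialogue
    for _ in range(max_turns):
        sys_act = agent.act(user_utt)           # agent responds to the current user turn
        next_utt, done = expert_sim.step(sys_act)
        base.append((user_utt, sys_act))        # base trajectory from the expert simulator

        # Intermittently branch: roll a short extension with a sampled user model.
        if random.random() < branch_ratio:
            user_model = random.choice(dume)
            branch, b_utt = [], next_utt
            for _ in range(branch_horizon):     # keep the noisy, diversified branch short
                b_act = agent.act(b_utt)        # agent responds to the user-model turn
                branch.append((b_utt, b_act))
                b_utt = user_model.respond(b_act)
            branches.append(branch)

        if done:
            break
        user_utt = next_utt
    return base, branches                       # both feed into policy learning
```
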
  9. (figure slide)

  10. Diversified User Model Ensemble (DUME)
      • Create DUME with imitation learning (e.g., behavior cloning)
      • User models are neural networks with the same architecture but different initialization parameters (a sketch follows below)
      [Figure: Dialogue Agent, User Simulator, and the Diversified User Model Ensemble (DUME)]
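
A minimal behavior-cloning sketch of how such an ensemble could be built, in the spirit of DUME; the network shape, data format, dimensions, and training loop are assumptions for illustration, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

def build_dume(states, actions, ensemble_size=4, epochs=50, lr=1e-3):
    """Behavior-clone an ensemble of user models from expert-simulator transitions.

    states  -- tensor of dialogue-state vectors, shape (N, state_dim)
    actions -- tensor of multi-hot user dialogue acts, shape (N, action_dim)
    Every member shares the architecture; diversity comes only from random initialization.
    """
    state_dim, action_dim = states.shape[1], actions.shape[1]
    ensemble = []
    for _ in range(ensemble_size):
        # Same architecture, different random initialization per member.
        model = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.BCEWithLogitsLoss()        # multi-label dialogue acts
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(states), actions.float())
            loss.backward()
            opt.step()
        ensemble.append(model)
    return ensemble
```
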
  11. Control the Quality of Diversification
      • How big is the ensemble (DUME) size?
      • How long is the branching horizon?
      • How much is the branching intensity?
      [Figure: base trajectory 𝛕₀ … 𝛕₅ with short diversified branches]

  12. Policy Learning with I-SEE
      • The dialogue agent starts by interacting with the expert simulator from t=0
      • At t=p, the agent switches to interacting with a user model sampled from DUME
      • We balance between diversity and quality by controlling (a configuration sketch follows below):
        ◦ The branching intensity
        ◦ The branching horizon
        ◦ The ensemble size
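
To make the three knobs concrete, a hypothetical configuration that reuses the `build_dume` and `isee_rollout` sketches above, assuming `states`, `actions`, `agent`, and `expert_sim` are already available; the values are illustrative, not the paper's tuned settings:

```python
# Hypothetical I-SEE knobs; the paper reports the actual tuned values.
isee_config = {
    "ensemble_size": 4,     # number of user models in DUME
    "branch_horizon": 3,    # turns per diversified branch (kept short)
    "branch_ratio": 0.3,    # how often a branch is started (the intensity knob)
}

dume = build_dume(states, actions, ensemble_size=isee_config["ensemble_size"])
base, branches = isee_rollout(agent, expert_sim, dume,
                              branch_ratio=isee_config["branch_ratio"],
                              branch_horizon=isee_config["branch_horizon"])
```
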
  13. (figure slide)

  14. (figure slide)

  15. Experiments
      • MultiWOZ (Budzianowski et al., 2018)
        ◦ 7 domains: train, restaurant, taxi, ...
        ◦ A user goal may involve multiple domains
        ◦ 8,438 dialogues
      • Evaluation Metrics (sketched below)
        ◦ Success rate
        ◦ Inform F1
        ◦ Match
        ◦ #Turns
      • Baselines
        ◦ PPO (Schulman et al., 2017)
        ◦ DQN (Mnih et al., 2015)
        ◦ DDQ (DQN + unconstrained diversification) (Peng et al., 2018)
        ◦ GDPL (PPO + IRL, leading performer on MultiWOZ) (Takanobu et al., 2019)
        ◦ MADPL (MARL, leading performer on MultiWOZ) (Takanobu et al., 2020)
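
A rough sketch of how the two headline metrics can be computed on MultiWOZ-style dialogues; the exact definitions in the paper follow the standard MultiWOZ evaluation protocol, and the field names below are illustrative:

```python
def inform_f1(requested_slots, informed_slots):
    """F1 between the slots the user requested and the slots the system informed."""
    requested, informed = set(requested_slots), set(informed_slots)
    tp = len(requested & informed)
    precision = tp / len(informed) if informed else 0.0
    recall = tp / len(requested) if requested else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def success_rate(dialogues):
    """Fraction of dialogues where every request was answered and the booked entity matched the goal."""
    wins = sum(1 for d in dialogues if d["all_requests_answered"] and d["entity_matched"])
    return wins / len(dialogues)
```
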
  16. Experiment Results
      Three settings:
      • X: the algorithm without diversification
      • X + Dvs: full and uncontrolled diversification
      • X + I-SEE: the proposed diversification method

  17. Analysis of I-SEE
      [Figure panels: (a) Ensemble size (E), (b) Diversification horizon (H), (c) Diversification ratio (𝜂)]

  18. Summary
      • I-SEE builds a diversified user model ensemble with imitation learning
      • It randomizes the initialization parameters of the neural networks
      • Errors in user model learning may quickly propagate into policy learning
      • To control the degree of noise:
        ◦ The agent intermittently interacts with trainable user models
        ◦ The diversified trajectories are of short horizon

  19. Conclusion
      • Diversification works
      • However, it is not the case that the more diverse, the better
        ◦ Effectiveness peaks when we use only intermittent, short, sampled diversified trajectories
        ◦ Full diversification hurts performance
      • Fully automatic training of dialogue agents with simulators can be effective