[Master Thesis] Diversity and Novelty as Objectives in Poker

Jessica Pauli de C Bonson

August 09, 2016

Transcript

  1. Diversity and Novelty as Objectives in Poker
     Jéssica Pauli de C. Bonson
     Supervisor: Dr. Malcolm I. Heywood
     Co-Supervisor: Dr. Andrew R. McIntyre
  2. Motivation
     • Evolutionary algorithms can lead to efficient solutions without a predefined design
     • Downsides:
       ◦ prone to early convergence
       ◦ may be deceived by a non-informative or deceptive fitness function
  3. Methodology
     • Evolve agents with various combinations of diversity maintenance methods and fitness-based evolution
     • Compare them in ten scenarios for diverse types of hands and opponents
     • Analyze effects on diversity, performance and behaviors
  4. Methodology: Fitness Function
     • The performance of a player is measured by the average chips won per hand
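As a minimal sketch of the measure on this slide, the average chips won per hand could be computed as below. The function name and the list-of-results input are hypothetical; the slides do not show how the thesis stores per-hand outcomes.

```python
def average_chips_per_hand(chips_won):
    """Return the mean chips won per hand.

    `chips_won` holds one number per hand played; negative entries are losses.
    (Hypothetical input format, not taken from the thesis.)
    """
    if not chips_won:
        raise ValueError("at least one hand is required")
    return sum(chips_won) / len(chips_won)


# Illustrative numbers only: five hands with wins, losses and a break-even hand.
results = [10, -5, 0, 20, -5]
print(average_chips_per_hand(results))  # 4.0
```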
  5. Methodology: Diversity
     • Diversity maintenance methods:
       ◦ bid diversity
       ◦ genotypic diversity
       ◦ behavioral diversity
       ◦ novelty search
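To illustrate the last method in the list, here is a sketch of the novelty score as it is commonly defined in the novelty search literature: the mean distance from an agent's behavior to its k nearest neighbors in behavior space. The behavior descriptor, the Euclidean distance, and the value of `k` are assumptions for illustration; the slides do not state which ones the thesis uses.

```python
import math


def novelty_score(behavior, others, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest neighbors.

    `behavior` and each entry of `others` are equal-length numeric vectors
    (a hypothetical behavior descriptor). A higher score means a more novel agent.
    """
    if not others:
        return 0.0
    distances = sorted(math.dist(behavior, other) for other in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Illustrative 2-D behaviors: the two closest neighbors are at distance 1 and 2.
population = [(3, 4), (0, 1), (6, 8), (0, 2)]
print(novelty_score((0, 0), population, k=2))  # 1.5
```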
  6. Experiments
     • Opponent Complexity Group
     • Degrees of Diversity Group
     • Diversity Models Group
     • Analysis of the behaviors
  7. Results: Diversity Models Group
     • Comparisons using the cumulative plots
       ◦ Diversity and Performance
       ◦ 9 models in 10 scenarios
       ◦ Tests: Friedman, Bonferroni-Dunn, and Nemenyi
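The Friedman test named above ranks the models within each scenario and checks whether their mean ranks differ. The sketch below computes the standard Friedman chi-square statistic (without tie correction) in plain Python. The function names and the toy score table are hypothetical and are not results from the thesis.

```python
def average_ranks(scores):
    """Rank one scenario's scores (1 = highest); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """table[s][m] is the score of model m in scenario s.

    Returns (chi_square, mean_ranks) for n scenarios and k models.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for m, rank in enumerate(average_ranks(row)):
            rank_sums[m] += rank
    mean_ranks = [total / n for total in rank_sums]
    chi_square = (12 * n / (k * (k + 1))) * sum(r * r for r in mean_ranks) - 3 * n * (k + 1)
    return chi_square, mean_ranks


# Toy table (made-up numbers): 3 scenarios x 3 models, same ordering in every scenario.
toy = [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(friedman_statistic(toy))  # (6.0, [1.0, 2.0, 3.0])
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom; post-hoc tests such as Nemenyi or Bonferroni-Dunn work on the same mean ranks.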
  8. Results: Diversity Models Group
     • Most models were able to improve the diversity of the agents
     • Two models translated diversity into performance
  9. Results: Diversity Models Group
     • The results indicate that novelty search alone does not work well for Texas Hold'em Poker
     • The model with novelty search and fitness was significantly better than the one without fitness
  10. Results: Analysis of Behaviors
      • Formulated the hypothesis that novelty would incentivize bluffing
      • The model with only novelty search bluffed as much as 3 models, and significantly more than 5 models
  11. Conclusions
      • Diversity maintenance methods were able to improve diversity and performance
      • Novelty search alone was not enough to improve either diversity or performance
      • Diversity was useful mainly to increase the exploitation of chips per hand
  12. Future Work
      • Find a way to deploy a subset of the agents
      • Further test diversity and novelty on a more ambiguous and complex version of Poker
  13. Motivation
      • How to deal with deceptive tasks?
        ◦ Diversity maintenance
          ▪ Genotypic diversity
          ▪ Behavioral diversity
          ▪ Novelty search
  14. Background: Inputs
      • Game State Inputs
        ◦ Hand Strength, Effective Potential, Pot Odds, Betting Position, Round
      • Opponent Model Inputs
        ◦ Last Action, Overall Long-term Aggressiveness, Overall Short-term Aggressiveness, Hand Aggressiveness, Tight/Loose, Passive/Aggressive, Bluffing, Chips, Self Overall Short-term Aggressiveness
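As an example of one game state input, pot odds are often expressed as the fraction of the final pot that a player must contribute to call. The sketch below uses that common definition; the exact formula and scaling used for the thesis inputs are not given in the slides, so treat the function and its parameters as assumptions.

```python
def pot_odds(call_amount, pot_size):
    """Fraction of the pot (after calling) that the call would represent.

    Common textbook definition: call / (pot + call). Assumed here, not
    confirmed by the slides. Returns 0.0 when there is nothing to call.
    """
    if call_amount < 0 or pot_size < 0:
        raise ValueError("amounts must be non-negative")
    total = pot_size + call_amount
    return call_amount / total if total else 0.0


# Calling 20 chips into a pot of 60 means paying a quarter of the final pot.
print(pot_odds(20, 60))  # 0.25
```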
  15. Background: Hands
      • Each point corresponds to a poker hand.
      • Training points are balanced in nine categories, per hand strength.
      • Real-world hands: 60% weak, 30% intermediate, 10% strong.
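One way to obtain a balanced training set like the one described is to draw the same number of hands from each category. This is only a sketch: the slides do not say how the nine categories are defined or how hands are sampled, so the integer category labels, the `(category, hand_id)` tuples, and the function name are placeholders.

```python
import random
from collections import Counter


def balanced_sample(hands_by_category, per_category, seed=0):
    """Draw `per_category` hands without replacement from every category.

    `hands_by_category` maps a category label to a list of hands
    (hypothetical structure). A fixed seed keeps the draw repeatable.
    """
    rng = random.Random(seed)
    sample = []
    for category in sorted(hands_by_category):
        sample.extend(rng.sample(hands_by_category[category], per_category))
    return sample


# Placeholder pool: nine categories with ten dummy hands each.
pool = {c: [(c, i) for i in range(10)] for c in range(9)}
training_points = balanced_sample(pool, per_category=4)
print(Counter(category for category, _ in training_points))
```

Every category contributes four hands here, in contrast to the skewed 60/30/10 split of real-world hands.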
  16. Differences from Previous Work
      • The main differences between the work developed by Alberta's group and this research:
        ◦ evolve a diverse group of capable agents
        ◦ agents evolve their strategies from scratch
        ◦ agents work as teams of programs
        ◦ it is not possible to use simulations
        ◦ use poker as a domain, not as the goal
  17. Possible Questions
      • Training chart? Too noisy
      • Tested before in other tasks
      • Why not tournament? To focus on diversity
      • Normalized between 0.0 and 10.0
        ◦ Better for SBB due to previous work results
  18. Bluffing
      • Behavior
        ◦ teams play fewer hands to avoid losing chips due to weaker hands
        ◦ they also increase their bluffing, to exploit the opponent's weaker hands
      • The teams are using their opponent modeling inputs to find when the opponent seems to have weaker hands, and then bluff