
[Dalhousie University, 2016] Symbiotic Bid-based (SBB) GP

Jessica Pauli de C Bonson

March 15, 2016

Transcript

  1. SBB: General Idea
     • teams of programs evolve and compete over time
     • each team represents a player in the game
     • at each generation they play many matches against various opponents
     • the best teams reproduce and keep evolving; the others are discarded
  2. How do the teams work?
     • each team is composed of 2 or more programs
     • programs are composed of a set of instructions, registers, and an action
     • before a team executes an action:
       ◦ all of its programs run over the inputs for the current match state
       ◦ the action of the program with the highest output is selected as the team's action (a sketch of this bid-based selection follows below)
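A minimal Python sketch of that bid-based selection. The class and attribute names (`Program`, `bid`, `action`, `weight`) are illustrative, not the repositories' actual API:

```python
class Program:
    """Stub: a linear GP program with registers, instructions, and a fixed action."""

    def __init__(self, action, weight):
        self.action = action
        self.weight = weight  # stand-in for the evolved instruction list

    def bid(self, inputs):
        # Real SBB programs execute their instructions over registers;
        # a dot product stands in for that computation here.
        return sum(w * x for w, x in zip(self.weight, inputs))


class Team:
    """A team of bid-based programs; the highest bidder's action wins."""

    def __init__(self, programs):
        self.programs = programs

    def act(self, inputs):
        # Every program evaluates the current match state, and the action of
        # the program with the highest output becomes the team's action.
        best = max(self.programs, key=lambda p: p.bid(inputs))
        return best.action
```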
  3. Example: TicTacToe
     • inputs: 9
     • actions: 9
     • opponents: Random and Smart
     • points: TicTacToe matches' states + opponents
  4. Example: Poker
     • inputs: 14 (hand strength, position, pot odds, opponent model...)
     • actions: 3
     • opponents:
       ◦ dummy: AlwaysRaise, AlwaysCall, AlwaysFold, Random
       ◦ static: LooseAggressive, LoosePassive, TightAggressive, TightPassive
       ◦ dynamic: Bayesian
     • points: poker matches' states + opponents
  5. Code
     • SBB for reinforcement learning: https://github.com/jpbonson/SBBReinforcementLearner
     • SBB for classification: https://github.com/jpbonson/SBBClassifier
     • Warning! The code is under development!
  6. Fitness Function
     • For RL it is usually:
       ◦ Win: 1.0
       ◦ Draw: 0.5
       ◦ Lose: 0.0
     • But depending on the game, other functions can give more information
       ◦ e.g.: chips won/lost (poker); a sketch of both follows below
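A minimal sketch of the two kinds of fitness mentioned above. The function names and the scaling choice for poker are assumptions, not the deck's exact formulas:

```python
def win_draw_lose_fitness(outcome):
    """Generic RL fitness: win = 1.0, draw = 0.5, lose = 0.0."""
    return {"win": 1.0, "draw": 0.5, "lose": 0.0}[outcome]


def poker_chip_fitness(chips_delta, max_chips):
    """Poker fitness based on chips won/lost, scaled into [0.0, 1.0].

    A chip count carries more information than win/draw/lose alone;
    scaling by the maximum possible swing keeps scores comparable.
    """
    return (chips_delta + max_chips) / (2.0 * max_chips)
```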
  7. Inputs
     • Very important: they define what the teams can see about the environment.
     • Can be used to control for a weak or strong AI.
     • Tweak: SBB deals better with inputs normalized to the range 0.0-10.0 instead of 0.0-1.0 (see the sketch below).
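A small sketch of that normalization tweak, assuming the raw input bounds are known; the function name and defaults are placeholders:

```python
def normalize(value, lower, upper, scale=10.0):
    """Map a raw input from [lower, upper] into [0.0, scale].

    The deck reports SBB copes better with a 0.0-10.0 range than 0.0-1.0,
    so the scale defaults to 10.0.
    """
    return scale * (value - lower) / (upper - lower)
```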
  8. Opponents
     • Too-easy opponents: teams learn to beat them quickly and stop evolving.
     • Too-strong opponents: teams aren't able to learn to beat them, and SBB turns into a random walk.
     • It is important to balance the two.
     • A Hall of Fame can be used to avoid evolutionary forgetting, but don't overuse it (a sketch follows below).
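A minimal sketch of a Hall of Fame buffer that keeps past champions around as extra opponents. The capacity and selection policy here are assumptions, not the deck's settings:

```python
class HallOfFame:
    """Keeps the best past teams so current teams still face old strategies."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.members = []  # list of (fitness, team)

    def update(self, team, fitness):
        self.members.append((fitness, team))
        # Keep only the strongest `capacity` champions.
        self.members.sort(key=lambda pair: pair[0], reverse=True)
        del self.members[self.capacity:]

    def opponents(self, how_many=2):
        # Mix only a few champions into each generation's opponent pool,
        # so the Hall of Fame is not overused.
        return [team for _, team in self.members[:how_many]]
```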
  9. Points
     • How many? The more, the better, but it is time-consuming.
     • It is important to invest time in optimizing the matches.
     • Ensure all teams see the exact same points.
     • For some games the points can't be exactly like the real-world task.
  10. Diversity
      • Important to avoid overfitting, so teams can continue to evolve new behaviors.
      • Two methods:
        ◦ Point Profiles
        ◦ Pareto Dominance + Diversity metrics
  11. Diversity: Point Profiles
      • Profile: the results of a team run against a set of sample inputs representing states of the game.
      • New teams are mutated until their profile differs from their parent's and/or from all other teams'.
      • The set of sample inputs can be built manually or during training (a sketch follows below).
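A sketch of that profile check, reusing the illustrative `act` method from the first sketch; the `mutate` callback and retry limit are placeholders:

```python
def profile(team, sample_inputs):
    """A team's profile: its chosen actions over a fixed set of sample states."""
    return tuple(team.act(inputs) for inputs in sample_inputs)


def mutate_until_novel(parent, mutate, sample_inputs, population, max_tries=20):
    """Keep mutating a child until its profile differs from the parent's
    profile and from every team already in the population."""
    existing = {profile(t, sample_inputs) for t in population}
    existing.add(profile(parent, sample_inputs))
    child = mutate(parent)
    for _ in range(max_tries):
        if profile(child, sample_inputs) not in existing:
            break
        child = mutate(child)
    return child
```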
  12. Diversity: Pareto + Diversity
      • Pareto Dominance: a method to choose teams so that the chosen ones are the best regarding both the fitness function and the diversity metric (a sketch follows below).
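A minimal sketch of selecting the Pareto front over (fitness, diversity) pairs; the scoring callbacks are placeholders for whatever fitness and diversity metric are in use:

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(teams, fitness, diversity):
    """Keep the teams that no other team dominates on (fitness, diversity)."""
    scores = {id(t): (fitness(t), diversity(t)) for t in teams}
    return [
        t for t in teams
        if not any(dominates(scores[id(other)], scores[id(t)])
                   for other in teams if other is not t)
    ]
```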
  13. Diversity: Diversity Metrics
      • There are a lot of diversity metrics available, both general and domain-specific.
      • Two types:
        ◦ Genotype
        ◦ Phenotype
          ▪ distance measures (Euclidean, Hamming...)
          ▪ entropy
          ▪ NCD (normalized compression distance)
      • A sketch of two phenotype metrics follows below.
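A sketch of two simple phenotype metrics computed over the action profiles from the earlier sketch; treating these as stand-ins for SBB's exact metrics is an assumption:

```python
import math
from collections import Counter


def hamming_distance(profile_a, profile_b):
    """Fraction of sample states on which two teams choose different actions."""
    diffs = sum(a != b for a, b in zip(profile_a, profile_b))
    return diffs / len(profile_a)


def action_entropy(profile):
    """Shannon entropy of a team's action choices; higher means more varied play."""
    counts = Counter(profile)
    total = len(profile)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```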
  14. Second Layer
      • Uses the teams trained in the first layer as actions for the teams in the second layer.
      • Goal: more complex behavior using specialized actions.
        ◦ E.g.: instead of call/raise/fold for poker, it could be passive/aggressive behavior.
  15. Mutation Rates
      • Seven types of mutation rate (remove program from team, swap instructions in programs, etc...).
      • Both passive and aggressive mutations work.
      • It is an exploitation/exploration trade-off.
  16. Registers
      • The number varies according to the game.
      • For TicTacToe, using 5 registers instead of 2 improved the score by around 20%.
      • It is important to reset the registers between matches, but during a match they can be used as memory (a sketch follows below).
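A small sketch of that register-as-memory idea, assuming a fixed-size register bank per program; the class and method names are illustrative:

```python
class Registers:
    """Fixed-size register bank: persists across turns within a match,
    reset between matches."""

    def __init__(self, size=5):
        self.size = size
        self.values = [0.0] * size

    def reset(self):
        # Call between matches so no state leaks from one match to the next.
        self.values = [0.0] * self.size

    def read(self, index):
        return self.values[index % self.size]

    def write(self, index, value):
        # Within a match, writes persist from turn to turn and act as memory.
        self.values[index % self.size] = value
```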
  17. Team Size + Program Size
      • Trade-off between complexity and runtime.
      • Varies according to the game:
        ◦ Team size: 2-9 for TTT, 2-16 for poker, around 30 for soccer
        ◦ Program size: 2-20 for TTT, 5-40 for poker
      • Should be big enough to deal with the complexity of the game and have space for introns.
  18. Operators
      • Simple set: +, -, /, *
      • Complex set: ln, exp, cos, sin
      • Ifs set: >=, <
      • Trade-off between complexity and runtime.
      • Solution for overflow: roll back so the target register isn't modified by the instruction (a sketch follows below).
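A sketch of that rollback, assuming an instruction applies an operator to a target register and is reverted when the result overflows or is invalid; the function signature is illustrative:

```python
import math


def apply_instruction(registers, target, op, operand):
    """Apply `op` to the target register; roll back on overflow or invalid math.

    On error the target register keeps its previous value, as the deck suggests.
    """
    previous = registers[target]
    try:
        result = op(previous, operand)
    except (OverflowError, ValueError, ZeroDivisionError):
        return  # rollback: leave the register untouched
    if math.isnan(result) or math.isinf(result):
        return  # rollback for non-finite results
    registers[target] = result
```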
  19. Be Able to Reuse Teams
      • It is important so you are able to:
        ◦ run them against various test cases after training
        ◦ integrate a trained team as an AI in a system
        ◦ use them as actions in a second layer
      • Solution: save teams as .json files (a sketch follows below).
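A minimal sketch of saving and reloading a team as JSON, reusing the illustrative `Team`/`Program` classes from the first sketch; the field layout is an assumption, not the repositories' actual file format:

```python
import json


def save_team(team, path):
    """Serialize a team's programs to a .json file."""
    data = [{"action": p.action, "weight": p.weight} for p in team.programs]
    with open(path, "w") as f:
        json.dump(data, f, indent=2)


def load_team(path):
    """Rebuild a Team from a previously saved .json file."""
    with open(path) as f:
        data = json.load(f)
    return Team([Program(p["action"], p["weight"]) for p in data])
```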
  20. Metrics
      • Most of the time runs will take a long time to finish, so it is important to think about what metrics are necessary and code them beforehand.
      • Automated metrics save a lot of time.
      • Be careful with bugs.
  21. Tests + Inheritance
      • pSBB for Reinforcement Learning has around 300k lines of code and around 40 classes.
      • Automated tests and inheritance are essential.