over time
• each team represents a player in the game
• at each generation, they play many matches against various opponents
• the best ones reproduce and keep evolving; the others are discarded
How do the teams work?
• programs are composed of a set of instructions, registers, and an action
• before a team executes an action:
  ◦ all of its programs run over the inputs for the current match state
  ◦ the action from the program with the highest output is selected as the team’s action
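The selection rule above can be sketched in a few lines of Python. This is only an illustration: the `Program` class, its `bid` callable (standing in for "run the instructions over the inputs"), and the poker action names are assumptions, not the actual SBB code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Program:
    """Hypothetical stand-in for an SBB program: an output function plus one action."""
    bid: Callable[[Sequence[float]], float]  # output of running the instructions
    action: str


def select_action(programs: List[Program], inputs: Sequence[float]) -> str:
    """Run every program on the inputs and return the action of the highest output."""
    best = max(programs, key=lambda p: p.bid(inputs))
    return best.action


# A toy team of three programs (illustrative bid functions only).
team = [
    Program(bid=lambda x: x[0] + x[1], action="raise"),
    Program(bid=lambda x: x[0] - x[1], action="fold"),
    Program(bid=lambda x: 0.5, action="call"),
]
chosen = select_action(team, [0.2, 0.1])
```

With these toy programs, the constant bid of 0.5 wins for the inputs `[0.2, 0.1]`, so the team plays "call"; for larger inputs the first program would win instead.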
see about the environment.
• Can be used to control for a weak or strong AI.
• Tweak: SBB deals better with inputs normalized to the range 0.0-10.0 instead of 0.0-1.0.
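As a rough sketch of the normalization tweak, the helper below rescales a raw input into 0.0-10.0. The function name, the clamping of out-of-range values, and the `scale` parameter are assumptions made for illustration.

```python
def normalize(value: float, low: float, high: float, scale: float = 10.0) -> float:
    """Map value from [low, high] to [0, scale], clamping out-of-range values."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = min(max(value, low), high)
    return scale * (clamped - low) / (high - low)


# An input that was already in 0.0-1.0 is stretched to 0.0-10.0.
scaled = normalize(0.25, 0.0, 1.0)
```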
them quickly and stop evolving
• Too strong opponents:
  ◦ Teams aren’t able to learn to beat them, and SBB turns into a random walk
• It is important to balance opponent strength between these two extremes
• Hall of Fame can be used to avoid evolutionary forgetting, but don’t overuse it.
But it is time-consuming
• It is important to invest time in optimizing the matches
• Ensure all teams see the exact same points
• For some games, the points can’t be just like the real-world task
run against a set of sample inputs, representing states of the game
• The new teams are mutated until their profile is different from their parent and/or all the other teams
• The set of sample inputs can be built manually or collected during training
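The profile-based diversity check could look roughly like the sketch below. The `ThresholdTeam` toy class, the `max_tries` cap, and the mutation operator are all assumptions for illustration; a real team would run its programs instead of comparing against a threshold.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ThresholdTeam:
    """Toy stand-in for a team: raises when the first input exceeds a threshold."""
    threshold: float

    def act(self, state: Sequence[float]) -> str:
        return "raise" if state[0] > self.threshold else "fold"


def profile(team: ThresholdTeam, samples: Sequence[Sequence[float]]) -> Tuple[str, ...]:
    """Behavior profile: the action chosen for each sample state."""
    return tuple(team.act(s) for s in samples)


def mutate_until_different(
    parent: ThresholdTeam,
    others: Sequence[ThresholdTeam],
    samples: Sequence[Sequence[float]],
    mutate: Callable[[ThresholdTeam], ThresholdTeam],
    max_tries: int = 50,
) -> ThresholdTeam:
    """Keep mutating the child until its profile is new, or max_tries is reached."""
    seen = {profile(parent, samples)} | {profile(t, samples) for t in others}
    child = mutate(parent)
    for _ in range(max_tries):
        if profile(child, samples) not in seen:
            break
        child = mutate(child)
    return child


rng = random.Random(0)  # fixed seed so the example is repeatable
samples = [[0.1], [0.5], [0.9]]
parent = ThresholdTeam(0.5)
others = [ThresholdTeam(0.0)]
child = mutate_until_different(
    parent, others, samples,
    mutate=lambda t: ThresholdTeam(t.threshold + rng.uniform(-0.3, 0.3)),
)
```

The cap on retries is a design choice of this sketch: it prevents an endless loop when the mutation operator cannot produce a new profile.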
layer as actions for the teams in the second layer
• Goal: More complex behavior using specialized actions
  ◦ E.g., instead of call/raise/fold for poker, the actions could be passive/aggressive behaviors
using 5 instead of 2 improved the score by around 20%.
• It is important to reset the registers between matches, but during a match they can be used as memory
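A minimal sketch of the register reset, assuming a hypothetical `Program` class whose single toy instruction accumulates observations into register 0:

```python
from typing import List, Sequence


class Program:
    """Hypothetical program with registers that persist within a match."""

    def __init__(self, n_registers: int) -> None:
        self.n_registers = n_registers
        self.registers: List[float] = [0.0] * n_registers

    def reset_registers(self) -> None:
        self.registers = [0.0] * self.n_registers

    def step(self, observation: float) -> float:
        # Toy instruction: accumulate into register 0, so it acts as memory.
        self.registers[0] += observation
        return self.registers[0]


def play_match(program: Program, observations: Sequence[float]) -> List[float]:
    """Reset once at the start of the match, then let registers carry state."""
    program.reset_registers()
    return [program.step(o) for o in observations]


program = Program(n_registers=2)
first = play_match(program, [1.0, 2.0])   # registers carry over within the match
second = play_match(program, [1.0])       # nothing leaks in from the first match
```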
runtime
• Varies according to the game
  ◦ Team size: 2-9 for TTT, 2-16 for poker, approximately 30 for soccer
  ◦ Program size: 2-20 for TTT, 5-40 for poker
• Should be big enough to deal with the complexity of the game and leave space for introns
set: ln, exp, cos, sin
• Ifs set: >=, <
• Trade-off between complexity and runtime
• Solution for overflow: roll back the instruction so that the target register isn’t modified
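The rollback idea can be sketched as follows. The instruction form `registers[target] = op(registers[source])` and the `apply_op` helper are assumptions for illustration; the point is only that a failed or non-finite result leaves the target register unchanged.

```python
import math
from typing import Callable, Dict, List

# The unary operations named on the slide, mapped to Python's math functions.
OPS: Dict[str, Callable[[float], float]] = {
    "ln": math.log,
    "exp": math.exp,
    "cos": math.cos,
    "sin": math.sin,
}


def apply_op(registers: List[float], target: int, op: str, source: int) -> None:
    """Set registers[target] = op(registers[source]); roll back on failure."""
    try:
        result = OPS[op](registers[source])
    except (OverflowError, ValueError):
        return  # rollback: the target register is left untouched
    if not math.isfinite(result):
        return  # also skip inf/nan results
    registers[target] = result


registers = [1.0, 1000.0]
apply_op(registers, 0, "exp", 1)  # exp(1000) overflows, so registers[0] stays 1.0
apply_op(registers, 1, "ln", 0)   # ln(1.0) = 0.0 is valid, so registers[1] is updated
```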
you are able to:
  ◦ run them against various test cases after training
  ◦ integrate a trained team as an AI in a system
  ◦ use them as actions in a second layer
• Solution: Save teams as .json files
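One possible way to save and reload a team with Python's standard `json` module is sketched below. The dictionary layout (`team_id`, `programs`, `instructions`) is a made-up schema for illustration, not the format used by any particular SBB implementation.

```python
import json
import os
import tempfile


def save_team(team: dict, path: str) -> None:
    """Write a team (plain dicts and lists only) to a .json file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(team, f, indent=2)


def load_team(path: str) -> dict:
    """Read a team back from a .json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical team layout: each program has an action and a list of instructions.
team = {
    "team_id": 1,
    "programs": [
        {"action": "raise", "instructions": [["+", 0, 1], ["cos", 1, 0]]},
        {"action": "fold", "instructions": [["exp", 0, 1]]},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "team_1.json")
    save_team(team, path)
    restored = load_team(path)
```

Keeping the saved structure to plain dicts, lists, numbers, and strings means the file loads back unchanged and can be read by other tools.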
long time to finish, so it is important to think about what metrics are necessary and code them beforehand
• Automated metrics save a lot of time
• Be careful with bugs