Board game self-play PPO

Hi there,

I’m doing a similar thing: a turn-based card game with a MultiAgentEnv and self-play with PPO. I’d be interested in comparing your approach to self-play with mine. Mine is based on what is described here: How to Implement Self Play with PPO? [rllib] · Issue #6669 · ray-project/ray · GitHub.
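For reference, here's a minimal sketch of that setup. The env name (`"card_game"`), the observation/action spaces, and the integer agent-id scheme are placeholders for my actual game, and it assumes the older trainer-based RLlib API from around the time of that issue:

```python
import gym
from ray import tune

# Placeholder spaces; the real env defines its own.
obs_space = gym.spaces.Box(low=0.0, high=1.0, shape=(64,))
act_space = gym.spaces.Discrete(10)

config = {
    "env": "card_game",  # assumed to be a registered MultiAgentEnv
    "multiagent": {
        # policy_1 is the learner; policies 2-4 hold frozen old copies of it.
        "policies": {
            f"policy_{i}": (None, obs_space, act_space, {}) for i in range(1, 5)
        },
        # Each of the four seats plays a different policy (agent ids 0-3 assumed).
        "policy_mapping_fn": lambda agent_id: f"policy_{agent_id + 1}",
        # Gradients are only ever applied to policy_1.
        "policies_to_train": ["policy_1"],
    },
}

tune.run("PPO", config=config, checkpoint_freq=10)
```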

I have positive results compared to previous attempts. After several hundred training iterations my agent is able to beat a simple rule-based agent I’ve written 40% of the time, which previous agents haven’t come close to.

I’m running into the same problem as you with updating weights. I’m training policy 1; policies 2-4 are supposed to contain old versions of policy 1, and I shift the weights each time policy 1 achieves a >55% win rate over the course of one training iteration. But as you can see, the average reward for each of the other three policies is significantly lower than I’d expect; surely they should be roughly equal to policy 1’s average reward. The per-episode win rate is around 80% as well, where I’d expect around 50%, since the other three policies are meant to be similar in skill to the trained policy. These are the sorts of results I’d expect from a trained agent versus a random-action agent.
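For what it's worth, here's roughly how my weight shifting works. This is a sketch assuming the trainer-based callback API; the `"policy_1_win_rate_mean"` custom metric is an assumption (I log a per-episode win/loss elsewhere, e.g. in `on_episode_end`):

```python
from ray.rllib.agents.callbacks import DefaultCallbacks


class SelfPlayCallback(DefaultCallbacks):
    """Pushes policy_1's weights down the chain of frozen opponents."""

    def on_train_result(self, *, trainer, result, **kwargs):
        # Assumed custom metric: fraction of episodes policy_1 won
        # during this training iteration.
        win_rate = result["custom_metrics"].get("policy_1_win_rate_mean", 0.0)
        if win_rate > 0.55:
            weights = trainer.get_weights(["policy_1", "policy_2", "policy_3"])
            # Shift: policy_4 <- policy_3, policy_3 <- policy_2,
            # policy_2 <- policy_1, so 2-4 hold progressively older snapshots.
            trainer.set_weights({
                "policy_4": weights["policy_3"],
                "policy_3": weights["policy_2"],
                "policy_2": weights["policy_1"],
            })
            # Push the updated weights out to the remote rollout workers;
            # set_weights() on its own only updates the local worker.
            trainer.workers.sync_weights()
```

The callback gets hooked in via `"callbacks": SelfPlayCallback` in the trainer config.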

Interestingly, if I restart training from a saved checkpoint (which I have to do anyway because of a memory issue: PPO trainer eating up memory), the weights seem to propagate properly and the win rate is around 50-60%.
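The restart itself is nothing special; assuming the same trainer-based API, it's just (the checkpoint path below is a placeholder for wherever tune wrote mine):

```python
from ray.rllib.agents.ppo import PPOTrainer

# Restores all four policies' weights from the last checkpoint.
trainer = PPOTrainer(config=config)
trainer.restore("~/ray_results/PPO/checkpoint_000200/checkpoint-200")
```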