(Disclaimer: I already asked the same question on Stack Overflow: python - Asymetrical Adversarial Reinforcement Learning with Ray Tune - Stack Overflow. I didn't realize there was a Ray Forum.)
I am training two agents A and B to play an asymmetrical game, using a Gym environment (with multi-agent policies) and the Ray Tune library.
The asymmetrical nature of the game means that B has a much easier role than A. As a result, B trains much faster and becomes much stronger, to the point where A cannot make any meaningful progress in its learning: it loses every game.
The solution I envision is, as in GANs, to train A when B wins more than 50% of the time, and B when A wins more than 50% of the time. However, I have not found any way to implement this with Tune.
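The switching rule itself can be sketched in plain Python, independently of Ray. This is only a sketch, assuming the winner of each episode can be recorded (as "A", "B", or any other value for a draw). `WinRateScheduler`, `record`, `win_rate` and `agents_to_train` are hypothetical names, not Ray APIs, and the 100-episode window is an arbitrary choice.

```python
from collections import deque


class WinRateScheduler:
    """Hypothetical helper: picks which agent to train from recent results.

    GAN-like rule: train A while B wins more than half of the recent games,
    and train B while A wins more than half of them.
    """

    def __init__(self, window: int = 100):
        # Only the outcomes of the last `window` episodes are kept.
        self.results = deque(maxlen=window)
        # Start with A, the agent that has the harder role.
        self.training = "A"

    def record(self, winner: str) -> None:
        """Store the winner of a finished episode ("A", "B", or e.g. "draw")."""
        self.results.append(winner)

    def win_rate(self, agent: str) -> float:
        """Fraction of the recorded episodes won by `agent`."""
        if not self.results:
            return 0.0
        return self.results.count(agent) / len(self.results)

    def agents_to_train(self) -> list:
        if self.win_rate("B") > 0.5:
            self.training = "A"  # B dominates: only A should learn
        elif self.win_rate("A") > 0.5:
            self.training = "B"  # A dominates: only B should learn
        # Otherwise (neither wins a majority) keep the current choice.
        return [self.training]


if __name__ == "__main__":
    scheduler = WinRateScheduler(window=10)
    for winner in ["B"] * 8 + ["A"] * 2:
        scheduler.record(winner)
    print(scheduler.agents_to_train())  # ['A']

    for _ in range(7):
        scheduler.record("A")
    print(scheduler.agents_to_train())  # ['B']
```

The returned list would still have to be connected to the trainer, for example through RLlib's multi-agent `policies_to_train` option. I have not verified how RLlib behaves when that set changes between training iterations, which is precisely the part I am unsure about.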
How can I favor training the losing agent, rather than having all agents train equally?
Thank you!