Asymmetrical Adversarial Reinforcement Learning with Ray Tune

(Disclaimer: I also asked this question on Stack Overflow; I didn’t realize there was a Ray Forum.)

I am training two agents A and B to play an asymmetrical game, using a Gym environment (with multi-agent policies) and the Ray Tune library.

Because the game is asymmetrical, B has a much easier role than A. As a result, B trains much faster and becomes much stronger. B is then so strong that A almost always loses and cannot make any meaningful learning progress.

The solution I envision is, as in GANs, to train A when B wins more than 50% of the time, and to train B when A wins more than 50% of the time. However, I haven’t found a way to implement this with Tune.
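The GAN-style rule above can be sketched as a small, framework-independent scheduler that tracks recent game outcomes and decides which agent should be trained next. This is a minimal sketch under stated assumptions: the class `WinRateScheduler`, the agent IDs `"A"`/`"B"`, the window size, and the 50% threshold are all hypothetical and not part of Ray. Wiring its output into RLlib (for example, by updating the multi-agent `policies_to_train` setting from a training callback) depends on your Ray version and is not shown here.

```python
from collections import deque
from typing import List


class WinRateScheduler:
    """Hypothetical helper: chooses which agent(s) to train from recent results.

    Rule (as proposed above): if B wins more than `threshold` of the recent
    games, train only A; if A wins more than `threshold`, train only B;
    otherwise train both.
    """

    def __init__(self, window: int = 100, threshold: float = 0.5):
        # Keeps only the last `window` winners ("A" or "B"), dropping older ones.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, winner: str) -> None:
        """Store the winner of one finished game."""
        self.outcomes.append(winner)

    def win_rate(self, agent: str) -> float:
        """Fraction of recorded games won by `agent` (0.0 if nothing recorded)."""
        if not self.outcomes:
            return 0.0
        return sum(1 for w in self.outcomes if w == agent) / len(self.outcomes)

    def policies_to_train(self) -> List[str]:
        """Return the agent IDs that should be updated in the next iteration."""
        if self.win_rate("B") > self.threshold:
            return ["A"]  # B dominates: let A catch up
        if self.win_rate("A") > self.threshold:
            return ["B"]  # A dominates: let B catch up
        return ["A", "B"]  # roughly balanced: train both


scheduler = WinRateScheduler(window=10)
for winner in ["B"] * 8 + ["A"] * 2:
    scheduler.record(winner)
print(scheduler.policies_to_train())  # ['A']
```

Using a sliding window rather than the full history lets the schedule react as the balance between the agents shifts during training.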

How can I favor training the losing agent instead of training all agents equally?

Thank you!

Not sure exactly what you mean by “the asymmetrical nature of the game”, but here are a few potential solutions:

  1. You could modify your reward scheme so that A receives a much larger reward for winning than B does. The stronger signal may let A’s policy move toward winning behavior faster than it currently does. Alternatively, reduce B’s reward.
  2. You could tune the policy and algorithm hyperparameters (for example, learning rates) separately for A and B so that their learning is more balanced.
  3. You could periodically freeze B’s policy (stop updating its weights) to give A a chance to “catch up”.
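Option 1 can be sketched as a per-agent reward rescaling applied to the reward dictionary that a multi-agent environment returns from each step (RLlib’s `MultiAgentEnv` reports rewards as a dict keyed by agent ID). This is a minimal sketch, not a Ray API: the function `rebalance_rewards`, the agent IDs `"A"`/`"B"`, and the scale factors are hypothetical and would need tuning for your game.

```python
from typing import Dict


def rebalance_rewards(
    rewards: Dict[str, float],
    scales: Dict[str, float],
) -> Dict[str, float]:
    """Hypothetical helper: multiply each agent's reward by its scale factor.

    Agents missing from `scales` keep their original reward (factor 1.0).
    """
    return {agent: r * scales.get(agent, 1.0) for agent, r in rewards.items()}


# Example: amplify A's reward and dampen B's (factors are illustrative only).
step_rewards = {"A": 1.0, "B": 1.0}
print(rebalance_rewards(step_rewards, {"A": 2.0, "B": 0.5}))  # {'A': 2.0, 'B': 0.5}
```

Note that this scales every reward A receives, penalties included, not only a win bonus; if your game gives a single terminal reward for winning, you may prefer to scale only that.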