Using a multi-agent model after training

I designed a simple custom environment for the kids' game Block, Load and Fire. It works like this:

  • Every turn, the two players say out loud, at the same time, “block”, “load”, or “fire”.
  • Initially, both players have no ammo. Saying “load” gives you one bullet (so this is the obvious choice on the first turn).
  • After that, you can say “fire”: if the other player is not blocking, they lose. If both players fire at the same time, it is a draw. You cannot fire without bullets.
  • Loading a fifth bullet gives you a cannon: if you fire it, the other player loses even if they are blocking. If both players fire a cannon at each other, it is a draw.
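The rules above can be condensed into a single turn-resolution function. This is a minimal sketch, not the environment from the gist, and it fills in a few points the rules leave open with labeled assumptions: actions are the plain strings "block", "load" and "fire"; every shot uses up one bullet; a shot counts as a cannon when the shooter holds five or more bullets; and firing with no bullets raises an error instead of being penalized or masked. Simultaneous shots of any kind are a draw, which follows from the rules as stated.

```python
BLOCK, LOAD, FIRE = "block", "load", "fire"
CANNON_AT = 5  # assumption: holding 5+ bullets when firing makes it a cannon shot


def resolve_turn(ammo, actions):
    """Resolve one simultaneous turn of Block, Load and Fire.

    ammo:    [bullets of player 0, bullets of player 1] before the turn.
    actions: [action of player 0, action of player 1], chosen at the same time.
    Returns (new_ammo, outcome), where outcome is 0 or 1 (index of the
    winner), "draw", or None if the game goes on.
    """
    for i, action in enumerate(actions):
        if action not in (BLOCK, LOAD, FIRE):
            raise ValueError(f"unknown action {action!r} for player {i}")
        if action == FIRE and ammo[i] == 0:
            # assumption: an illegal shot is an error, not a wasted turn
            raise ValueError(f"player {i} cannot fire without bullets")

    firing = [a == FIRE for a in actions]
    cannon = [firing[i] and ammo[i] >= CANNON_AT for i in (0, 1)]

    new_ammo = list(ammo)
    for i, action in enumerate(actions):
        if action == LOAD:
            new_ammo[i] += 1
        elif action == FIRE:
            new_ammo[i] -= 1  # assumption: every shot uses up one bullet

    if all(firing):
        return new_ammo, "draw"  # plain or cannon, simultaneous shots draw
    for shooter in (0, 1):
        target = 1 - shooter
        # A plain shot only hits a player who is not blocking; a cannon always hits.
        if firing[shooter] and (actions[target] != BLOCK or cannon[shooter]):
            return new_ammo, shooter
    return new_ammo, None
```

For example, `resolve_turn([1, 0], ["fire", "load"])` returns `([0, 1], 0)`: player 0 wins because player 1 was loading instead of blocking. Keeping the rules free of any RL library means the same function could back both an environment's `step` and a hand-written game loop.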

The MultiAgentEnv definition, as well as the training and test code, are in this gist.

My problem is that, after training, I don't see a way to use the model on its own. How would you complete the test method so that I can play against the trained model?
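One way the test method could be organised is to keep the game loop independent of RLlib and pass the trained model in as a plain callable from an observation to an action name. Everything in this sketch is an assumption rather than the gist's code: the function name `play_against_model`, the `(own ammo, opponent ammo)` observation layout, string actions, and the injected rules function `resolve_turn(ammo, actions)` returning `(new_ammo, outcome)`.

```python
from typing import Callable, Sequence, Tuple, Union

# 0 or 1 = index of the winner, "draw", or None (game still running)
Outcome = Union[int, str, None]


def play_against_model(
    model_action: Callable[[Tuple[int, int]], str],
    human_action: Callable[[Tuple[int, int]], str],
    resolve_turn: Callable[[Sequence[int], Sequence[str]], Tuple[list, Outcome]],
    max_turns: int = 50,
) -> Outcome:
    """Play one game with the human as player 0 and the model as player 1.

    Each callable receives its own ammo first, then the opponent's
    (hypothetical observation layout). Returns 0 if the human wins,
    1 if the model wins, "draw", or None if max_turns runs out.
    """
    ammo = [0, 0]
    for _ in range(max_turns):
        # Both choices are made from the same state, i.e. simultaneously.
        actions = [
            human_action((ammo[0], ammo[1])),
            model_action((ammo[1], ammo[0])),
        ]
        ammo, outcome = resolve_turn(ammo, actions)
        if outcome is not None:
            return outcome
    return None
```

To play yourself, `human_action` could be `lambda obs: input(f"ammo {obs}, block/load/fire? ").strip()`. On the model side, the callable has to convert the observation into whatever format the gist's environment produces and map the returned action index back to a name. In RLlib versions that still provide it, calling `Algorithm.compute_single_action(obs, policy_id=..., explore=False)` on the restored algorithm is one candidate for that middle step, but check it against the installed Ray version and the policy IDs used during training.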