[RLlib] Make it easier to play trained policies

Hi @drozzy, I feel your pain. I came from stable_baselines too. I just wrote a runnable script, below, for trying out a trained policy. It's written for multi-agent setups but can easily be adapted to the single-agent case. I think the docs have a single-agent example, but I can't remember where at the moment. Cheers,
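
For a rough idea of the loop a multi-agent playback script runs, here is a minimal, library-free sketch. It does not call RLlib at all: `ToyMultiAgentEnv`, `play_episode`, and the two lambda policies are hypothetical stand-ins made up for illustration. With a real checkpoint, each policy callable would presumably wrap whatever action-computation call the restored algorithm exposes for that policy ID, and the environment would be your own.

```python
from typing import Callable, Dict

# Hypothetical types for illustration: integer observations and actions.
PolicyFn = Callable[[int], int]


class ToyMultiAgentEnv:
    """Stand-in environment: takes a dict of actions, returns dicts keyed by agent ID."""

    def __init__(self, agent_ids=("agent_0", "agent_1"), horizon=5):
        self.agent_ids = tuple(agent_ids)
        self.horizon = horizon
        self.t = 0

    def reset(self) -> Dict[str, int]:
        self.t = 0
        return {aid: 0 for aid in self.agent_ids}

    def step(self, actions: Dict[str, int]):
        self.t += 1
        obs = {aid: self.t for aid in self.agent_ids}
        # Toy reward: each agent is rewarded with the value of its own action.
        rewards = {aid: float(actions[aid]) for aid in self.agent_ids}
        done = self.t >= self.horizon
        return obs, rewards, done


def play_episode(
    env: ToyMultiAgentEnv,
    policies: Dict[str, PolicyFn],
    policy_mapping_fn: Callable[[str], str],
) -> Dict[str, float]:
    """Run one episode and return the total reward per agent."""
    obs = env.reset()
    totals = {aid: 0.0 for aid in obs}
    done = False
    while not done:
        # Pick each agent's policy via the mapping, then query it for an action.
        actions = {
            aid: policies[policy_mapping_fn(aid)](agent_obs)
            for aid, agent_obs in obs.items()
        }
        obs, rewards, done = env.step(actions)
        for aid, reward in rewards.items():
            totals[aid] += reward
    return totals


if __name__ == "__main__":
    demo_policies = {"always_one": lambda o: 1, "always_zero": lambda o: 0}
    mapping = lambda aid: "always_one" if aid == "agent_0" else "always_zero"
    print(play_episode(ToyMultiAgentEnv(), demo_policies, mapping))
```

For the single-agent case, the dicts collapse to a single observation and a single action, and the mapping function goes away.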

Edit: RLlib definitely has a steeper learning curve than stable_baselines, imho. For my research work, I wish I had started with RLlib rather than stable_baselines.

Edit 2: the single agent version is here: Getting Started with RLlib — Ray 3.0.0.dev0