Hi @drozzy I feel your pain. I came from stable_baselines too. I just wrote a runnable script to try out a trained policy below. It’s for multi-agent but can be easily modified for single agent. I think the docs has an example for single agent, but I couldn’t remember where atm. Cheers,
Edit: there is definitely a higher learning curve for RLlib than stable_baselines, imho. For my research work, I wish I had started with RLlib than stable_baselines.