[RLlib] Make it easier to play trained policies

I’m surprised no one else complains about this… but it is near impossible to try out a trained policy (or at least I can’t find ANY docs about this).

Basically, training is easy, but stepping through the environment and feeding it actions selected by the trained policy is not documented anywhere.

Here is an issue I filed about this:

Hi @drozzy, I feel your pain. I came from stable_baselines too. I just wrote a runnable script to try out a trained policy, below. It’s for multi-agent, but it can easily be modified for single agent. I think the docs have an example for single agent, but I can’t remember where atm. Cheers,
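
Roughly, such an evaluation loop could look like the sketch below (my assumptions, not part of the original script: PPO, RLlib’s bundled MultiAgentCartPole example env with two agents sharing a single policy, the classic 4-tuple gym step API, and a placeholder checkpoint path; adapt the env, policy IDs, and path to your own setup):

```python
# Sketch only: PPO, RLlib's example MultiAgentCartPole (2 agents sharing one
# policy), and the checkpoint path are assumptions, not the original setup.
import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole
from ray.tune.registry import register_env

register_env("multi_cartpole", lambda cfg: MultiAgentCartPole(cfg))

# Borrow the per-agent obs/action spaces from a single CartPole instance.
single_env = gym.make("CartPole-v1")

config = {
    "env": "multi_cartpole",
    "env_config": {"num_agents": 2},
    "multiagent": {
        # (policy_cls, obs_space, action_space, config); None = default class.
        "policies": {
            "shared_policy": (None, single_env.observation_space,
                              single_env.action_space, {}),
        },
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: "shared_policy",
    },
    "framework": "torch",
    "num_workers": 0,
}

ray.init()
trainer = PPOTrainer(config=config)
trainer.restore("/path/to/your/checkpoint/checkpoint-100")  # placeholder path

# Manually step through the env, asking the restored policy for each action.
env = MultiAgentCartPole({"num_agents": 2})
obs = env.reset()
done = {"__all__": False}
total_reward = 0.0

while not done["__all__"]:
    actions = {
        agent_id: trainer.compute_single_action(agent_obs,
                                                policy_id="shared_policy")
        for agent_id, agent_obs in obs.items()
    }
    obs, rewards, done, _ = env.step(actions)
    total_reward += sum(rewards.values())

print("Episode reward (sum over all agents):", total_reward)
```

For single agent, drop the multiagent config and the policy_id argument and call trainer.compute_single_action(obs) directly.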

Edit: there is definitely a steeper learning curve for RLlib than for stable_baselines, imho. Even so, for my research work, I wish I had started with RLlib rather than stable_baselines.

Edit 2: the single-agent version is here: Getting Started with RLlib — Ray 3.0.0.dev0

Hey @drozzy, great point, and thanks for all your help on this, @stefanbschneider and @RickLan.
Yes, we should document this better.

For the LSTM and attention cases, you can also take a look at these example scripts, where the env loops are described in the comments:

ray.rllib.examples.attention_net.py and ray.rllib.examples.cartpole_lstm.py.
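
The gist of those loops is that a recurrent policy needs its hidden state carried across timesteps. A rough single-agent sketch of that, assuming (my assumptions, not from the examples above) a PPO trainer trained on CartPole-v1 with "use_lstm": True and a placeholder checkpoint path:

```python
# Rough sketch of the inference loop for an LSTM policy. Assumptions: PPO with
# model={"use_lstm": True} trained on CartPole-v1, the classic 4-tuple gym
# step API, and a placeholder checkpoint path.
import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
trainer = PPOTrainer(config={
    "env": "CartPole-v1",
    "framework": "torch",
    "num_workers": 0,
    "model": {"use_lstm": True},
})
trainer.restore("/path/to/your/lstm_checkpoint/checkpoint-100")  # placeholder

env = gym.make("CartPole-v1")
obs = env.reset()
# Recurrent policies need their hidden state carried from step to step.
state = trainer.get_policy().get_initial_state()
done = False
episode_reward = 0.0

while not done:
    # When `state` is passed, compute_single_action returns
    # (action, new_state, extra_fetches).
    action, state, _ = trainer.compute_single_action(obs, state=state)
    obs, reward, done, _ = env.step(action)
    episode_reward += reward

print("Episode reward:", episode_reward)
```

If you trained with lstm_use_prev_action / lstm_use_prev_reward, also pass prev_action and prev_reward into compute_single_action. The attention case follows the same pattern, except the carried state is the memory the transformer attends over; see attention_net.py for the exact loop.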