drozzy
April 1, 2021, 4:18am
I’m surprised no one else complains about this… but it is nearly impossible to try out a trained policy (or at least I can’t find ANY docs about this).
Basically, training is easy, but stepping through the environment and feeding it actions selected by the trained policy is not documented anywhere.
Here is an issue I filed about this:
(Issue opened 12:24AM, 02 Mar 21 UTC; closed 02:36PM, 19 May 21 UTC; labels: enhancement, rllib)
I know this might be a duplicate, but there is still no clear section in the docs explaining how to do a simple rollout/render from a trained policy.
What I'm talking about is a **code-based** (not command-line-based) equivalent of [this stable-baselines example](https://stable-baselines.readthedocs.io/en/master/guide/examples.html#basic-usage-training-saving-loading):
<img width="647" alt="Screen Shot 2021-03-01 at 7 20 19 PM" src="https://user-images.githubusercontent.com/140710/109577178-2ac99f00-7ac3-11eb-8408-a878ec8a1143.png">
In contrast, rllib provides only [this huge file that is really hard to understand](https://github.com/ray-project/ray/blob/master/rllib/rollout.py#L25), which can only be used via the CLI:
<img width="549" alt="Screen Shot 2021-03-01 at 7 21 52 PM" src="https://user-images.githubusercontent.com/140710/109577305-62d0e200-7ac3-11eb-9ca6-c078c2eae2d4.png">
It would be nice to have an example of how to access a trained policy itself, so that we can write a simple render loop like in stable-baselines above.
Hi @drozzy, I feel your pain. I came from stable_baselines too. I just wrote a runnable script to try out a trained policy, below. It’s for multi-agent, but it can easily be modified for single-agent. I think the docs have an example for single-agent, but I can’t remember where atm. Cheers,
import ray
import ray.rllib.agents.ppo as ppo
from ray.tune.logger import pretty_print
from ray.rllib.examples.env.random_env import RandomMultiAgentEnv

num_agents = 2

config = ppo.DEFAULT_CONFIG.copy()
config["num_workers"] = 1
config["env_config"] = {
    "num_agents": num_agents,
}

env = RandomMultiAgentEnv(config["env_config"])

config["multiagent"] = {
    "policies": {  # (policy_cls, obs_space, act_space, config)
        "{}".format(x): (None, env.observation_space, env.action_space, {}) for …
Edit: there is definitely a higher learning curve for RLlib than for stable_baselines, imho. For my research work, I wish I had started with RLlib rather than stable_baselines.
Edit 2: the single-agent version is here: Getting Started with RLlib — Ray 3.0.0.dev0
Hey @drozzy , great point, and thanks for all your help on this @stefanbschneider and @RickLan .
Yes, we should document this better.
For the LSTM and attention cases, you can also take a look at these example scripts, where the env loops are described in the comments:
ray.rllib.examples.attention_net.py and ray.rllib.examples.cartpole_lstm.py.