RLlib: How to use a policy learned in tune.run()?

If I am executing one-time RL training using

results = tune.run(
    "PPO",
    config={
        "env": SimpleCorridor,
        "num_workers": 4,
        "env_config": {"corridor_length": 5}})

what is the command to use the learned policy?

source : https://docs.ray.io/en/master/ray-overview/index.html#gentle-intro


There is a way to do this via the command line, but I think you are asking for a way to do it in code. Below is my example; there might be a better way. I save the trained model to a checkpoint, then read it back and evaluate it by running the environment against it.
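A minimal sketch of that save/restore/evaluate flow (not the original poster's exact code), assuming the classic RLlib Trainer API of that era and the SimpleCorridor env from the question; the stopping condition and checkpoint handling are placeholders:

import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {
    "env": SimpleCorridor,            # same custom env as in the question
    "num_workers": 4,
    "env_config": {"corridor_length": 5},
}

# 1) Train and keep a checkpoint of the final state.
results = tune.run(
    "PPO",
    config=config,
    stop={"training_iteration": 10},  # hypothetical stopping condition
    checkpoint_at_end=True,
)
checkpoint_path = results.get_last_checkpoint()

# 2) Rebuild a trainer with the same config and restore the checkpoint.
trainer = PPOTrainer(config=config)
trainer.restore(checkpoint_path)

# 3) Evaluate: roll out one episode with the learned policy.
env = SimpleCorridor({"corridor_length": 5})
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = trainer.compute_action(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)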


Thanks, @RickLan, for sharing the example.

Hey @RickLan. The policy I got from this approach is stochastic. I have raised another issue about making the learned policy deterministic: https://discuss.ray.io/t/getting-deterministic-policy-after-dqn-training/2237. Would you like to share any insights on that?

Hey @Saurabh_Arora, just set the config key:

"explore": False

in the recovered policy.
You can also do this on a per-call basis: every compute_action(s) method of the Trainer or Policy has an explore flag that you can set to True, False, or None (None uses the default defined in the config).
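For reference, a minimal sketch of both options, assuming the PPOTrainer, checkpoint, and observation from the earlier example (names are placeholders, not taken from this thread):

# Option 1: make the restored policy deterministic globally via the config.
config = {
    "env": SimpleCorridor,
    "env_config": {"corridor_length": 5},
    "explore": False,            # disable exploration for all action computations
}
trainer = PPOTrainer(config=config)
trainer.restore(checkpoint_path)

# Option 2: keep the config as-is and control exploration per call.
action = trainer.compute_action(obs, explore=False)   # deterministic (greedy) action
action = trainer.compute_action(obs, explore=True)    # stochastic / exploratory action
action = trainer.compute_action(obs)                  # explore=None -> use config default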

@sven1977, the Trainer class has the following method:

@PublicAPI
def compute_action(self,
                   observation: TensorStructType,
                   state: List[TensorStructType] = None,
                   prev_action: TensorStructType = None,
                   prev_reward: float = None,
                   info: EnvInfoDict = None,
                   policy_id: PolicyID = DEFAULT_POLICY_ID,
                   full_fetch: bool = False,
                   explore: bool = None) -> TensorStructType:

so I used the explore flag. Please correct me if it does not achieve the same effect.