Extract and display policy

Is there an easy way to extract the trained policy within ray.tune at the end of each episode and display that policy applied to a grid of states (e.g. the optimal action for different state values)? For example, could this be done within the render function?

A similar question applies to extracting the history of actions/policies over an episode.

You could use custom callbacks to get the policy after each episode. Take a look at this example script, which implements an on_episode_end() callback:

ray/rllib/examples/custom_metrics_and_callbacks.py.

You can get to the policy object in that method by doing policy = worker.policy_map["default_policy"]. Then you could evaluate it right there on some task?
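A minimal sketch of what that could look like (assuming the 1.x-style API where DefaultCallbacks lives under ray.rllib.agents.callbacks; the exact import path and on_episode_end() signature vary across versions, and the 2-D grid below is purely illustrative):

```python
import numpy as np
from ray.rllib.agents.callbacks import DefaultCallbacks


class PolicyGridCallbacks(DefaultCallbacks):
    """Evaluates the current policy on a fixed grid of states after each episode."""

    def on_episode_end(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        # Grab the (single-agent) policy from the rollout worker.
        policy = worker.policy_map["default_policy"]

        # Hypothetical 2-D state grid; replace with whatever your observation space is.
        grid = np.array(
            [[x, y] for x in np.linspace(-1.0, 1.0, 5) for y in np.linspace(-1.0, 1.0, 5)],
            dtype=np.float32,
        )

        # Greedy actions for each grid point (explore=False -> deterministic).
        actions, _, _ = policy.compute_actions(grid, explore=False)

        # Keep the result with the episode so it can be plotted or dumped later.
        episode.user_data["grid_actions"] = actions
```

Note that custom_metrics get aggregated as scalars, so a whole grid of actions is usually easier to keep in user_data (or to write out to disk yourself) than to report as a metric.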


Hi Sven,
So for some reason, using episode.last_observation_for(AGENT_ID) inside the on_episode_step() method of the custom callback class doesn't work. It only ever returns the reset observations (i.e. the initial observations of each episode) and not the stepped-forward observations, unless I made some other mistake. I followed the custom metrics and callbacks example class very closely. I checked, and inside my MultiAgentEnvironment and environments the observation is being updated during tune.run, so it must be some issue or design choice with episode.last_observation_for(). I'm therefore not really sure how to extract observations for easy display.
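One possible workaround (a sketch, assuming the environment echoes the current observation through its step() info dict; the agent id and the 1.x-style callback signatures below are illustrative and may differ in other versions) is to accumulate observations manually via last_info_for() and user_data:

```python
from ray.rllib.agents.callbacks import DefaultCallbacks

AGENT_ID = "agent_0"  # hypothetical agent id; use your own


class ObsHistoryCallbacks(DefaultCallbacks):
    """Collects per-step observations via the env's info dict instead of last_observation_for()."""

    def on_episode_start(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        episode.user_data["obs_history"] = []

    def on_episode_step(self, *, worker, base_env, episode, env_index, **kwargs):
        # Assumes the env's step() returns info = {"obs": current_obs} for this agent,
        # so the post-step observation is available through last_info_for().
        info = episode.last_info_for(AGENT_ID)
        if info and "obs" in info:
            episode.user_data["obs_history"].append(info["obs"])

    def on_episode_end(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        # The full trajectory of observations for this episode is now in user_data.
        print(f"Episode {episode.episode_id}: "
              f"{len(episode.user_data['obs_history'])} observations collected")
```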

Is it possible that

For the policy, this approach should work. I am trying to extract the action probabilities via .logp and then plot them on a state grid. Unfortunately, I'm not sure how to get something like 95% intervals without manually sampling.
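One rough sketch of how that could be done without sampling, assuming a Torch policy whose action distribution is RLlib's diagonal Gaussian (the [mean, log_std] split below is specific to that assumption; building the distribution from policy.model and policy.dist_class is the pattern shown in the RLlib docs):

```python
import numpy as np
import torch


def gaussian_policy_grid(policy, obs_grid):
    """Evaluate a Torch policy's Gaussian action distribution on a grid of states."""
    obs = torch.as_tensor(np.asarray(obs_grid), dtype=torch.float32)

    # Forward pass through the policy's model gives the distribution inputs ("logits").
    dist_inputs, _ = policy.model({"obs": obs})
    dist = policy.dist_class(dist_inputs, policy.model)

    # For RLlib's diagonal Gaussian, the inputs are [mean, log_std] concatenated,
    # so a 95% interval comes straight from the parameters, no sampling required.
    mean, log_std = torch.chunk(dist_inputs, 2, dim=-1)
    std = log_std.exp()
    lower, upper = mean - 1.96 * std, mean + 1.96 * std

    # .logp() of any action batch is also available directly from the distribution.
    logp_of_mean = dist.logp(mean)

    to_np = lambda t: t.detach().cpu().numpy()
    return to_np(mean), to_np(lower), to_np(upper), to_np(logp_of_mean)
```

For a discrete action space the same model/dist_class pattern applies, but the distribution inputs are categorical logits, so a softmax already gives the exact per-action probabilities and no intervals or sampling are needed.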