I am an ML researcher looking to assess the competency of RL agents. As part of this, when I execute rollouts I want to collect a large amount of information about the agent’s internal state as well as the current observation, action, etc. Ideally, I would do this in a way that is agnostic to the RL algorithm (e.g. check whether a Q-function exists and, if it does, record the Q-value distribution for a given state-action tuple).
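For concreteness, here is roughly what I mean by algorithm-agnostic introspection. Note that `q_function`, `value_function`, and `DummyQAgent` are just placeholder names I made up for illustration, not any particular library's API:

```python
class DummyQAgent:
    """Stand-in for a value-based agent (hypothetical interface)."""

    def q_function(self, obs):
        # Return a fake Q-value per action.
        return [0.1, 0.9]


def collect_agent_internals(agent, obs, action):
    """Duck-typed introspection: record whatever the agent exposes.

    Only queries attributes that actually exist, so the same function
    works for value-based and policy-gradient agents alike.
    """
    info = {"obs": obs, "action": action}
    if hasattr(agent, "q_function"):
        info["q_values"] = agent.q_function(obs)
    if hasattr(agent, "value_function"):
        info["value"] = agent.value_function(obs)
    return info


record = collect_agent_internals(DummyQAgent(), obs=[0.0], action=1)
```

So for a DQN-like agent the record would pick up Q-values, while for a pure policy-gradient agent those keys would simply be absent.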
I have been looking into custom callbacks, but I am not sure whether they will let me accomplish this. In short, I want to run rollouts in a distributed setting and have them output a dictionary (indexed by PID and worker-local episode number, perhaps) whose episode entries each map to a sub-dictionary indexed by timestep, containing all of this information as key-value pairs. Can you help me with this?
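To make the output format concrete, this is the kind of structure I have in mind (the field names and values here are placeholders; in practice each timestep entry would hold the observation, action, and whatever internals were collected):

```python
import os
from collections import defaultdict

# {(pid, episode_id): {timestep: {key: value, ...}}}
log = defaultdict(dict)

pid = os.getpid()
for episode in range(2):
    for t in range(3):
        log[(pid, episode)][t] = {
            "obs": [0.0],   # placeholder observation
            "action": 0,    # placeholder action
            "reward": 1.0,  # placeholder reward
        }
```

Keying by `(pid, episode)` is just my first idea for keeping episodes from different worker processes distinct after the results are merged; I am open to better schemes.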
Thanks so much in advance.