I am an ML researcher looking to assess the competency of RL agents. As part of this, when I execute rollouts I want to collect a large amount of information about the agent’s internal state as well as the current observation, action, etc. Ideally, I would do this in a way that is agnostic to the RL algorithm (e.g. check whether a Q-function exists and, if it does, record the Q-value distribution for a given state-action tuple).
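For concreteness, here is roughly what I mean by algorithm-agnostic introspection. Note that `q_function`, `value_function`, and `DummyQAgent` are just placeholder names I made up for illustration, not any particular library's API:

```python
class DummyQAgent:
    """Stand-in for a value-based agent (hypothetical interface)."""

    def q_function(self, obs):
        # Return a fake Q-value per action.
        return [0.1, 0.9]


def collect_agent_internals(agent, obs, action):
    """Duck-typed introspection: record whatever the agent exposes.

    Only queries attributes that actually exist, so the same function
    works for value-based and policy-gradient agents alike.
    """
    info = {"obs": obs, "action": action}
    if hasattr(agent, "q_function"):
        info["q_values"] = agent.q_function(obs)
    if hasattr(agent, "value_function"):
        info["value"] = agent.value_function(obs)
    return info


record = collect_agent_internals(DummyQAgent(), obs=[0.0], action=1)
```

So for a DQN-like agent the record would pick up Q-values, while for a pure policy-gradient agent those keys would simply be absent.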
I have been looking into custom callbacks, but I am not sure whether they will let me accomplish this. In short, I want to run rollouts in a distributed setting and have them output a dictionary (indexed by PID and worker-local episode number, perhaps) whose episode entries each map to a sub-dictionary indexed by timestep, containing all of this information as key-value pairs. Can you help me with this?
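To make the output format concrete, this is the kind of structure I have in mind (the field names and values here are placeholders; in practice each timestep entry would hold the observation, action, and whatever internals were collected):

```python
import os
from collections import defaultdict

# {(pid, episode_id): {timestep: {key: value, ...}}}
log = defaultdict(dict)

pid = os.getpid()
for episode in range(2):
    for t in range(3):
        log[(pid, episode)][t] = {
            "obs": [0.0],   # placeholder observation
            "action": 0,    # placeholder action
            "reward": 1.0,  # placeholder reward
        }
```

Keying by `(pid, episode)` is just my first idea for keeping episodes from different worker processes distinct after the results are merged; I am open to better schemes.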
Thanks so much in advance.