As part of the project I am working on, I need access to intermediate values such as observations, rewards, actions, and action probabilities for each episode, and I need them to show up in the result dictionary returned by the trainer class when we call .train().
Before SingleAgentEpisode in the new API stack, i.e. with EpisodeV2, we were able to access this data through the episode's custom metrics.
But SingleAgentEpisode no longer has the custom_metrics attribute, which means I can't get this data into the result. Does anyone know how to access it in the new API stack?
I have exactly the same problem. I found a partial solution using the MetricsLogger available in the new callbacks.
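Roughly, the idea looks like the sketch below. It is only a minimal illustration, not a confirmed recipe: the import path is an assumption (it has moved between Ray releases), and the class name and the two metric keys are made up for the example. It reads the rewards and actions from the finished SingleAgentEpisode and hands summary values to the MetricsLogger that the callback receives.

```python
try:
    # Assumed import path; it has changed between Ray releases.
    from ray.rllib.algorithms.callbacks import DefaultCallbacks
except ImportError:
    # Fallback so the sketch can be read and run without Ray installed.
    DefaultCallbacks = object


class EpisodeStatsCallbacks(DefaultCallbacks):  # hypothetical name
    def on_episode_end(self, *, episode, metrics_logger=None, **kwargs):
        # SingleAgentEpisode exposes its per-step data through getters.
        rewards = episode.get_rewards()
        actions = episode.get_actions()

        # "my_episode_return" and "my_episode_num_actions" are
        # illustrative keys, not names defined by RLlib.
        metrics_logger.log_value(
            "my_episode_return", float(sum(rewards)), reduce="mean"
        )
        metrics_logger.log_value(
            "my_episode_num_actions", len(actions), reduce="mean"
        )
```

The class would be passed to the algorithm config's callbacks setting. Observations can be read the same way with episode.get_observations().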
The problem now is that on_episode_start, on_episode_step and on_episode_end, where I need to collect my metrics, receive the MetricsLogger of the env_runner.
I need to do my calculations and preparations for the wandb upload in on_evaluate_end.
But on_evaluate_end is passed the MetricsLogger object of the algorithm class instead.
If this one held the objects I logged in the on_episode_step callback, everything would be fine. But the metrics seem to be reduced, even when reduce=None is set while logging them in on_episode_step.
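For reference, this is roughly where I look on the algorithm side. Treat it as a sketch under assumptions: the import path may differ in your Ray version, the class name and the "my_episode_return" key are hypothetical, and I am assuming the evaluation results keep the env runner values under an "env_runners" sub-dictionary. It does not solve the reduction issue; it only shows where the (already reduced) value can be inspected.

```python
try:
    # Assumed import path; it has changed between Ray releases.
    from ray.rllib.algorithms.callbacks import DefaultCallbacks
except ImportError:
    # Fallback so the sketch can be read and run without Ray installed.
    DefaultCallbacks = object


class EvalSummaryCallbacks(DefaultCallbacks):  # hypothetical name
    def on_evaluate_end(
        self, *, algorithm, metrics_logger=None, evaluation_metrics, **kwargs
    ):
        # Assumption: env runner metrics sit under "env_runners".
        # .get() keeps the sketch from failing if the layout differs.
        env_runner_results = evaluation_metrics.get("env_runners", {})

        # "my_episode_return" is an illustrative key that would have been
        # logged in one of the on_episode_* callbacks.
        value = env_runner_results.get("my_episode_return")

        # Keep the value around for later processing (e.g. an upload step).
        self.last_eval_summary = {"my_episode_return": value}
```

If the key is missing, the stored value is simply None, which at least makes it easy to see whether the logged data reached the algorithm at all.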
So maybe this gives you a further clue or somebody with more knowledge can step in and explain how it is supposed to work.