RLlib callbacks to get custom metrics such as observations, rewards, etc. for each episode from SingleAgentEpisode and access them in the trainer

As part of the project I am working on, I need to be able to access intermediate values such as observations, rewards, actions, and action probabilities for each episode, and to see them in the result dictionary returned by the trainer class when we call .train().
Before SingleAgentEpisode in the new API stack, i.e. with EpisodeV2, we were able to access this data like below:

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class MyCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, episode, **kwargs):
        actions = episode.actions.data
        observations = episode.observations.data
        rewards = episode.rewards.data
        # On the old stack, anything put into custom_metrics ends up
        # (summarized) under result["custom_metrics"] after each iteration.
        episode.custom_metrics["observations"] = observations
        episode.custom_metrics["actions"] = actions
        episode.custom_metrics["rewards"] = rewards

But SingleAgentEpisode no longer has the custom_metrics attribute, which means I can't get this data into the result. Does anyone know how to access it in the new API stack?
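For reference, the raw data itself is still readable off the new SingleAgentEpisode through its getter methods; the missing piece is only how to report it into the results. A minimal sketch of what I mean (using the public getters of SingleAgentEpisode):

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class NewStackCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, episode, **kwargs):
        # The data is still accessible on the new-stack episode...
        observations = episode.get_observations()
        actions = episode.get_actions()
        rewards = episode.get_rewards()
        # ...but there is no episode.custom_metrics dict to attach it to.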

I have exactly the same problem. I found a partial solution using the MetricsLogger available in the new callbacks.
The problem now is that on_episode_start, on_episode_step, and on_episode_end, where I need to collect my metrics, receive the MetricsLogger of the EnvRunner.

I need to do my calculations and preparations for the wandb upload in on_evaluate_end.
But in on_evaluate_end, it is the MetricsLogger object of the Algorithm class that is passed in.
If this held the objects I logged in the on_episode_step callback, everything would be fine. But the metrics seem to get reduced even when reduce=None is set while logging them in on_episode_step.
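To make this concrete, here is a minimal sketch of the logging side, assuming the new-stack callback signature that passes metrics_logger (the key name "step_rewards" is just an example):

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class CollectMetricsCallback(DefaultCallbacks):
    def on_episode_step(self, *, episode, metrics_logger, **kwargs):
        # reduce=None asks the logger to keep the raw values as a list
        # instead of averaging them; clear_on_reduce=True empties that
        # list after each reduction cycle so values are not re-reported.
        metrics_logger.log_value(
            "step_rewards",
            episode.get_rewards()[-1],  # reward of the most recent step
            reduce=None,
            clear_on_reduce=True,
        )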

So maybe this gives you a further clue or somebody with more knowledge can step in and explain how it is supposed to work.


Solved the problem:

You can use the MetricsLogger in any of the callbacks:

metrics_logger.log_value(("myData", infos[0].get("evalEnvID", 0) + 1), infos, reduce=None, clear_on_reduce=True)
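A note on the key: as far as I can tell, passing a tuple such as ("myData", <id>) makes the MetricsLogger store the value under a nested key, so each evaluation env gets its own sub-entry under "myData". (evalEnvID here is a custom field our env puts into its info dict, not an RLlib builtin.)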

Then you can get back your data in the on_evaluate_end callback with

data = evaluation_metrics["env_runners"]["myData"]
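Put together, a hedged sketch of the retrieval side, assuming the new-stack on_evaluate_end signature (argument names as in recent Ray versions; adapt to yours):

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class EvalUploadCallback(DefaultCallbacks):
    def on_evaluate_end(self, *, algorithm, metrics_logger, evaluation_metrics, **kwargs):
        # Values logged on the EnvRunners are compiled into the
        # evaluation results under the "env_runners" key.
        data = evaluation_metrics["env_runners"]["myData"]
        # ... prepare the data and upload it, e.g. to wandb.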

I think it will be reduced unless you specify reduce=None during logging.
