Is this the recommended way of saving per-episode training metrics?

My goal is to create some nice graphs of various metrics as training progresses. My approach was to save all the metrics I care about in the “info” dict that env.step() returns, and then, inside DomCallbacks.on_episode_end(), copy them into episode.hist_data. This seems circuitous to me, and my data isn’t really histogram data; but when I tried to save it in episode.custom_metrics instead, RLlib later reported only the min, mean, and max of each metric.

Here’s my callback:

from typing import Dict

from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.env import BaseEnv
from ray.rllib.evaluation import MultiAgentEpisode, RolloutWorker
from ray.rllib.policy import Policy


class DomCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: MultiAgentEpisode,
                       env_index: int, **kwargs):
        # Use the public accessor instead of the private _agent_to_last_info dict.
        info = {agent_id: episode.last_info_for(agent_id)
                for agent_id in episode.get_agents()}
        episode.hist_data['game_len'] = [
            max(info['player_1']['num_turns'], info['player_2']['num_turns'])]

        # info is a nested dict, so flatten it when copying to hist_data.
        for player, metrics in info.items():
            for metric, value in metrics.items():
                if isinstance(value, dict):
                    for m, v in value.items():
                        episode.hist_data[f'{player}_{metric}_{m}'] = [v]
                else:
                    episode.hist_data[f'{player}_{metric}'] = [value]

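For reference, here is the flattening step in isolation as a plain-Python sketch, independent of RLlib (the helper name flatten_info and the sample info dict are just for illustration):

```python
def flatten_info(info):
    """Flatten a {player: {metric: value-or-nested-dict}} mapping into
    {player_metric[_submetric]: [value]} one-element lists, mirroring
    the hist_data layout used in the callback above."""
    flat = {}
    for player, metrics in info.items():
        for metric, value in metrics.items():
            if isinstance(value, dict):
                # One level of nesting, e.g. per-card counters.
                for m, v in value.items():
                    flat[f'{player}_{metric}_{m}'] = [v]
            else:
                flat[f'{player}_{metric}'] = [value]
    return flat


# Example shape of the per-agent info dicts (hypothetical values):
info = {'player_1': {'num_turns': 12, 'cards': {'gold': 3}},
        'player_2': {'num_turns': 11}}
print(flatten_info(info))
```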
Is this the intended way to do what I want?
