My goal is to create graphs of various metrics as training progresses. My approach was to save all the metrics I care about in the “info” dict that env.step() returns, and then copy them into episode.hist_data inside DomCallbacks.on_episode_end(). This seems circuitous to me, and my data isn’t really histogram data. But when I tried saving it in episode.custom_metrics instead, only the min, mean, and max of each metric were reported.
Here’s my callback:
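# Import paths as in the Ray 1.x callbacks example; they may differ in other versions.
from typing import Dict

from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.env import BaseEnv
from ray.rllib.evaluation import MultiAgentEpisode, RolloutWorker
from ray.rllib.policy import Policy


class DomCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker: RolloutWorker, base_env: BaseEnv,
                       policies: Dict[str, Policy], episode: MultiAgentEpisode,
                       env_index: int, **kwargs):
        # Private attribute: maps each agent id to the last info dict it received.
        info = episode._agent_to_last_info
        episode.hist_data['game_len'] = [max(info['player_1']['num_turns'], info['player_2']['num_turns'])]
        # info is a nested dict, so we flatten it when copying to hist_data
        for player in info:
            for metric in info[player]:
                if isinstance(info[player][metric], dict):
                    for m in info[player][metric]:
                        key = player + '_' + metric + '_' + m
                        episode.hist_data[key] = [info[player][metric][m]]
                else:
                    key = player + '_' + metric
                    episode.hist_data[key] = [info[player][metric]]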
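For reference, here is roughly what the flattening step does on its own, with no RLlib dependency. This is only a sketch: `flatten_info` is a helper name made up for illustration, and the `cards`, `gold`, and `silver` entries in the sample dict are invented placeholders, not my real metrics. Only `player_1`, `player_2`, and `num_turns` come from the callback above.

```python
from typing import Any, Dict, List


def flatten_info(info: Dict[str, Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Flatten {player: {metric: value or dict}} into {key: [value]}.

    Keys are joined with underscores, and each value is wrapped in a
    one-element list, matching what the callback writes to hist_data.
    """
    flat: Dict[str, List[Any]] = {}
    for player, metrics in info.items():
        for metric, value in metrics.items():
            if isinstance(value, dict):
                # One level of nesting: player -> metric -> sub-metric
                for sub_metric, sub_value in value.items():
                    flat[f"{player}_{metric}_{sub_metric}"] = [sub_value]
            else:
                flat[f"{player}_{metric}"] = [value]
    return flat


# Hypothetical sample data; the nested "cards" entries are placeholders.
sample_info = {
    "player_1": {"num_turns": 12, "cards": {"gold": 3, "silver": 5}},
    "player_2": {"num_turns": 11, "cards": {"gold": 2, "silver": 7}},
}

flat = flatten_info(sample_info)
# Game length is the larger of the two players' turn counts.
flat["game_len"] = [max(p["num_turns"] for p in sample_info.values())]
```

With this sample, `flat` ends up with keys such as `player_1_num_turns` and `player_2_cards_silver`, plus `game_len`, each holding a single-element list.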
Is this the intended way to do what I want?