Tracking env observations from a single env during training

I am trying to collect custom metrics from my env and display them in TensorBoard during training. I am working from the example ray/rllib/examples/custom_metrics_and_callbacks.py, and it works well. Now I would also like to track the observations from my environment to get an overview during training. But I can’t figure out how to filter the observations so that I only display them for a single episode/env. In other words, it shouldn’t be a per-episode callback but rather a callback one level up that lets me select a single episode.
I hope this makes sense.
I would very much appreciate it if somebody who has done something similar could share their approach.

I’m not entirely sure what you mean, but from what you said, I think the Self Play callbacks and their associated metrics (in the multiagent examples section) might be helpful.

I might have phrased it ambiguously. I will try again, now that I’ve done further testing.
In the following RLlibCallback subclass, I’m logging environment info after each step. On episode end, I therefore have a time series for all info dict entries.

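from ray.rllib.callbacks.callbacks import RLlibCallback
from ray.rllib.env.single_agent_episode import SingleAgentEpisode


class CustomCallback(RLlibCallback):

    def on_episode_step(self, *args, episode: SingleAgentEpisode, metrics_logger, **kwargs):
        # Log the info dict returned by the env's most recent step.
        # reduce=None keeps every logged value as a list (no averaging);
        # clear_on_reduce=True empties that list after each report.
        metrics_logger.log_dict(
            episode.infos.data[-1], reduce=None, clear_on_reduce=True
        )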

If I then open TensorBoard, each info entry is displayed as a histogram, with the data points from all the environments overlaid as separate distributions in one plot (so one histogram plot with, e.g., 10 distributions). That isn’t a useful view for what I want.
Instead, I’d like each info entry as a scalar time-series plot for a single episode; it doesn’t matter which one. Think of it as a visualization of an example episode, to check that everything is running as expected. That is much nicer than reading the values from the logs, which I also do.
But I don’t see how to make sure that only one episode’s info is passed to TensorBoard and that it is treated as a time series.
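One way to get a single episode is to do the filtering inside the callback yourself. Below is a minimal, framework-free sketch of that idea; `SingleEpisodeTracker` and its methods are hypothetical names, not part of RLlib. It latches onto one episode when it starts, ignores every other episode, and returns that episode’s info entries as `(step, value)` pairs when it ends:

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Mapping, Optional, Tuple

Series = Dict[str, List[Tuple[int, Any]]]


class SingleEpisodeTracker:
    """Records the info dicts of exactly one episode at a time (hypothetical helper)."""

    def __init__(self) -> None:
        self.tracked_id: Optional[Hashable] = None
        self._steps: List[Dict[str, Any]] = []

    def on_start(self, episode_id: Hashable) -> None:
        # Latch onto this episode only if no other episode is being tracked,
        # so every recorded series is complete from step 0.
        if self.tracked_id is None:
            self.tracked_id = episode_id
            self._steps = []

    def on_step(self, episode_id: Hashable, info: Mapping[str, Any]) -> None:
        # Steps from any episode other than the tracked one are ignored.
        if self.tracked_id is not None and episode_id == self.tracked_id:
            # Copy, in case the env reuses and mutates its info dict.
            self._steps.append(dict(info))

    def on_end(self, episode_id: Hashable) -> Optional[Series]:
        """Return {key: [(step, value), ...]} for the tracked episode, else None."""
        if self.tracked_id is None or episode_id != self.tracked_id:
            return None
        series: Series = defaultdict(list)
        for step, info in enumerate(self._steps):
            for key, value in info.items():
                series[key].append((step, value))
        # Release the latch so the next episode that starts gets tracked.
        self.tracked_id = None
        self._steps = []
        return dict(series)


# Example with two interleaved episodes (as with several envs per EnvRunner).
tracker = SingleEpisodeTracker()
tracker.on_start("ep_a")
tracker.on_start("ep_b")  # ignored: "ep_a" is already being tracked
tracker.on_step("ep_a", {"speed": 1.0})
tracker.on_step("ep_b", {"speed": 9.0})  # ignored
tracker.on_step("ep_a", {"speed": 2.0})
print(tracker.on_end("ep_b"))  # None
print(tracker.on_end("ep_a"))  # {'speed': [(0, 1.0), (1, 2.0)]}
```

To hook this up (an assumption, not tested against a specific Ray version), you would call `on_start`, `on_step`, and `on_end` from `on_episode_start`, `on_episode_step`, and `on_episode_end`, passing `episode.id_` and the latest info dict. Each EnvRunner runs in its own process with its own callback object, so you would likely also restrict tracking to one EnvRunner and one env index. The returned pairs map directly onto TensorBoard’s `SummaryWriter.add_scalar(tag, value, global_step=step)` (available in `torch.utils.tensorboard` and `tensorboardX`), which draws a scalar line over steps instead of a histogram.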

I’m still not entirely clear on what you mean, but that sounds like a TensorBoard issue rather than an RLlib one. I used wandb for something similar and got time-series plots from my custom metric (a dictionary of values recorded once per episode, populated the same way as in the examples). If that’s your goal, I’d try wandb logging and see whether the plots look right. If they do, the data you’re sending is correct and the only issue is how TensorBoard interprets it, so all you’d need to do is adjust the TensorBoard settings.
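If one curve per episode across training is enough, a variant of the “dictionary of values recorded once every episode” approach is to collapse each episode’s info entries into one scalar per key before logging. A minimal sketch, assuming numeric info values; `summarize_episode` and the `_mean`/`_final` suffixes are hypothetical choices, not RLlib conventions:

```python
from statistics import fmean
from typing import Any, Dict, List, Mapping


def summarize_episode(infos: List[Mapping[str, Any]]) -> Dict[str, float]:
    """Collapse one episode's per-step info dicts into one scalar per key.

    Only int/float values are kept (bools and non-numeric entries are
    skipped). For every numeric key, returns "<key>_mean" (average over the
    episode) and "<key>_final" (value at the last step where it appeared).
    """
    series: Dict[str, List[float]] = {}
    for info in infos:
        for key, value in info.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                series.setdefault(key, []).append(float(value))

    summary: Dict[str, float] = {}
    for key, values in series.items():
        summary[f"{key}_mean"] = fmean(values)
        summary[f"{key}_final"] = values[-1]
    return summary


# Example: two steps of a hypothetical env whose info dict has "speed" and "fuel".
print(summarize_episode([{"speed": 1.0, "fuel": 10}, {"speed": 3.0, "fuel": 6}]))
# {'speed_mean': 2.0, 'speed_final': 3.0, 'fuel_mean': 8.0, 'fuel_final': 6.0}
```

In an `on_episode_end` callback, this could be fed `episode.get_infos()`, with each resulting pair passed to `metrics_logger.log_value(key, value, reduce="mean")`. Because each key then holds a scalar rather than a list, TensorBoard and wandb should both draw it as a line over training iterations. How strongly that line is smoothed depends on the `window`/`ema_coeff` arguments, so check the MetricsLogger docs for your Ray version.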