Tracking env observations from a single env during training

pg13 · September 12, 2025, 10:31am

I am trying to collect custom metrics from my env and display them in TensorBoard during training. I am working from the example ray/rllib/examples/custom_metrics_and_callbacks.py, and it works well. Now I would also like to track the observations from my environment to get an overview during training. But I can’t figure out how to filter the observations so that they are only displayed for a single episode/env. In other words, it shouldn’t be a callback for a single episode but rather a callback further up that lets me select a single episode.
I hope this makes sense.
I would very much appreciate it if somebody who has done something similar could share.

MCW_Lad · September 12, 2025, 10:08pm

I’m not entirely sure what you mean, but from what you said, I think the Self Play callbacks and their associated metrics (in the multiagent examples section) might be helpful.

pg13 · September 19, 2025, 2:46pm

I might have phrased it ambiguously. I will try again, now that I’ve done further testing.
In the following RLlibCallback subclass, I’m logging environment info after each step. On episode end, I therefore have a time series for all info dict entries.

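from ray.rllib.callbacks.callbacks import RLlibCallback
from ray.rllib.env.single_agent_episode import SingleAgentEpisode


class CustomCallback(RLlibCallback):
    def on_episode_step(self, *args, episode: SingleAgentEpisode, metrics_logger, **kwargs):
        # Log the most recent info dict; reduce=None keeps the raw values.
        metrics_logger.log_dict(
            episode.infos.data[-1], reduce=None, clear_on_reduce=True
        )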

If I then open TensorBoard, each info entry is displayed as a histogram, with the data points from all environments overlaid as separate distributions in one plot (so one histogram plot with, e.g., 10 distributions). This doesn’t make sense.
Instead, I’d like to have each info entry as a scalar time-series plot for a single episode. It doesn’t matter which one. Kind of like a visualization of an example episode to see if everything is running as expected. It’s much nicer than logging the values to text, which I also do.
But I don’t see how to make sure that only one episode’s info is passed to TensorBoard and that it is treated as a time series.

MCW_Lad · September 25, 2025, 9:46pm

I’m still not entirely clear on what you mean, but that sounds like a TensorBoard thing rather than an RLlib thing. I used wandb for something like this and got time-series plots from my custom metric (a dictionary of values recorded once every episode, populated the same way as in the examples). If that’s your goal, I’d try wandb logging and see if it looks right. If it does, then all you’d need to do is tweak the TensorBoard settings, because it would mean the data you’re sending is correct and the only issue is how it’s being interpreted on the visualization library’s end.
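
For the "only one episode" part of the question, here is a rough sketch of the selection logic on its own, with no RLlib dependency. SingleEpisodeTracker and its methods are hypothetical names made up for illustration, not part of RLlib. The idea is to latch onto the first episode seen on one chosen sub-environment, ignore everything else, and hand back a per-key time series when that episode ends. It assumes the info values are numeric.

```python
from typing import Any, Dict, List, Optional


class SingleEpisodeTracker:
    """Hypothetical helper: follows exactly one episode at a time."""

    def __init__(self, env_index: int = 0) -> None:
        self.env_index = env_index              # sub-environment to follow
        self.episode_id: Optional[str] = None   # episode currently followed
        self.series: Dict[str, List[float]] = {}

    def record(self, episode_id: str, env_index: int, info: Dict[str, Any]) -> bool:
        """Store `info` if it belongs to the followed episode; return True if stored."""
        if env_index != self.env_index:
            return False
        if self.episode_id is None:
            # Latch onto the first episode seen on the chosen sub-environment.
            self.episode_id = episode_id
        if episode_id != self.episode_id:
            return False
        for key, value in info.items():
            self.series.setdefault(key, []).append(float(value))
        return True

    def finish(self, episode_id: str) -> Optional[Dict[str, List[float]]]:
        """On episode end, return the collected series and reset the tracker."""
        if episode_id != self.episode_id:
            return None
        done, self.series, self.episode_id = self.series, {}, None
        return done


# Small demo with made-up episode IDs and info values.
tracker = SingleEpisodeTracker(env_index=0)
tracker.record("ep-a", 0, {"speed": 1.0})
tracker.record("ep-b", 1, {"speed": 9.0})  # ignored: different sub-environment
tracker.record("ep-a", 0, {"speed": 2.0})
example = tracker.finish("ep-a")
```

In the callback above, record() could be called from on_episode_step with episode.id_ and episode.infos.data[-1], assuming the callback also receives the sub-environment index as a keyword argument (I believe it is called env_index, but check your Ray version). With several EnvRunners, each one would follow its own env 0, so an extra filter on the runner index would probably be needed. Writing the returned series as scalars, one point per step, is a separate step this sketch leaves out.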
