Cumulative reward chart

I am trying to create a cumulative reward vs. timesteps chart to compare algorithms. Is there a code snippet I can look at to do this? Does RLlib have support for this? Looking at stable_baselines3, I see methods like evaluate_policy. I was hoping RLlib has something similar, or that somebody can point me to an example. Thx.

@ironv take a look at Offline Datasets. With those, you can store the SampleBatches of your experiments (simply set "output": "path/to/your/data" in your trainer configuration).
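For illustration, here is a minimal sketch of what that could look like with the legacy Trainer API; PPO, CartPole-v0, the output path, and the iteration count are placeholder choices of mine, not anything the docs prescribe:

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# Every SampleBatch collected during rollouts is written as JSON
# files into the "output" directory.
trainer = PPOTrainer(
    env="CartPole-v0",
    config={"output": "path/to/your/data"},
)

for _ in range(5):
    trainer.train()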

Then, with the RLlib JsonReader, you can read these files back in and convert them into an iterable. With the helper functions in sample_batch.py, you can extract exactly the information you need, namely the rewards and the timesteps t. Hope that helps, amigo.
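A minimal sketch of that reading step; the number of batches to read and the plotting part are my own assumptions:

import numpy as np
import matplotlib.pyplot as plt
from ray.rllib.offline.json_reader import JsonReader

reader = JsonReader("path/to/your/data")

# Pull the per-step rewards out of a fixed number of batches;
# adjust the count to cover your whole experiment.
rewards = []
for _ in range(200):
    batch = reader.next()  # a SampleBatch with "rewards", "t", ... columns
    rewards.extend(batch["rewards"])

# Cumulative reward vs. timesteps.
plt.plot(np.cumsum(rewards))
plt.xlabel("timestep")
plt.ylabel("cumulative reward")
plt.show()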


Thanks @Lars_Simon_Zehnder. Suppose I want to compare two trained policies (based on two different algorithms, or different parameters of the same algorithm). Is there an easier way to do this using just the saved policies? If possible, can you point me to an example? Thx.

Hi @ironv,
on the documentation page about Training APIs you will find a subsection named Evaluating Trained Policies, which shows a way to evaluate trained policies with TensorBoard:

tensorboard --logdir=~/ray_results
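If you want to roll out your two saved policies directly and compare the rewards they collect, the same page also covers the rollout script. A sketch, where the checkpoint path, algorithm, and env are placeholders you would replace with your own:

rllib rollout ~/ray_results/my_experiment/checkpoint_100/checkpoint-100 --run PPO --env CartPole-v0 --steps 10000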

The subsection Callbacks and Custom Metrics also describes how to define your own custom metrics to be shown in TensorBoard. Hope these links help.
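As a sketch of what such a callback could look like (the class and metric names are mine; episode.total_reward is the cumulative reward of the finished episode):

from ray.rllib.agents.callbacks import DefaultCallbacks

class RewardCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # Custom metrics show up in TensorBoard alongside the built-in ones.
        episode.custom_metrics["cumulative_reward"] = episode.total_reward

# Enable it via the trainer config, e.g. config={"callbacks": RewardCallbacks}.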