Cumulative reward chart

I am trying to create a cumulative reward vs. timesteps chart to compare algorithms. Is there a code snippet I can look at to do this? Does RLlib have support for this? Looking at stable_baselines3, I see methods like evaluate_policy. I was hoping RLlib has something similar, or that somebody can point me to an example. Thx.

@ironv take a look at Offline Datasets. With those, you can store the SampleBatches of your experiments (simply set "output": "path/to/your/data" in your trainer configuration).
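For illustration, here is a minimal sketch of what that could look like with the legacy Trainer API; PPO, CartPole-v0, the output path, and the iteration count are placeholder choices of mine, not anything the docs prescribe:

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# Every SampleBatch collected during rollouts is written as JSON
# files into the "output" directory.
trainer = PPOTrainer(
    env="CartPole-v0",
    config={"output": "path/to/your/data"},
)

for _ in range(5):
    trainer.train()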

Then, with the RLlib JsonReader, you can read these files back in and convert them into an iterable. With the helper functions in sample_batch.py, you can extract exactly the information you need, namely the rewards and the timesteps t. Hope that helps, amigo.
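A minimal sketch of that reading step; the number of batches to read and the plotting part are my own assumptions:

import numpy as np
import matplotlib.pyplot as plt
from ray.rllib.offline.json_reader import JsonReader

reader = JsonReader("path/to/your/data")

# Pull the per-step rewards out of a fixed number of batches;
# adjust the count to cover your whole experiment.
rewards = []
for _ in range(200):
    batch = reader.next()  # a SampleBatch with "rewards", "t", ... columns
    rewards.extend(batch["rewards"])

# Cumulative reward vs. timesteps.
plt.plot(np.cumsum(rewards))
plt.xlabel("timestep")
plt.ylabel("cumulative reward")
plt.show()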


Thanks @Lars_Simon_Zehnder. Suppose I want to compare two trained policies (based on two different algorithms, or different parameters of the same algorithm). Is there an easier way to do this using just the saved policies? If possible, can you point me to an example? Thx.

Hi @ironv,
on the documentation page about Training APIs you will find a subsection named Evaluating Trained Policies, which shows a way to evaluate trained policies with TensorBoard:

tensorboard --logdir=~/ray_results
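If you want to roll out your two saved policies directly and compare the rewards they collect, the same page also covers the rollout script. A sketch, where the checkpoint path, algorithm, and env are placeholders you would replace with your own:

rllib rollout ~/ray_results/my_experiment/checkpoint_100/checkpoint-100 --run PPO --env CartPole-v0 --steps 10000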

The subsection Callbacks and Custom Metrics also describes how to define your own custom metrics to be shown in TensorBoard. Hope these links help.
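As a sketch of what such a callback could look like (the class and metric names are mine; episode.total_reward is the cumulative reward of the finished episode):

from ray.rllib.agents.callbacks import DefaultCallbacks

class RewardCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # Custom metrics show up in TensorBoard alongside the built-in ones.
        episode.custom_metrics["cumulative_reward"] = episode.total_reward

# Enable it via the trainer config, e.g. config={"callbacks": RewardCallbacks}.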