Visualization of learning returns progress

Hello,
I am using Ray for the implementation of reinforcement learning (RL) algorithms.
The learning-return progress visualized in TensorBoard displays four curves:

ray/tune/episode_len_mean, ray/tune/episode_reward_max, ray/tune/episode_reward_mean and ray/tune/episode_reward_min.

They have different trends, so I don’t know which of them should be used to track RL convergence.

Could you please provide the definition of each of them and advise how they should be interpreted?

Below is an example of the visualized results:

Hello,

episode_len_mean

  • mean number of steps before a single episode is done; for example, for the CartPole environment, a value of 200 means your environment is solved

episode_reward_max

  • maximum reward from a single episode reached during the iteration; if it reaches the maximum possible value, you know the agent can solve the environment

episode_reward_min

  • minimum reward from a single episode reached during the iteration; if the minimum equals the maximum, it means the agent can always solve the environment

episode_reward_mean

  • average episode reward over a single iteration; it should increase toward the maximum value during training
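
For reference, these are the same per-iteration values that TensorBoard plots; you can also read them directly from the result dict returned by `trainer.train()`. A minimal sketch, assuming the older RLlib API (`ray.rllib.agents`) and CartPole-v0; the exact keys and their location in the result dict can differ across Ray versions:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer  # older RLlib API; newer versions use ray.rllib.algorithms

ray.init()
trainer = PPOTrainer(config={"env": "CartPole-v0"})

for i in range(10):
    result = trainer.train()  # one training iteration
    # The same values TensorBoard shows under ray/tune/... (top-level keys in older Ray versions):
    print(
        i,
        result["episode_len_mean"],     # mean episode length this iteration
        result["episode_reward_min"],   # worst episode this iteration
        result["episode_reward_mean"],  # average episode return this iteration
        result["episode_reward_max"],   # best episode this iteration
    )
```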

I suggest using episode_reward_mean. If you know that a reward is given after every single step, you can also use episode_len_mean.
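
As an illustration, a common pattern is to stop training once episode_reward_mean crosses a threshold. This is only a sketch, assuming the older Tune API and an assumed CartPole-v0 threshold of 195 (often treated as "solved"); adjust the metric name and value for your environment and Ray version:

```python
from ray import tune

tune.run(
    "PPO",
    config={"env": "CartPole-v0"},
    stop={"episode_reward_mean": 195},  # stop once the average return reaches the assumed threshold
)
```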

I hope it will be useful for you.
Peter