Visualization of learning returns progress

I am using Ray for the implementation of reinforcement learning (RL) algorithms.
The progress of learning returns, visualized in TensorBoard, displays 4 curves:

ray/tune/episode_len_mean, ray/tune/episode_reward_max, ray/tune/episode_reward_mean and ray/tune/episode_reward_min.

They have different trends, so I don't know which one of them should be considered to track RL convergence.

Could you please provide the definition of each one of them and advise how they should be interpreted?

Below is an example of the visualized results:



  • episode_len_mean — the mean number of steps before a single episode is done; for example, in the CartPole environment, a value of 200 means the environment is solved


  • episode_reward_max — the maximum reward from a single episode reached during the iteration; if you reach the maximum possible value, you know the agent can solve the environment


  • episode_reward_min — the minimum reward from a single episode reached during the iteration; if your minimum equals your maximum, the agent can always solve the environment


  • episode_reward_mean — the average episode reward over a single iteration; it should increase toward the maximum value during training
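To make the definitions above concrete, here is a minimal sketch of how these four aggregates are computed from per-episode statistics collected during one training iteration. This is an illustration of the definitions, not RLlib's actual implementation; the function name `summarize_episodes` and the sample values are made up:

```python
import statistics

def summarize_episodes(episode_rewards, episode_lengths):
    """Aggregate per-episode data the way the four TensorBoard curves report it."""
    return {
        "episode_len_mean": statistics.mean(episode_lengths),
        "episode_reward_max": max(episode_rewards),
        "episode_reward_min": min(episode_rewards),
        "episode_reward_mean": statistics.mean(episode_rewards),
    }

# Example: four episodes finished during one iteration of CartPole-like training.
rewards = [180.0, 200.0, 150.0, 200.0]
lengths = [180, 200, 150, 200]
print(summarize_episodes(rewards, lengths))
# → {'episode_len_mean': 182.5, 'episode_reward_max': 200.0,
#    'episode_reward_min': 150.0, 'episode_reward_mean': 182.5}
```

Note that in an environment like CartPole, where the reward is +1 per step, `episode_len_mean` and `episode_reward_mean` coincide, which is why either can be used to track progress there.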

I suggest using episode_reward_mean. If you know that a reward is given after every single step, you can also track episode_len_mean.
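If you want to act on episode_reward_mean programmatically rather than eyeballing the curve, one simple option is a moving-window check on its history. The helper name `converged`, the target value, and the window size below are all hypothetical choices for illustration, not Ray APIs:

```python
def converged(reward_mean_history, target, window=10):
    """Treat training as converged once the average of the last `window`
    episode_reward_mean values reaches the target score."""
    if len(reward_mean_history) < window:
        return False  # not enough iterations yet to judge
    recent = reward_mean_history[-window:]
    return sum(recent) / window >= target

# e.g. stop training when the last 10 iterations average >= 195
history = [120.0, 150.0, 170.0] + [196.0] * 10
print(converged(history, target=195))  # → True
```

Ray Tune also supports expressing this kind of criterion declaratively, by passing a `stop` dictionary keyed on a reported metric (such as `episode_reward_mean`) to the tuning run, which avoids writing the loop yourself.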

I hope it will be useful for you.