I am using Ray for the implementation of reinforcement learning (RL) algorithms.
The progress of learning returns visualized by TensorBoard display 4 curves:
ray/tune/episode_len_mean, ray/tune/episode_reward_max, ray/tune/episode_reward_mean and ray/tune/episode_reward_min.
They have different trends so I don’t know which one of them should be considered to track the RL convergence.
Could you please provide the definition of each one of them and advise how should they be interpreted.
Below an example of visualized results: