My env returns -0.1 most of the time, and sometimes it returns +1 as the reward. But for a while, about 10 iterations, the episode reward mean stayed at the same number. Why? I wouldn't expect the episode reward mean to stay unchanged for 10 iterations, since the reward is -0.1 most of the time.
Maybe your agent hasn't learnt to reach the goal (+1) yet? This is typical behavior for grid worlds with a per-step reward of -0.1 and a positive goal reward that terminates the episode.
Sounds completely normal.
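To see why the mean can sit at one value, here's a small sketch (the 100-step horizon and reward values are assumptions for illustration, not your actual config): if the agent never reaches the goal and episodes end at a fixed step limit, every episode's return is identical, so the mean can't move.

```python
# Hypothetical grid-world reward setup: -0.1 per step, +1 on reaching
# the goal, episodes truncated at a fixed horizon of 100 steps.
HORIZON = 100
STEP_REWARD = -0.1
GOAL_REWARD = 1.0

def episode_return(reached_goal: bool, steps: int = HORIZON) -> float:
    """Total reward collected in one episode."""
    if reached_goal:
        # -0.1 on each step before the final one, +1 on the goal step.
        return STEP_REWARD * (steps - 1) + GOAL_REWARD
    # Timed out without reaching the goal: -0.1 on every step.
    return STEP_REWARD * steps

# Agent that never reaches the goal: every episode returns the same value,
# so the mean over any batch of such episodes is constant across iterations.
returns = [episode_return(False) for _ in range(5)]
print(returns)  # five identical values of -10.0
```

Once the agent starts reaching the goal occasionally (shorter episodes, +1 bonus), the returns differ and the mean begins to move.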
Btw, the value reported under `episode_reward_mean` is the average episode return over the episodes in the train batch used in that iteration.