Unexpected dramatic drop in reward

Hi all,
I trained a PPO agent on my custom env. Everything was working well, but the reward suddenly dropped and never recovered. As you can see in the following figure, the agent had almost perfectly learned my env (10 is the maximum reward in my custom env).

I expected the reward curve to plateau somewhere after the blue line, but as the following figure shows, it dropped dramatically!

Do you think this is a Ray/RLlib issue? Or could it be related to my CUDA setup or something else?


Is it possible in your custom environment for the agents to receive such low rewards within an episode? If not, then you’re probably seeing a bug.
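
One quick way to test this is to compare logged episode returns against the range your env can actually produce. The following is only a rough sketch: the function name and the bounds are hypothetical, and it assumes you already have the per-episode returns as a plain list and that you know the smallest and largest return an episode can yield.

```python
def find_out_of_range_episodes(episode_returns, min_return, max_return):
    """Return the indices of episodes whose return is outside [min_return, max_return].

    NaN returns are flagged as well, because every comparison with NaN is False.
    """
    bad = []
    for i, ret in enumerate(episode_returns):
        if not (min_return <= ret <= max_return):
            bad.append(i)
    return bad


# Hypothetical bounds: replace them with what your env can really produce.
suspicious = find_out_of_range_episodes([9.5, 10.0, -50.0, 3.0], 0.0, 10.0)
print(suspicious)
```

If this flags anything, the env (or the reward bookkeeping around it) is producing values it should not be able to produce, which points to a bug rather than to the learning algorithm.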

@rusu24edward, thanks for your reply.

Hey @deepgravity , could you check your model’s weights? Maybe they have collapsed/exploded/NaN’d after some learning update?
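
A minimal sketch of such a check, assuming the weights are available as a plain dict that maps layer names to NumPy arrays (how you export them depends on your setup and is not shown here). The function name, the layer names, and the `max_abs` threshold are all made up for illustration, and the threshold in particular is an arbitrary choice.

```python
import numpy as np


def find_bad_weights(weights, max_abs=1e6):
    """Return {layer_name: reason} for arrays that look broken.

    weights: dict mapping a layer name to an array-like of floats.
    max_abs: arbitrary magnitude above which a weight counts as exploded.
    """
    report = {}
    for name, value in weights.items():
        arr = np.asarray(value, dtype=np.float64)
        if np.isnan(arr).any():
            report[name] = "nan"
        elif np.isinf(arr).any():
            report[name] = "inf"
        elif arr.size > 0 and np.abs(arr).max() > max_abs:
            report[name] = "too large"
    return report


# Hypothetical weights, just to show the call.
example = {
    "fc1/kernel": np.array([[0.1, -0.2], [0.3, 0.4]]),
    "fc1/bias": np.array([np.nan, 0.0]),
    "out/kernel": np.array([1e9, 1.0]),
}
print(find_bad_weights(example))
```

Running this on checkpoints from before and after the drop should show whether the weights went bad at the same time as the reward.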

Hi @sven1977, thanks for your reply. I am actually no longer facing this issue, but I do not know how I fixed it. I changed many things in my custom env and in the agent training pipeline, so I am not sure what the main cause of the error was. Anyway, everything works well now :)