Hi,
I'm trying to understand why this happens. I am training PPO on a custom deterministic environment. Each iteration (training rollout) typically contains 8 episodes, for which I plot the min/max/mean reward. I don't understand why PPO, once it discovers some good actions leading to the highest max reward, soon forgets them and reverts to actions that yield a smaller reward (see the chart below for an example). Any help in understanding this behavior, and in knowing which hyperparameters I should tune to address it, would be appreciated. (I am currently using pretty much the default values for all hyperparameters.)
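For reference, my setup is essentially the following (a minimal sketch, not my actual code: I'm assuming Stable-Baselines3 here just to show the defaults, and "CartPole-v1" stands in for my custom environment):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder for my custom deterministic environment.
env = gym.make("CartPole-v1")

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,   # SB3 default
    n_steps=2048,         # rollout length per iteration (SB3 default)
    ent_coef=0.0,         # SB3 default; the entropy bonus affects exploration
    clip_range=0.2,       # SB3 default PPO policy-clipping parameter
    verbose=1,
)
model.learn(total_timesteps=100_000)
```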
Thanks,
Antonio