Periodic spikes in vf_loss of PPO training

@sven1977 What can I infer from the periodic spikes in vf_loss while training a PPO policy? As you can see in the following screenshot, the spikes keep appearing even up to 100M timesteps. What do these spikes signify? Is this what vf_loss is expected to look like, or should the spikes not be present?
