[Impala] I'm getting a zero vf_loss, policy_loss after only 16k steps

Clement_Collgon · May 4, 2021, 2:13pm

Hello,

Do you have any idea what I might be doing wrong? Is the policy_reward_mean too high, and should it be in the 0-1 range?

Thanks in advance

Clement_Collgon · May 6, 2021, 8:55am

I still have no clue about this. I'm also wondering: what is the best multi-agent algorithm in RLlib right now for continuous control? It seems to be MADDPG, but it is not very stable. MAPPO doesn't do well, and IMPALA is bugged.
