[Impala] vf_loss and policy_loss drop to zero after only 16k steps


Do you have any idea what I might be doing wrong? Is the policy_reward_mean too high? Should it be in the 0 to 1 range?

Thanks in advance

I still have no clue about this. I'm also wondering: what is the best multi-agent algorithm in RLlib right now for continuous control? It seems to be MADDPG, but it is not very stable. MAPPO doesn't do well, and Impala is bugged.