Hello,
Do you have any idea what I might be doing wrong? Is the policy_reward_mean too high, and should it be in the 0-1 range?
Thanks in advance
I still have no clue about this. I'm also wondering: what is the best multi-agent algorithm in RLlib right now for continuous control? It seems to be MADDPG; however, it is not very stable. MAPPO doesn't do well, and IMPALA is bugged.