RuntimeWarning: Mean of empty slice with TensorFlow multi-agent PPO

Hello, I have been struggling with this for days, so I would really appreciate it if someone could help me figure it out! I am not too familiar with the interface here yet, so I posted the full details as a Stack Overflow question instead: "RLlib PPO reward flat-lines with RuntimeWarning: Mean of empty slice" (tagged numpy).

Would appreciate any guidance!