RLlib example produces NaN during execution while Stable Baselines works fine

Hi, I am running the PettingZoo example mentioned in the RLlib examples list. This example uses PPO to learn a shared policy in a continuous action space.

I opened an issue in the PettingZoo repo but have not received a response yet. A detailed description is also included there. Note that I tried configuring RLlib the same way as Stable Baselines, but it still failed.

Thank you in advance for your help.