I have noticed that many PettingZoo environments have very long episodes. The agents receive rewards for points scored along the way, but the environment does not terminate or reset until a long time has passed. This is a problem for RLlib training, because episode metrics are only calculated once an episode has completed. Since a PettingZoo environment can run for a long time without ending an episode, a training iteration may finish with no metrics reported and thus no update to the policies.
I am trying to use PettingZoo with RLlib, and RLlib requires PettingZoo environments to be wrapped with its own wrapper (from ray.rllib.env import PettingZooEnv). I am wondering whether there is a way to properly reset() the environment each time a reward is given. However, simply setting terminated and truncated to True and resetting the environment inside RLlib's PettingZoo wrapper does not work.
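For illustration, the "end the episode on each reward" idea can be sketched as a wrapper that only flags truncation and leaves the actual reset() to whoever drives the loop. This is a minimal, untested sketch that assumes the PettingZoo Parallel API conventions: reset() returns (observations, infos), and step() returns per-agent dicts of observations, rewards, terminations, truncations and infos. ToyParallelEnv and RewardTruncationWrapper are hypothetical names. The sketch does not subclass PettingZoo's own wrapper classes and has not been checked against RLlib's wrappers.

```python
from typing import Any, Dict, Optional


class ToyParallelEnv:
    """Hypothetical stand-in for a parallel env whose episodes never end.

    Agent "a" scores a point every third step; nothing ever terminates.
    """

    possible_agents = ["a", "b"]

    def __init__(self) -> None:
        self.agents = list(self.possible_agents)
        self.t = 0

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        self.agents = list(self.possible_agents)
        self.t = 0
        obs = {agent: self.t for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return obs, infos

    def step(self, actions: Dict[str, Any]):
        self.t += 1
        obs = {agent: self.t for agent in self.agents}
        rewards = {agent: 0.0 for agent in self.agents}
        if self.t % 3 == 0:
            rewards["a"] = 1.0  # a "point" is scored
        terminations = {agent: False for agent in self.agents}
        truncations = {agent: False for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return obs, rewards, terminations, truncations, infos


class RewardTruncationWrapper:
    """Hypothetical wrapper: truncates the episode on the first nonzero reward,
    or after max_steps steps, whichever comes first. It never calls reset()
    on its own; the training loop is expected to do that."""

    def __init__(self, env, max_steps: Optional[int] = None) -> None:
        self.env = env
        self.max_steps = max_steps
        self.possible_agents = env.possible_agents
        self.agents: list = []
        self.steps = 0

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        self.steps = 0
        obs, infos = self.env.reset(seed=seed, options=options)
        self.agents = list(self.env.agents)
        return obs, infos

    def step(self, actions: Dict[str, Any]):
        obs, rewards, terms, truncs, infos = self.env.step(actions)
        self.steps += 1
        scored = any(r != 0 for r in rewards.values())
        out_of_time = self.max_steps is not None and self.steps >= self.max_steps
        if scored or out_of_time:
            # Truncated rather than terminated: the underlying game did not
            # actually end, the episode was just cut short.
            truncs = {agent: True for agent in truncs}
        # Parallel-API convention: finished agents drop out of self.agents.
        self.agents = [a for a in self.agents if not (terms[a] or truncs[a])]
        return obs, rewards, terms, truncs, infos
```

Using truncation instead of termination keeps the "cut short" meaning, so a learner that distinguishes the two can still bootstrap from the last state. Because reset() restarts the underlying game, each episode in this sketch is one point's worth of play from the initial state rather than a continuation of the same match.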
Does anyone have any suggestions on how to solve this issue?