Hello,
I’m encountering an issue while training MADDPG with RLlib version 2.7. Specifically, the number of episodes reported in `episodes_this_iter` does not match the values in `hist_stats/episode_lengths`. Here’s a concrete example:

- `episodes_this_iter` consistently reports 5 episodes per iteration.
- `hist_stats/episode_lengths`, however, reports a list of episode lengths like `[20, 20, 20, 20, 20, 20]`, which gradually grows during training and eventually stabilizes at a constant length.

This suggests that `hist_stats/episode_lengths` is tracking more episodes in total than `episodes_this_iter` reports. In addition, the size of the list grows over time before settling at a fixed number, which seems inconsistent with the per-iteration episode count in `episodes_this_iter`.
Some context about my setup:
- I am using MADDPG in a multi-agent environment.
- The training is running with parallel environments.
- The agents have different termination conditions, and some episodes may terminate earlier than others.
- RLlib version: 2.7
Example (initial values; the list grows before stabilizing):

```
episodes_this_iter = 5
hist_stats/episode_lengths = [20, 20, 20, 20, 20, 20]
```
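One guess I have is that `hist_stats/episode_lengths` might be a rolling window over the most recent episodes rather than a per-iteration list; if so, its length would grow each iteration until it reaches the window size and then stay constant, while `episodes_this_iter` stays at 5. A minimal stdlib-only sketch of that hypothesized behavior (the window size of 100 is an assumption for illustration, not taken from my config):

```python
from collections import deque

WINDOW = 100           # assumed smoothing-window size (hypothetical)
EPISODES_PER_ITER = 5  # what episodes_this_iter reports each iteration
EP_LEN = 20            # every episode runs 20 steps in this toy example

# Rolling history, analogous to how hist_stats/episode_lengths appears to behave.
history = deque(maxlen=WINDOW)
list_size_per_iter = []
for _ in range(30):
    for _ in range(EPISODES_PER_ITER):
        history.append(EP_LEN)
    list_size_per_iter.append(len(history))

print(list_size_per_iter[:4])  # grows: [5, 10, 15, 20]
print(list_size_per_iter[-1])  # stabilizes at the window size: 100
```

This reproduces the growth-then-plateau pattern I'm seeing, but I'd like to confirm whether this is actually what RLlib does.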
Is there any reason for this mismatch? Could this be related to the way episode lengths are being tracked or to the environment’s behavior? How can I ensure that the episode counts in `episodes_this_iter` match those in `hist_stats/episode_lengths`?
I would appreciate any insights or suggestions to resolve this.
Thank you!