Inconsistency between `episodes_this_iter` and `hist_stats/episode_lengths` in MADDPG Training (RLlib 2.7)

Hello,

I’m encountering an issue while training MADDPG with RLlib 2.7. Specifically, the number of episodes reported in episodes_this_iter does not match the number of entries in hist_stats/episode_lengths. Here’s a concrete example:

  • episodes_this_iter consistently reports 5 episodes per iteration.
  • hist_stats/episode_lengths, however, reports a list such as [20, 20, 20, 20, 20, 20] (6 entries). The list grows during training and eventually stabilizes at a constant number of entries.

This suggests that hist_stats/episode_lengths tracks more episodes than episodes_this_iter reports. The number of entries in the list also keeps growing for a while before settling at a fixed size, even though episodes_this_iter stays at 5.
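One explanation I have considered but not confirmed is that hist_stats/episode_lengths holds a rolling window over recent episodes rather than only the episodes from the current iteration. The sketch below is plain Python and does not call RLlib. The function simulate_window and the window size of 12 are hypothetical and chosen only for illustration. It shows that such a window would produce the same pattern: a list that grows and then stays at a fixed size while the per-iteration count remains 5.

```python
from collections import deque


def simulate_window(episodes_per_iter, window_size, num_iters):
    """Return the list size after each iteration under a rolling-window assumption.

    This models a hypothesis only; it is not RLlib's actual implementation.
    """
    history = deque(maxlen=window_size)  # oldest entries drop out once full
    sizes = []
    for _ in range(num_iters):
        # Each iteration contributes a fixed number of finished episodes,
        # every one of length 20 as in my runs.
        history.extend([20] * episodes_per_iter)
        sizes.append(len(history))
    return sizes


# Hypothetical window of 12 episodes, 5 new episodes per iteration.
sizes = simulate_window(episodes_per_iter=5, window_size=12, num_iters=4)
print(sizes)
```

If this is what happens, the mismatch would be a reporting convention rather than a bug, but I would like confirmation.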

Some context about my setup:

  • I am using MADDPG in a multi-agent environment.
  • Training runs with parallel environments.
  • The agents have different termination conditions, and some episodes may terminate earlier than others.
  • RLlib version: 2.7

Example:

  • episodes_this_iter = 5
  • hist_stats/episode_lengths = [20, 20, 20, 20, 20, 20] initially; the list then grows before stabilizing at a fixed size.
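For reference, this is roughly how I compare the two values. It is a minimal sketch that assumes the result is a nested dict with episodes_this_iter and hist_stats at the top level. The helper compare_episode_counts is my own, and example_result is written by hand from the values above rather than copied from real RLlib output.

```python
def compare_episode_counts(result):
    """Return (episodes_this_iter, number of entries in hist_stats/episode_lengths)."""
    reported = result["episodes_this_iter"]
    lengths = result["hist_stats"]["episode_lengths"]
    return reported, len(lengths)


# Hand-written stand-in for one training result, using the example values above.
example_result = {
    "episodes_this_iter": 5,
    "hist_stats": {"episode_lengths": [20, 20, 20, 20, 20, 20]},
}

reported, tracked = compare_episode_counts(example_result)
print(f"episodes_this_iter={reported}, "
      f"len(episode_lengths)={tracked}, "
      f"difference={tracked - reported}")
```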

Is there a reason for this mismatch? Could it be related to how episode lengths are tracked, or to the environment’s behavior? How can I make the episode count in episodes_this_iter match the number of entries in hist_stats/episode_lengths?

I would appreciate any insights or suggestions to resolve this.

Thank you!