Reproducing MADDPG MPE Training Results

Mark_Zhang · October 14, 2021, 10:40pm

Hi guys,

I am trying to get familiar with RLlib's MADDPG, so the first thing I did was reproduce the MPE training results and compare them with those from OpenAI’s implementation (i.e., the same comparison this page did).

However, using the “simple_spread” MPE scenario, I couldn’t get results matching those shown in the previous link. See Case I vs. Case II in the figure below: across six trials, Case II’s learning curves are less stable, with four of the runs showing a decreasing reward level and the other two looking okay.
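For anyone who wants to put a number on "decreasing reward level" rather than judging it by eye, here is a minimal sketch that fits a least-squares line to each run's reward curve and counts the runs with a negative slope. The function names and the reward values are made up for illustration (they are not the data from the figure), and a plain linear fit is only one rough way to judge a trend.

```python
from statistics import mean


def reward_slope(rewards):
    """Least-squares slope of reward against iteration index."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two reward points to fit a slope")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(rewards)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, rewards))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def count_declining(trials):
    """Number of runs whose fitted reward slope is negative."""
    return sum(1 for rewards in trials.values() if reward_slope(rewards) < 0)


# Hypothetical reward curves, NOT the values from the figure.
example_trials = {
    "run_a": [-600, -580, -560, -540],  # improving
    "run_b": [-500, -520, -540, -560],  # declining
}

for name, rewards in example_trials.items():
    print(name, reward_slope(rewards))
print("declining runs:", count_declining(example_trials))
```

In practice the per-iteration mean reward would come from each trial's logged results; smoothing the curve first may give a less noisy slope.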

After a little investigation, I found that with earlier versions of the libraries (found here; the three most important ones are highlighted in the figure), the learning curves are more stable and consistent.
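To make this kind of comparison easier to share, one option is to record the installed package versions in each environment and diff them. The sketch below uses only the standard library's `importlib.metadata`. The helper names are hypothetical, and the package list is an assumption for illustration, since the three highlighted libraries are only visible in the figure.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(old, new):
    """Return {package: (old_version, new_version)} for entries that differ."""
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }


# Assumed package names; replace with the libraries that matter in your setup.
packages_of_interest = ["ray", "tensorflow", "gym"]
print(installed_versions(packages_of_interest))
```

Running `installed_versions` in both the older and the newer environment, saving the two dictionaries, and passing them to `diff_versions` gives a short list of exactly which versions changed between the stable and unstable runs.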

I checked that the MPE env I used is exactly the same, and that the actor and critic loss definitions in the MADDPG policy and all hyper-parameters are also the same, so I'm not sure why learning behaves differently with the more recent RLlib version. Can anyone please provide some pointers or possible explanations for this? I would love to understand this better and possibly fix the issue. Thanks!

rusu24edward · October 15, 2021, 3:45pm

Awesome investigation! I wonder if a breaking change was introduced in RLlib’s newer versions. Anyone know if MPE training results are included in the test suite?
