I need to replicate the results from the original MBMPO paper. I see there is a HalfCheetahWrapper environment for MBMPO, but it seems to use HalfCheetah-v2, which is no longer supported. I'd like to know the exact versions of ray, gym/gymnasium, tensorflow, mujoco, and mujoco_py that were used to produce the published results. I wrote a wrapper of my own, but it does not produce a progress.csv file, and the training results (min_reward, max_reward, etc.) are all "nan". Can someone please help?
Could you provide a reproduction script for what you’re seeing?
The latest Ray release (ray==2.4.0) should be compatible with gymnasium==0.26.3, so you should be able to use HalfCheetah-v4; you'll have to modify the HalfCheetahWrapper, since it currently uses HalfCheetah-v2.
Note that RLlib only supports MBMPO with torch, not tensorflow — see Algorithms — Ray 2.4.0 in the docs.
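For reference, here's a minimal sketch of the reward function such a wrapper needs to expose: MBMPO's model-based rollouts call a `reward(obs, action, obs_next)` method on the env. This is an assumption-laden sketch, not RLlib's actual code — in particular, the observation index used for forward velocity (index 8) and the control-cost coefficient (0.1) are taken from the v2-era wrapper and may need checking against HalfCheetah-v4's observation layout:

```python
import numpy as np

def half_cheetah_reward(obs, action, obs_next):
    """Reward for model-based rollouts (sketch).

    Assumptions (verify against your gymnasium version):
    - obs_next[..., 8] holds the forward (x) velocity
    - control cost is 0.1 * ||action||^2
    Accepts single transitions (1-D arrays) or batches (2-D arrays).
    """
    action = np.atleast_2d(np.asarray(action, dtype=np.float64))
    obs_next = np.atleast_2d(np.asarray(obs_next, dtype=np.float64))
    forward_vel = obs_next[:, 8]
    ctrl_cost = 0.1 * np.sum(np.square(action), axis=1)
    # Clip to keep the learned dynamics model from producing runaway rewards.
    return np.clip(forward_vel - ctrl_cost, -1000.0, 1000.0)
```

You would then subclass the HalfCheetah env from gymnasium and attach this as a `reward()` method, mirroring the structure of the existing v2 wrapper.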