Pong PPO from tuned example v2.4.0 not converging

For the last couple of days I have been trying to train Pong with PPO, using the tuned example pong-ppo.yaml from Ray release 2.4. Correct me if I am wrong, but I think the file has an issue with rollout_fragment_length: a check at algorithm_config.py:2850 raises an exception because rollout_fragment_length * num_rollout_workers * num_envs_per_worker does not match batch_size. I tried several values that I can’t remember exactly, all in the low range (below roughly 50), and I still could not do better than -19 after 50M steps (I should have stopped well before that, but I am a noob on this topic). In a moment of inspiration, I compared the file against atari-ppo.yaml. I then changed rollout_fragment_length to match the Atari one, i.e. 100 instead of 20, and removed full_action_space and repeat_action_probability.
Training is doing much better now. So, my questions:

1/ Can you confirm that rollout_fragment_length is equivalent to the number of frames (or steps) seen by the learner? Could 20 have been enough in the original config because of some frameskip?

2/ I would like to find the minimum workable fragment length as a hyperparameter-tuning exercise. Is there a way (such as a utility script) to export the YAML to a Python config, so I can make sure the baseline parameters are the same?
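
To make the arithmetic behind both questions concrete, here is a minimal sketch in plain Python. It is not RLlib's validation code: the helper names are hypothetical, the numbers (batch size 4000, 8 workers, 5 envs per worker, frameskip 4) are made up for illustration and are not taken from pong-ppo.yaml, and it assumes at least one rollout worker. The exact rule RLlib enforces (for example, any tolerance around the batch size) may differ.

```python
# Illustrative arithmetic only; not RLlib's actual validation logic.


def env_steps_per_round(rollout_fragment_length, num_rollout_workers, num_envs_per_worker):
    """Steps collected when every env on every worker returns one fragment."""
    return rollout_fragment_length * num_rollout_workers * num_envs_per_worker


def fragment_length_for(batch_size, num_rollout_workers, num_envs_per_worker):
    """Largest fragment length whose single sampling round fits in the batch."""
    return batch_size // (num_rollout_workers * num_envs_per_worker)


def emulator_frames(agent_steps, frameskip):
    """Emulator frames covered, assuming each agent step repeats its action `frameskip` times."""
    return agent_steps * frameskip


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the calculation.
    batch_size, workers, envs = 4000, 8, 5

    for fragment in (20, 100):
        steps = env_steps_per_round(fragment, workers, envs)
        print(fragment, steps, batch_size % steps == 0)

    print(fragment_length_for(batch_size, workers, envs))
    print(emulator_frames(20, frameskip=4))
```

Under these assumptions, a fragment of 20 agent steps would span 80 emulator frames if the environment skips 4 frames per step, so whether 20 is "enough" depends on how the environment and any wrappers handle frameskip.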

    env: ALE/Pong-v5
    run: PPO
        # Works for both torch and tf.
        framework: torch


config = (  # 1. Configure the algorithm,
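
One way to check that a YAML config and a Python config describe the same baseline is to turn both into plain dicts and diff them. The sketch below is only that comparison step, using the standard library. It assumes you already have the two dicts, for example from PyYAML's yaml.safe_load on the tuned example and from to_dict() on the Python config object. The helper names and the example values are hypothetical and are not the contents of pong-ppo.yaml.

```python
MISSING = "<missing>"  # marker for a key present on one side only


def flatten(config, prefix=""):
    """Flatten nested dicts into {"a.b": value} so keys can be compared directly."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(left, right):
    """Return {key: (left_value, right_value)} for every key whose values differ."""
    flat_left, flat_right = flatten(left), flatten(right)
    diff = {}
    for key in sorted(set(flat_left) | set(flat_right)):
        a = flat_left.get(key, MISSING)
        b = flat_right.get(key, MISSING)
        if a != b:
            diff[key] = (a, b)
    return diff


if __name__ == "__main__":
    # Hypothetical configs, for illustration only.
    from_yaml = {"framework": "torch", "rollout_fragment_length": 20, "model": {"dim": 42}}
    from_python = {
        "framework": "torch",
        "rollout_fragment_length": 100,
        "model": {"dim": 42},
        "num_sgd_iter": 10,
    }
    for key, (a, b) in diff_configs(from_yaml, from_python).items():
        print(f"{key}: yaml={a!r} python={b!r}")
```

An empty result means the two dicts agree on every key. Keys reported as missing on one side are usually defaults that only one of the two representations spells out.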

Thanks in advance!

Hi @Iamgroot,

We don’t continuously test this config. Our release test covers PPO on the Breakout game. Here is the link:

Maybe this could be used instead?

Hi Kourosh

I took the file you suggested and moved from 2 GPUs to 1 (because of my machine’s limitations), and it does not look good either… I stopped it before reaching the 7M that the test suggests, because it did not look to me like it was going to converge.
Do you think changing the GPU count is enough to affect convergence?

I think you need to run it for at least one hour. That’s how long we wait on the release test. Does that not converge for you?

I managed to make APPO converge. I will continue from there. Thanks for the help!