Pong PPO from tuned example v2.4.0 not converging

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity


I have been trying for the last couple of days to train Pong with PPO using the tuned example pong-ppo.yaml from the Ray 2.4 release. Correct me if I am wrong, but I think the file has an issue with rollout_fragment_length: a constraint from algorithm_config.py:2850 raises an exception because rollout_fragment_length * num_rollout_workers * num_envs_per_worker does not match train_batch_size (for example, 20-step fragments with 32 workers and 5 envs per worker would sample 20 * 32 * 5 = 3200 steps, so train_batch_size would have to be 3200). I tried several values that I no longer remember exactly, all in the low range (< ~50), and still could not do better than -19 after 50M steps (I probably should have stopped much earlier, but I am a noob on this topic).

Following a moment of inspiration, I took atari-ppo.yaml as a comparison, changed rollout_fragment_length to match it (100 instead of 20), and removed full_action_space and repeat_action_probability. It is doing much better now. So, on to the questions:

1/ Can you confirm that rollout_fragment_length is equivalent to the number of frames (or env steps) seen by the learner? Could it be that 20 was enough in the original config because of some frameskip?
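(While trying to answer the frameskip part myself, I have been looking at what the registered env actually defaults to. This is just my own check, assuming ale-py is installed so the ALE/* ids are in the registry; I believe the v5 defaults are frameskip=4 and repeat_action_probability=0.25, but that is exactly what I want to confirm:)

    import gymnasium as gym

    # Inspect the kwargs "ALE/Pong-v5" was registered with (frameskip,
    # repeat_action_probability, full_action_space, ...).
    # Assumes ale-py is installed so the ALE/* ids exist in the registry.
    spec = gym.spec("ALE/Pong-v5")
    print(spec.kwargs)
    print(spec.max_episode_steps)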

2/ I would like to work out the minimum workable fragment length as an exercise in hyperparameter tuning. Is there a way (like a utility script) to export a YAML tuned example to a Python config, so I can make sure the baseline parameters are the same? For reference, here is the top of the YAML I am starting from:

    env: ALE/Pong-v5
    run: PPO
    config:
        # Works for both torch and tf.
        framework: torch


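And here is roughly what I think the equivalent Python builder config would look like. This is just my own sketch against the Ray 2.4 PPOConfig API; the worker and batch numbers are illustrative values I picked so that the constraint above holds, not the actual numbers from the tuned example:

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment(env="ALE/Pong-v5")
        .framework("torch")
        .rollouts(
            num_rollout_workers=8,        # illustrative, not the YAML's value
            num_envs_per_worker=1,
            rollout_fragment_length=100,  # the value I copied from atari-ppo.yaml
        )
        # train_batch_size has to equal
        # rollout_fragment_length * num_rollout_workers * num_envs_per_worker
        # (100 * 8 * 1 = 800 here), which is the check in algorithm_config.py
        # that kept raising for me.
        .training(train_batch_size=800)
        .resources(num_gpus=1)
    )

    algo = config.build()
    print(algo.train())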

Thanks in advance!

Hi @Iamgroot,

We don’t continuously test this config. Our release test covers PPO on the Breakout game. Here is the link:

Maybe this could be used instead?

Hi Kourosh,

I took the file you suggested and moved from 2 GPUs to 1 (due to my machine’s limitations), and it does not look good either… I stopped it before reaching the 7M steps the test suggests, because it does not seem to me like it is going to converge.
[Screenshot from 2023-05-19 09-40-16]
Do you think changing the GPU count is enough to change the convergence?
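(For reference, that GPU change is the only edit I made to the suggested file; if I understand the builder API correctly, it corresponds to something like this, with everything else left as defined in the YAML:)

    from ray.rllib.algorithms.ppo import PPOConfig

    # The single change vs. the suggested release-test config: 2 GPUs -> 1.
    # Sketch only; all other settings stay as defined in the YAML.
    config = PPOConfig().resources(num_gpus=1)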

I think you need to run it for at least one hour. That’s how long we wait on the release test. Does that not converge for you?

I managed to make APPO converge and will continue from there. Thanks for the help!