Pong PPO from tuned example v2.4.0 not converging

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity


I have been trying for the last couple of days to train Pong with PPO using the tuned example pong-ppo.yaml from the Ray 2.4 release. Correct me if I am wrong, but I think the file has an issue with rollout_fragment_length: a constraint from algorithm_config.py:2850 raises an exception because rollout_fragment_length * num_rollout_workers * num_envs_per_worker does not match train_batch_size (for example, 20-step fragments with 32 workers and 5 envs per worker would sample 20 * 32 * 5 = 3200 steps, so train_batch_size would have to be 3200). I tried several values that I no longer remember exactly, all in the low range (< ~50), and still could not do better than -19 after 50M steps (I probably should have stopped much earlier, but I am a noob on this topic).

Following a moment of inspiration, I took atari-ppo.yaml as a comparison, changed rollout_fragment_length to match it (100 instead of 20), and removed full_action_space and repeat_action_probability. It is doing much better now. So, on to the questions:

1/ Can you confirm that rollout_fragment_length is equivalent to the number of frames (or env steps) seen by the learner? Could it be that 20 was enough in the original config because of some frameskip?
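(While trying to answer the frameskip part myself, I have been looking at what the registered env actually defaults to. This is just my own check, assuming ale-py is installed so the ALE/* ids are in the registry; I believe the v5 defaults are frameskip=4 and repeat_action_probability=0.25, but that is exactly what I want to confirm:)

    import gymnasium as gym

    # Inspect the kwargs "ALE/Pong-v5" was registered with (frameskip,
    # repeat_action_probability, full_action_space, ...).
    # Assumes ale-py is installed so the ALE/* ids exist in the registry.
    spec = gym.spec("ALE/Pong-v5")
    print(spec.kwargs)
    print(spec.max_episode_steps)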

2/ I would like to work out the minimum workable fragment length as an exercise in hyperparameter tuning. Is there a way (like a utility script) to export a YAML tuned example to a Python config, so I can make sure the baseline parameters are the same? For reference, here is the top of the YAML I am starting from:

    env: ALE/Pong-v5
    run: PPO
    config:
        # Works for both torch and tf.
        framework: torch


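And here is roughly what I think the equivalent Python builder config would look like. This is just my own sketch against the Ray 2.4 PPOConfig API; the worker and batch numbers are illustrative values I picked so that the constraint above holds, not the actual numbers from the tuned example:

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment(env="ALE/Pong-v5")
        .framework("torch")
        .rollouts(
            num_rollout_workers=8,        # illustrative, not the YAML's value
            num_envs_per_worker=1,
            rollout_fragment_length=100,  # the value I copied from atari-ppo.yaml
        )
        # train_batch_size has to equal
        # rollout_fragment_length * num_rollout_workers * num_envs_per_worker
        # (100 * 8 * 1 = 800 here), which is the check in algorithm_config.py
        # that kept raising for me.
        .training(train_batch_size=800)
        .resources(num_gpus=1)
    )

    algo = config.build()
    print(algo.train())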

Thanks in advance!

Hi @Iamgroot,

We don’t continuously test this config. Our release test covers PPO on the Breakout game. Here is the link:

Maybe this could be used instead?

Hi Kourosh,

I took the file you suggested and moved from 2 GPUs to 1 (due to my machine’s limitations), and it does not look good either… I stopped it before reaching the 7M steps the test suggests, because it does not seem to me like it is going to converge.
[Screenshot from 2023-05-19 09-40-16]
Do you think changing the GPU count is enough to change the convergence?
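(For reference, that GPU change is the only edit I made to the suggested file; if I understand the builder API correctly, it corresponds to something like this, with everything else left as defined in the YAML:)

    from ray.rllib.algorithms.ppo import PPOConfig

    # The single change vs. the suggested release-test config: 2 GPUs -> 1.
    # Sketch only; all other settings stay as defined in the YAML.
    config = PPOConfig().resources(num_gpus=1)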

I think you need to run it for at least one hour. That’s how long we wait on the release test. Does that not converge for you?

I managed to make APPO converge and will continue from there. Thanks for the help!