Strange hang in the PPO algorithm training process

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity
  • Low: It annoys or frustrates me for a moment.
  • Medium: It contributes significant difficulty to completing my task, but I can work around it.
  • High: It blocks me from completing my task.

When I train an agent for path planning with the PPO algorithm, the program often gets stuck at a particular stage (the training stage): it keeps printing the current status, but the agent takes no steps. The output looks like this:

== Status ==
Current time: 2022-06-22 15:11:56 (running for 03:47:17.10)
Memory usage on this node: 28.3/63.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 8.0/12 CPUs, 1.0/1 GPUs, 0.0/28.32 GiB heap, 0.0/14.16 GiB objects
Result logdir: C:\Users\hawk\ray_results\PPO
Number of trials: 1/1 (1 RUNNING)
| Trial name | status | loc | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
| PPO | RUNNING | | 2 | 10082.7 | 40936 | 2.53142 | 66.7758 | -42.5759 | 53.0237 |
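A quick back-of-the-envelope reading of that status row (values copied from the table; the helper name is hypothetical). Only two iterations have finished in about 10,083 seconds of trial time, so each iteration averages roughly 84 minutes. The gap between "running for" and "total time (s)" is the time since the last reported result; it also includes startup overhead, so treat it as an upper bound.

```python
def iteration_stats(total_time_s: float, iterations: int, timesteps: int,
                    running_for_s: float) -> dict:
    """Average cost per training iteration, derived from one Tune status row."""
    seconds_per_iter = total_time_s / iterations
    return {
        "seconds_per_iter": seconds_per_iter,
        "minutes_per_iter": seconds_per_iter / 60,
        "timesteps_per_second": timesteps / total_time_s,
        # Wall-clock time not yet covered by a finished iteration
        # (includes startup overhead, so this is an upper bound).
        "seconds_since_last_result": running_for_s - total_time_s,
    }


running_for_s = 3 * 3600 + 47 * 60 + 17.10  # "running for 03:47:17.10"
# iter=2, total time (s)=10082.7, ts=40936 from the table above.
stats = iteration_stats(10082.7, 2, 40936, running_for_s)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With roughly 59 minutes since the last result against an 84-minute average, the third iteration has plausibly just not finished yet.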

Hi @hawk,

Those metrics only update at the end of each training iteration, but Tune prints the last values every few seconds, much more often than an iteration completes. PPO has an outer and an inner training loop, and the default number of passes in the inner loop is large (30). On my setup that makes each iteration take a very long time, so I reduce it. You could check whether this is the issue by setting it to 1: iterations should then complete much more frequently. After that you can adjust the value to find a tradeoff between training time and the number of SGD updates that is acceptable for your use case. The config parameter you want to change is num_sgd_iter.
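To see why num_sgd_iter matters so much, here is a rough count of minibatch gradient updates per training iteration. It is a sketch assuming each inner pass splits the train batch into train_batch_size // sgd_minibatch_size minibatches; the collected batch can be slightly larger than train_batch_size and RLlib's exact minibatching depends on the version, so treat the numbers as estimates. The function name is hypothetical.

```python
def sgd_steps_per_iteration(train_batch_size: int, sgd_minibatch_size: int,
                            num_sgd_iter: int) -> int:
    """Approximate number of minibatch gradient updates in one PPO iteration.

    Assumes every inner pass (epoch) over the train batch is split into
    equally sized minibatches, with any remainder dropped.
    """
    minibatches_per_pass = train_batch_size // sgd_minibatch_size
    return num_sgd_iter * minibatches_per_pass


# Batch sizes from the config posted in this thread, compared across
# the default num_sgd_iter (30), the posted value (8) and the diagnostic value (1).
for num_sgd_iter in (30, 8, 1):
    steps = sgd_steps_per_iteration(10240, 256, num_sgd_iter)
    print(f"num_sgd_iter={num_sgd_iter}: {steps} minibatch updates per iteration")
```

The update work per iteration scales roughly linearly with num_sgd_iter, while the sampling work stays the same.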


Here is my config:
config = {
    "create_env_on_driver": True,
    # "lr": 0.0005,

    "num_workers": 7,
    "lambda": 0.98,
    "gamma": 0.995,
    "sgd_minibatch_size": 256,
    "train_batch_size": 10240,
    # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
    # "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
    "num_gpus": 1,
    # "num_gpus_per_worker": 0.1,
    "num_sgd_iter": 8,
    "rollout_fragment_length": 64,
    "clip_param": 0.25,
    "observation_filter": "MeanStdFilter",
    # Multi-agent setup for the particular env: all agents share one policy.
    "multiagent": {
        "policies": {"shared_policy"},
        "policy_mapping_fn": (lambda agent_id, episode, **kwargs: "shared_policy"),
    },
    "model": {
        "fcnet_hiddens": [512, 256, 64],
    },
    "callbacks": MyCallbacks,  # custom callbacks class defined elsewhere in my script (not shown)
    "framework": "tf",
    "no_done_at_end": True,
}


Thank you for your reply.
I don't think that is the problem, because I have already set num_sgd_iter to a small value (8). The program still prints the status message repeatedly, and there is no response from the agent (it does not take any action).

I do have a finicky environment that sometimes hangs, so I understand it could be the environment too. I just thought I would share one common cause of slow training.

The agent won't take any actions during the update step of an iteration, so that part is expected.

Here are two things you could try.

The first is to replace your environment with an RLlib mock environment configured with the same observation and action spaces as yours, but producing random data. Does training still hang with the mock env?
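A dependency-free sketch of the mock-environment idea follows. The class is hypothetical and is not RLlib's own (RLlib ships random test environments, for example RandomEnv in ray.rllib.examples.env.random_env, whose options depend on your Ray version). It mimics the Ray 1.x multi-agent step contract, with per-agent dicts and a "__all__" done flag, and returns random observations and rewards, so it exercises only the training loop. To plug it into RLlib you would subclass MultiAgentEnv and declare gym spaces matching your real environment.

```python
import random
from typing import Dict, List, Tuple


class RandomMultiAgentMockEnv:
    """Hypothetical stand-in env: random observations/rewards, fixed episode length.

    Follows the multi-agent dict contract (keys are agent ids, plus "__all__"
    in the done dict) but has no RLlib or gym dependency.
    """

    def __init__(self, agent_ids: List[str], obs_dim: int, num_actions: int,
                 max_steps: int = 100, seed: int = 0):
        self.agent_ids = list(agent_ids)
        self.obs_dim = obs_dim
        self.num_actions = num_actions
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def _obs(self) -> List[float]:
        return [self.rng.uniform(-1.0, 1.0) for _ in range(self.obs_dim)]

    def reset(self) -> Dict[str, List[float]]:
        self.t = 0
        return {aid: self._obs() for aid in self.agent_ids}

    def step(self, action_dict: Dict[str, int]) -> Tuple[dict, dict, dict, dict]:
        # Reject malformed actions so wiring bugs surface as errors, not hangs.
        for aid, action in action_dict.items():
            if aid not in self.agent_ids or not 0 <= action < self.num_actions:
                raise ValueError(f"bad action {action!r} for agent {aid!r}")
        self.t += 1
        done = self.t >= self.max_steps
        obs = {aid: self._obs() for aid in action_dict}
        rewards = {aid: self.rng.uniform(-1.0, 1.0) for aid in action_dict}
        dones = {aid: done for aid in action_dict}
        dones["__all__"] = done
        infos = {aid: {} for aid in action_dict}
        return obs, rewards, dones, infos


if __name__ == "__main__":
    env = RandomMultiAgentMockEnv(["agent_0", "agent_1"], obs_dim=4,
                                  num_actions=3, max_steps=5)
    obs = env.reset()
    steps = 0
    while True:
        actions = {aid: env.rng.randrange(env.num_actions) for aid in obs}
        obs, rewards, dones, infos = env.step(actions)
        steps += 1
        if dones["__all__"]:
            break
    print(f"episode finished after {steps} steps")
```

If training with a stand-in like this progresses normally, the hang is most likely inside the real environment.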

The second is to edit the RLlib source code and add a print statement to the PPO loss function. Then you will know whether you are in the SGD update or not. A word of warning: since you are using TF static graphs, a normal print probably won't work, because it only runs once while the graph is being built. You will probably have to use tf.Print inside a tf.control_dependencies context manager.
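A minimal sketch of that pattern, assuming TensorFlow 2.x with eager execution disabled to imitate the static-graph "tf" framework; the wrapper name and the toy loss are hypothetical. tf.Print (tf.compat.v1.Print in TF2) is deprecated, so the sketch uses its replacement tf.print, which works the same way inside tf.control_dependencies. Where exactly to call it in RLlib's PPO loss depends on your Ray version.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # build a static graph, like framework="tf"


def with_debug_print(loss: tf.Tensor, message: str = "PPO loss evaluated:") -> tf.Tensor:
    """Return a tensor equal to `loss` that prints `message` each time it is evaluated.

    A plain Python print() here would fire only once, during graph construction.
    """
    print_op = tf.print(message, loss)  # output goes to stderr
    # tf.identity creates a new op that cannot run until print_op has run.
    with tf.control_dependencies([print_op]):
        return tf.identity(loss)


# Toy graph standing in for the real PPO loss.
x = tf.compat.v1.placeholder(tf.float32, shape=[None])
loss = with_debug_print(tf.reduce_mean(tf.square(x)))

sess = tf.compat.v1.Session()
values = [sess.run(loss, feed_dict={x: batch}) for batch in ([1.0, 2.0], [3.0])]
```

The returned tensor must be the one actually used downstream; if the original loss tensor is used instead, the print op is never triggered.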

One last thing. I have one setup where training on the GPU is significantly slower than on the CPU: a single iteration takes 5 minutes on the CPU and 50 minutes on the GPU.
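To check this on your own machine, you could time a few iterations with "num_gpus": 1 and again with "num_gpus": 0. The helper below is a hypothetical, framework-agnostic sketch: it times any zero-argument callable, so you could pass the train method of a trainer built from each config (building the trainer is not shown and depends on your Ray version).

```python
import statistics
import time
from typing import Callable, List


def time_iterations(step: Callable[[], object], n: int = 3) -> List[float]:
    """Call `step` n times and return the wall-clock seconds each call took."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(label: str, durations: List[float]) -> str:
    """One-line summary using the median, which is robust to a slow first call."""
    return f"{label}: median {statistics.median(durations):.1f} s over {len(durations)} iterations"


# Hypothetical usage with two trainers built from the same config,
# one with "num_gpus": 1 and one with "num_gpus": 0:
# print(summarize("GPU", time_iterations(gpu_trainer.train)))
# print(summarize("CPU", time_iterations(cpu_trainer.train)))
```

The median is reported because the first iteration often includes one-off setup cost such as graph construction.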
