Hey folks,
I'm training a PPO agent on a customized environment. The training is set up to stop once 20k episodes have been completed, but the process always hangs after a certain number of training steps and never exits. Any ideas why this happens and how to fix it? Thanks!
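For reference, the 20k-episode stop condition is implemented roughly like this (a simplified sketch of my driver loop; NUM_EPISODES and the step counter are illustrative names, and ppo_config is the config shown below):

NUM_EPISODES = 20_000  # stop once this many episodes have been collected

algo = ppo_config.build()  # ppo_config is defined below
step = 0
while True:
    step += 1
    result = algo.train()
    # "episodes_total" is the cumulative episode count reported in the result dict
    # (old API stack; the exact key may differ in other Ray versions).
    if result.get("episodes_total", 0) >= NUM_EPISODES:
        break
algo.stop()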
Here is my config:
from ray.rllib.algorithms.ppo import PPOConfig

ppo_config = (
    PPOConfig()
    .framework("torch")
    .rollouts(create_env_on_local_worker=True, num_rollout_workers=4)
    .debugging(seed=0, log_level="WARN")
    .training(
        model={"fcnet_hiddens": [16, 16], "custom_model": "custom_ppo_model"},
        # train_batch_size=11 * num_episodes_per_batch,
        # grad_clip=10,
        # clip_param=0.1,
        # vf_clip_param=2,
        vf_loss_coeff=1.0,
        kl_coeff=0.0,
        entropy_coeff=0.01,
        lr=1e-5,
        lambda_=0.85,
        gamma=0.99,
    )
    .evaluation(
        evaluation_interval=1,
        evaluation_config=PPOConfig.overrides(explore=True),
    )
    .environment(env=NCO_ENV, env_config=env_config)
    .experimental(
        # _enable_new_api_stack=True,
        _disable_preprocessor_api=True,
    )
    # .resources(num_learner_workers=2, num_cpus_per_worker=2)
    # .rl_module(rl_module_spec=rlm_spec)
)
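Inside the loop I log a few metrics pulled from the result dict returned by algo.train(), roughly like this (continuing the loop sketch above; the key paths follow the old API stack layout and may differ slightly across Ray versions, and logger is the root logger, hence the "root" in the lines below):

import logging
logger = logging.getLogger()  # root logger

# Inside the while-loop body, after result = algo.train():
learner_stats = result["info"]["learner"]["default_policy"]["learner_stats"]
logger.info(f"Training Step: {step} Total episodes trained: {result['episodes_total']}")
logger.info(f"policy_loss: {learner_stats['policy_loss']}")
logger.info(f"learner_stats:{learner_stats}")
logger.info(f"episode_reward_mean:{result['episode_reward_mean']}")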
Here is the logged output from the last few training steps before the hang:
2024-05-19 01:33:01,146 - root - INFO - Training Step: 42 Total episodes trained: 15272
2024-05-19 01:33:01,146 - root - INFO - policy_loss: 1.2008193078180474
2024-05-19 01:33:01,147 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 20.37426555695072, 'cur_kl_coeff': 0.0, 'cur_lr': 9.999999999999999e-06, 'total_loss': 1.4067846547692053, 'policy_loss': 1.2008193078180474, 'vf_loss': 0.20599679135346927, 'vf_explained_var': 0.9453324226922887, 'kl': 0.0, 'entropy': 0.003144154228991078, 'entropy_coeff': 0.01}
2024-05-19 01:33:01,147 - root - INFO - episode_reward_mean:6.626979846071101
2024-05-19 01:35:00,439 - root - INFO - Training Step: 43 Total episodes trained: 15636
2024-05-19 01:35:00,441 - root - INFO - policy_loss: 1.1391864486279026
2024-05-19 01:35:00,441 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 19.801141237443492, 'cur_kl_coeff': 0.0,'cur_lr': 9.999999999999999e-06, 'total_loss': 1.3345247489029681, 'policy_loss': 1.1391864486279026, 'vf_loss': 0.1953549200488675, 'vf_explained_var': 0.9470342062493806, 'kl': 0.0, 'entropy': 0.0016611376917490396, 'entropy_coeff': 0.01}
2024-05-19 01:35:00,441 - root - INFO - episode_reward_mean:6.673998273844872
2024-05-19 01:37:00,547 - root - INFO - Training Step: 44 Total episodes trained: 16000
2024-05-19 01:37:00,549 - root - INFO - policy_loss: 1.1743851169463126
2024-05-19 01:37:00,549 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 20.133241493471207, 'cur_kl_coeff': 0.0,'cur_lr': 9.999999999999999e-06, 'total_loss': 1.3602027489293007, 'policy_loss': 1.1743851169463126, 'vf_loss': 0.18581761958137635, 'vf_explained_var': 0.9493927434567482, 'kl': 0.0, 'entropy': 0.0, 'entropy_coeff': 0.01}
2024-05-19 01:37:00,549 - root - INFO - episode_reward_mean:6.634718246296934
2024-05-19 01:39:01,843 - root - INFO - Training Step: 45 Total episodes trained: 16360
2024-05-19 01:39:01,844 - root - INFO - policy_loss: 1.2602679940962023
2024-05-19 01:39:01,844 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 21.215875837879796, 'cur_kl_coeff': 0.0,'cur_lr': 9.999999999999999e-06, 'total_loss': 1.4378114342048605, 'policy_loss': 1.2602679940962023, 'vf_loss': 0.1775434453480987, 'vf_explained_var': 0.9508468215183545, 'kl': 0.0, 'entropy': 0.0, 'entropy_coeff': 0.01}
2024-05-19 01:39:01,844 - root - INFO - episode_reward_mean:6.6528128772941875
2024-05-19 01:41:05,539 - root - INFO - Training Step: 46 Total episodes trained: 16724
2024-05-19 01:41:05,539 - root - INFO - policy_loss: 1.210792917157373
2024-05-19 01:41:05,540 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 20.0480034899968, 'cur_kl_coeff': 0.0, 'cur_lr': 9.999999999999999e-06, 'total_loss': 1.3852985132125115, 'policy_loss': 1.210792917157373, 'vf_loss': 0.17453701313464873, 'vf_explained_var': 0.9512318565640399, 'kl': 0.0, 'entropy': 0.0031440757335193695, 'entropy_coeff': 0.01}
2024-05-19 01:41:05,540 - root - INFO - episode_reward_mean:6.573289824801651
2024-05-19 01:43:12,640 - root - INFO - Training Step: 47 Total episodes trained: 17088
2024-05-19 01:43:12,641 - root - INFO - policy_loss: 1.167067664104604
2024-05-19 01:43:12,641 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 19.616726393340738, 'cur_kl_coeff': 0.0,'cur_lr': 9.999999999999999e-06, 'total_loss': 1.3264924516280492, 'policy_loss': 1.167067664104604, 'vf_loss': 0.15946670801889512, 'vf_explained_var': 0.9556506642731287, 'kl': 0.0, 'entropy': 0.004192324247091047, 'entropy_coeff': 0.01}
2024-05-19 01:43:12,641 - root - INFO - episode_reward_mean:6.71575325188109
When this happened, top (output below) showed that only the main Python process was still busy; none of the ray::RolloutWorker processes were doing any work.
Tasks: 23 total, 2 running, 21 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.2 us, 1.3 sy, 0.0 ni, 97.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 491786.2 total, 348301.0 free, 22941.5 used, 120543.7 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 465896.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
130313 jovyan 20 0 33.7g 2.6g 273808 R 106.3 0.5 223:31.68 python
130696 jovyan 20 0 3771024 104512 40572 S 6.6 0.0 11:38.86 python
130451 jovyan 20 0 7371780 252944 15712 S 1.3 0.1 2:12.27 gcs_server
130641 jovyan 20 0 41.2g 29076 15408 S 1.3 0.0 2:14.34 raylet
480 jovyan 35 15 29.0g 3.8g 263380 S 1.0 0.8 75:04.28 ray::RolloutWor
482 jovyan 35 15 29.0g 3.8g 261468 S 1.0 0.8 69:36.12 ray::RolloutWor
483 jovyan 35 15 29.0g 3.8g 262740 S 1.0 0.8 70:34.22 ray::RolloutWor
130700 jovyan 20 0 1678300 65400 27568 S 1.0 0.0 1:30.07 python
130734 jovyan 35 15 22.6g 71616 30664 S 1.0 0.0 1:28.32 ray::IDLE
130736 jovyan 35 15 22.6g 71688 30732 S 1.0 0.0 1:28.52 ray::IDLE
481 jovyan 35 15 29.0g 3.8g 264140 S 0.7 0.8 73:02.55 ray::RolloutWor
130566 jovyan 20 0 3913168 98212 40900 S 0.7 0.0 1:11.80 python
130738 jovyan 35 15 22.6g 71752 30772 S 0.7 0.0 1:27.87 ray::IDLE
130739 jovyan 35 15 22.6g 72056 31136 S 0.7 0.0 1:29.13 ray::IDLE
7 jovyan 20 0 571392 108648 18316 S 0.3 0.0 2:37.32 jupyter-noteboo
130565 jovyan 20 0 1906128 66564 28360 S 0.3 0.0 0:10.98 python
1 jovyan 20 0 3008 672 364 S 0.0 0.0 0:01.77 tini
6 jovyan 20 0 3256 2012 1556 S 0.0 0.0 0:00.00 sh
40 jovyan 20 0 9208 6076 3680 S 0.0 0.0 0:01.71 bash
414 jovyan 20 0 10328 5996 3568 S 0.0 0.0 0:01.56 bash
81351 jovyan 20 0 9500 4124 3292 R 0.0 0.0 0:00.15 top
130226 jovyan 20 0 3847208 164148 55764 S 0.0 0.0 0:04.25 python
130698 jovyan 20 0 1830012 65488 28684 S 0.0 0.0 0:03.05 python