PPO agent training hang

Hey, folks,
I was training a PPO agent with a custom environment. I set up the training so that it stops once 20k episodes have been completed. However, the process always hangs after a certain number of training steps and never exits. Any ideas why this happens and how to fix it? Thanks!
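
For context, the driver is basically a manual training loop like the rough sketch below (not my exact code; ppo_config is the config pasted further down, and the 20k-episode cutoff is the only stop condition):

algo = ppo_config.build()
episodes_total = 0
while episodes_total < 20000:
    result = algo.train()
    episodes_total = result["episodes_total"]  # cumulative episode count reported by RLlib
algo.stop()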

Here is my config:

from ray.rllib.algorithms.ppo import PPOConfig

ppo_config = (
    PPOConfig()
    .framework("torch")
    .rollouts(create_env_on_local_worker=True, num_rollout_workers=4)
    .debugging(seed=0, log_level="WARN")
    .training(model={"fcnet_hiddens": [16, 16], "custom_model": "custom_ppo_model"},
              # train_batch_size=11 * num_episodes_per_batch,
              # grad_clip=10,
              # clip_param=0.1,
              # vf_clip_param=2,
              vf_loss_coeff=1.0,
              kl_coeff=0.0,
              entropy_coeff=0.01,
              lr=1e-5,
              lambda_=0.85,
              gamma=0.99)
    .evaluation(
        evaluation_interval=1,
        evaluation_config=PPOConfig.overrides(explore=True),
    )
    .environment(env=NCO_ENV, env_config=env_config)
    .experimental(
        # _enable_new_api_stack=True,
        _disable_preprocessor_api=True,
    )
    # .resources(num_learner_workers=2, num_cpus_per_worker=2)
    # .rl_module(rl_module_spec=rlm_spec)
)
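
For completeness, the custom model and environment referenced above are registered in the usual way before the config is built; MyTorchModel and make_nco_env below are just placeholders for my actual model class and env factory:

from ray.rllib.models import ModelCatalog
from ray.tune.registry import register_env

# "custom_ppo_model" and NCO_ENV are the names used in the config above;
# MyTorchModel / make_nco_env stand in for my real model class and env factory.
ModelCatalog.register_custom_model("custom_ppo_model", MyTorchModel)
register_env(NCO_ENV, lambda env_cfg: make_nco_env(env_cfg))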

Here are some of the metrics I logged during training.

2024-05-19 01:33:01,146 - root - INFO - Training Step: 42 Total episodes trained: 15272
2024-05-19 01:33:01,146 - root - INFO - policy_loss: 1.2008193078180474
2024-05-19 01:33:01,147 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 20.37426555695072, 'cur_kl_coeff': 0.0, 'cur_lr': 9.999999999999999e-06, 'total_loss': 1.4067846547692053, 'policy_loss': 1.2008193078180474, 'vf_loss': 0.20599679135346927, 'vf_explained_var': 0.9453324226922887, 'kl': 0.0, 'entropy': 0.003144154228991078, 'entropy_coeff': 0.01}
2024-05-19 01:33:01,147 - root - INFO - episode_reward_mean:6.626979846071101
2024-05-19 01:35:00,439 - root - INFO - Training Step: 43 Total episodes trained: 15636
2024-05-19 01:35:00,441 - root - INFO - policy_loss: 1.1391864486279026
2024-05-19 01:35:00,441 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 19.801141237443492, 'cur_kl_coeff': 0.0,'cur_lr': 9.999999999999999e-06, 'total_loss': 1.3345247489029681, 'policy_loss': 1.1391864486279026, 'vf_loss': 0.1953549200488675, 'vf_explained_var': 0.9470342062493806, 'kl': 0.0, 'entropy': 0.0016611376917490396, 'entropy_coeff': 0.01}
2024-05-19 01:35:00,441 - root - INFO - episode_reward_mean:6.673998273844872
2024-05-19 01:37:00,547 - root - INFO - Training Step: 44 Total episodes trained: 16000
2024-05-19 01:37:00,549 - root - INFO - policy_loss: 1.1743851169463126
2024-05-19 01:37:00,549 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 20.133241493471207, 'cur_kl_coeff': 0.0,'cur_lr': 9.999999999999999e-06, 'total_loss': 1.3602027489293007, 'policy_loss': 1.1743851169463126, 'vf_loss': 0.18581761958137635, 'vf_explained_var': 0.9493927434567482, 'kl': 0.0, 'entropy': 0.0, 'entropy_coeff': 0.01}
2024-05-19 01:37:00,549 - root - INFO - episode_reward_mean:6.634718246296934
2024-05-19 01:39:01,843 - root - INFO - Training Step: 45 Total episodes trained: 16360
2024-05-19 01:39:01,844 - root - INFO - policy_loss: 1.2602679940962023
2024-05-19 01:39:01,844 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 21.215875837879796, 'cur_kl_coeff': 0.0,'cur_lr': 9.999999999999999e-06, 'total_loss': 1.4378114342048605, 'policy_loss': 1.2602679940962023, 'vf_loss': 0.1775434453480987, 'vf_explained_var': 0.9508468215183545, 'kl': 0.0, 'entropy': 0.0, 'entropy_coeff': 0.01}
2024-05-19 01:39:01,844 - root - INFO - episode_reward_mean:6.6528128772941875
2024-05-19 01:41:05,539 - root - INFO - Training Step: 46 Total episodes trained: 16724
2024-05-19 01:41:05,539 - root - INFO - policy_loss: 1.210792917157373
2024-05-19 01:41:05,540 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 20.0480034899968, 'cur_kl_coeff': 0.0, 'cur_lr': 9.999999999999999e-06, 'total_loss': 1.3852985132125115, 'policy_loss': 1.210792917157373, 'vf_loss': 0.17453701313464873, 'vf_explained_var': 0.9512318565640399, 'kl': 0.0, 'entropy': 0.0031440757335193695, 'entropy_coeff': 0.01}
2024-05-19 01:41:05,540 - root - INFO - episode_reward_mean:6.573289824801651
2024-05-19 01:43:12,640 - root - INFO - Training Step: 47 Total episodes trained: 17088
2024-05-19 01:43:12,641 - root - INFO - policy_loss: 1.167067664104604
2024-05-19 01:43:12,641 - root - INFO - learner_stats:{'allreduce_latency': 0.0, 'grad_gnorm': 19.616726393340738, 'cur_kl_coeff': 0.0,'cur_lr': 9.999999999999999e-06, 'total_loss': 1.3264924516280492, 'policy_loss': 1.167067664104604, 'vf_loss': 0.15946670801889512, 'vf_explained_var': 0.9556506642731287, 'kl': 0.0, 'entropy': 0.004192324247091047, 'entropy_coeff': 0.01}
2024-05-19 01:43:12,641 - root - INFO - episode_reward_mean:6.71575325188109
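
For reference, these lines are produced by roughly the following logging code (a sketch; the keys assume the old-API-stack layout of the result dict returned by algo.train()):

import logging

logger = logging.getLogger()  # root logger, which is why the lines show "root - INFO"

# Assumed: "result" is the dict returned by algo.train() (old API stack layout).
stats = result["info"]["learner"]["default_policy"]["learner_stats"]
logger.info("Training Step: %s Total episodes trained: %s",
            result["training_iteration"], result["episodes_total"])
logger.info("policy_loss: %s", stats["policy_loss"])
logger.info("learner_stats:%s", stats)
logger.info("episode_reward_mean:%s", result["episode_reward_mean"])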

When this happens, I can see from top (output below) that only the main driver process is busy, while none of the ray::RolloutWorker processes appear to be doing anything.

Tasks:  23 total,   2 running,  21 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.2 us,  1.3 sy,  0.0 ni, 97.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 491786.2 total, 348301.0 free,  22941.5 used, 120543.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 465896.0 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
130313 jovyan    20   0   33.7g   2.6g 273808 R 106.3   0.5 223:31.68 python
130696 jovyan    20   0 3771024 104512  40572 S   6.6   0.0  11:38.86 python
130451 jovyan    20   0 7371780 252944  15712 S   1.3   0.1   2:12.27 gcs_server
130641 jovyan    20   0   41.2g  29076  15408 S   1.3   0.0   2:14.34 raylet
   480 jovyan    35  15   29.0g   3.8g 263380 S   1.0   0.8  75:04.28 ray::RolloutWor
   482 jovyan    35  15   29.0g   3.8g 261468 S   1.0   0.8  69:36.12 ray::RolloutWor
   483 jovyan    35  15   29.0g   3.8g 262740 S   1.0   0.8  70:34.22 ray::RolloutWor
130700 jovyan    20   0 1678300  65400  27568 S   1.0   0.0   1:30.07 python
130734 jovyan    35  15   22.6g  71616  30664 S   1.0   0.0   1:28.32 ray::IDLE
130736 jovyan    35  15   22.6g  71688  30732 S   1.0   0.0   1:28.52 ray::IDLE
   481 jovyan    35  15   29.0g   3.8g 264140 S   0.7   0.8  73:02.55 ray::RolloutWor
130566 jovyan    20   0 3913168  98212  40900 S   0.7   0.0   1:11.80 python
130738 jovyan    35  15   22.6g  71752  30772 S   0.7   0.0   1:27.87 ray::IDLE
130739 jovyan    35  15   22.6g  72056  31136 S   0.7   0.0   1:29.13 ray::IDLE
     7 jovyan    20   0  571392 108648  18316 S   0.3   0.0   2:37.32 jupyter-noteboo
130565 jovyan    20   0 1906128  66564  28360 S   0.3   0.0   0:10.98 python
     1 jovyan    20   0    3008    672    364 S   0.0   0.0   0:01.77 tini
     6 jovyan    20   0    3256   2012   1556 S   0.0   0.0   0:00.00 sh
    40 jovyan    20   0    9208   6076   3680 S   0.0   0.0   0:01.71 bash
   414 jovyan    20   0   10328   5996   3568 S   0.0   0.0   0:01.56 bash
 81351 jovyan    20   0    9500   4124   3292 R   0.0   0.0   0:00.15 top
130226 jovyan    20   0 3847208 164148  55764 S   0.0   0.0   0:04.25 python
130698 jovyan    20   0 1830012  65488  28684 S   0.0   0.0   0:03.05 python
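
If it would help, I can grab a stack trace of the hung driver the next time this happens, along these lines (assuming py-spy is installed; 130313 is the driver PID from the top output above):

import subprocess

# Dump the Python stack of the (apparently spinning) driver process with py-spy.
# 130313 is the driver PID from the top output; adjust for the actual run.
subprocess.run(["py-spy", "dump", "--pid", "130313"], check=True)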