I'm having a good run in my custom environment, but sometimes I'm not collecting data into episodes

I’m having a good run in my custom environment, but sometimes the sampled data is not being collected into episodes, and I don’t know why. The worker environments are running normally.
The weird thing is that it’s intermittent: sometimes the data is collected into episodes and sometimes it isn’t, so there appears to be data loss where samples never make it into any episode.
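
For reference, the worker environments run normally when I drive them by hand. A manual-rollout check along the lines of the sketch below (with `make_test_env` as a placeholder for my actual env constructor, and assuming the gymnasium-style 5-tuple `step()` API) is how I verify that episodes actually reach a done signal:

```python
# Minimal sketch of a manual rollout check (not my full setup).
# `make_test_env` is a placeholder for the real env constructor; the
# 5-tuple step() return assumes the gymnasium API.
def check_episode_termination(make_test_env, max_steps=1000):
    env = make_test_env()
    obs, info = env.reset()
    for t in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(
            env.action_space.sample()
        )
        if terminated or truncated:
            print(f"Episode terminated after {t + 1} steps.")
            return True
    print(f"No episode boundary within {max_steps} steps.")
    return False
```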

Result for DDPPO_test_env_9e629_00000:
  agent_timesteps_total: 0
  counters:
    num_agent_steps_sampled: 0
    num_agent_steps_trained: 0
    num_env_steps_sampled: 0
    num_env_steps_trained: 0
  custom_metrics: {}
  date: 2023-02-24_09-54-26
  done: false
  episode_len_mean: .nan
  episode_media: {}
  episode_reward_max: .nan
  episode_reward_mean: .nan
  episode_reward_min: .nan
  episodes_this_iter: 0
  episodes_total: 0
  experiment_id: 4754a76a9fc549a2a965e1a41b205d92
  hostname: georgezhang
  info:
    learner: {}
    num_agent_steps_sampled: 0
    num_agent_steps_trained: 0
    num_env_steps_sampled: 0
    num_env_steps_trained: 0
  iterations_since_restore: 4
  node_ip: 10.19.196.43
  num_agent_steps_sampled: 0
  num_agent_steps_trained: 0
  num_env_steps_sampled: 0
  num_env_steps_sampled_this_iter: 0
  num_env_steps_trained: 0
  num_env_steps_trained_this_iter: 0
  num_faulty_episodes: 0
  num_healthy_workers: 4
  num_recreated_workers: 0
  num_steps_trained_this_iter: 0
  perf:
    cpu_util_percent: 8.98470588235294
    ram_util_percent: 39.48705882352941
  pid: 54432
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf: {}
  sampler_results:
    custom_metrics: {}
    episode_len_mean: .nan
    episode_media: {}
    episode_reward_max: .nan
    episode_reward_mean: .nan
    episode_reward_min: .nan
    episodes_this_iter: 0
    hist_stats:
      episode_lengths: []
      episode_reward: []
    num_faulty_episodes: 0
    policy_reward_max: {}
    policy_reward_mean: {}
    policy_reward_min: {}
    sampler_perf: {}
  time_since_restore: 240.15831351280212
  time_this_iter_s: 60.04060769081116
  time_total_s: 240.15831351280212
  timers:
    training_iteration_time_ms: 30.848
  timestamp: 1677232466
  timesteps_since_restore: 0
  timesteps_total: 0
  training_iteration: 4
  trial_id: 9e629_00000
  warmup_time: 13.909451484680176
== Status ==
Current time: 2023-02-24 09:54:31 (running for 00:04:24.81)
Memory usage on this node: 24.8/62.5 GiB
Using FIFO scheduling algorithm.
Resources requested: 21.0/80 CPUs, 1.6/2 GPUs, 0.0/80.5 GiB heap, 0.0/37.13 GiB objects (0.0/2.0 accelerator_type:RTX)
Result logdir: /rl-decision/ray_results/test_sumo_reward
Number of trials: 1/1 (1 RUNNING)
+----------------------------+----------+--------------------+--------+------------------+------+----------+-----------------------+----------------------+----------------------+
| Trial name                 | status   | loc                | iter   | total time (s)   | ts   | reward   | num_recreated_wor…    | episode_reward_max   | episode_reward_min   |
|----------------------------+----------+--------------------+--------+------------------+------+----------+-----------------------+----------------------+----------------------|
| DDPPO_test_env_9e629_00000 | RUNNING  | 10.19.196.43:54432 | 4      | 240.158          | 0    | nan      | 0                     | nan                  | nan                  |
+----------------------------+----------+--------------------+--------+------------------+------+----------+-----------------------+----------------------+----------------------+

Hi @zhangzhang, please post a reproduction script!
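
Something self-contained that reproduces the zero-sample counters would be ideal. As an illustration of the expected shape only (a sketch assuming Ray >= 2.3 with the gymnasium API; `DummyEnv` and `test_env` are placeholders, not your actual SUMO environment):

```python
import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ddppo import DDPPOConfig
from ray.tune.registry import register_env


class DummyEnv(gym.Env):
    """Placeholder env; swap in the custom (SUMO) environment here."""

    def __init__(self, env_config=None):
        self.observation_space = gym.spaces.Box(
            -1.0, 1.0, shape=(4,), dtype=np.float32
        )
        self.action_space = gym.spaces.Discrete(2)
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= 10  # short, always-terminating episodes
        return self.observation_space.sample(), 1.0, terminated, False, {}


register_env("test_env", lambda cfg: DummyEnv(cfg))

config = (
    DDPPOConfig()
    .environment("test_env")
    .framework("torch")  # DDPPO is torch-only
    .rollouts(num_rollout_workers=4)
)
algo = config.build()
for i in range(4):
    result = algo.train()
    print(i, result["episodes_this_iter"], result["num_env_steps_sampled"])
```

Swapping `DummyEnv` for your real environment while keeping everything else this minimal usually makes it clear whether the problem is in the env or in the DDPPO setup.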