Evaluation worker won't stop

I am on the latest commit on master: [ci/release] Add CODEOWNERs for Ray release test package (#24774)

I use APEX-DQN. I want to run evaluation for one episode every 3 training iterations, so I did this:

run: APEX
num_samples: 1
checkpoint_freq: 50
checkpoint_at_end: true
local_dir: "results"
stop:
  timesteps_total: 10_000_000
  training_iteration: 10000
config:
  metrics_num_episodes_for_smoothing: 1
  ignore_worker_failures: True
  num_workers: 1 
  num_envs_per_worker: 1
  num_gpus: 1
  gamma:
    grid_search:
      - 0.9

  learning_starts: 0 
  min_sample_timesteps_per_reporting: 1000 
  min_time_s_per_reporting: 0
  min_train_timesteps_per_reporting: 0


  framework: tf
  model:
    fcnet_hiddens:
      grid_search:
        - [ 64, 32 ]
    fcnet_activation: linear

  evaluation_num_workers: 1
  evaluation_interval: 3
  evaluation_parallel_to_training: False
  evaluation_duration: 1
  evaluation_duration_unit: "episodes"
  evaluation_config:
    explore: False

Note that an episode lasts 1000 steps and then stops.

The problem is that as soon as the second training iteration is done (i.e., 2 episodes, or 2000 steps), the evaluation worker starts (as expected) but won't stop evaluating: it runs several episodes in a row. Meanwhile, the training worker keeps stepping, but not at the same speed as the eval worker. I've tried evaluation_parallel_to_training=True and the problem is the same. In the logs you can see TrainEnv_1 and EvalEnv_1 running; at some point EvalEnv_1 floods the logs while TrainEnv_1 iterates at a very slow pace.

/home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/work/myrill/core/main/main_ray_tune.py config/final/main.yaml --override_config=config/final/overrides/override_precision7560.yaml --log_level=ERROR --override_main_seed=-1
[Level 51][misc] [seed_sever_seed_generator] main_seed is negative (-1), we are setting a random seed 3946479510
2022-05-17 16:07:22,262	WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.execution.buffers` has been deprecated. Use `ray.rllib.utils.replay_buffers` instead. This will raise an error in the future!
2022-05-17 16:07:22,273	WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
[Level 51][misc] [on_trial_start 0]
(raylet) 2022-05-17 16:07:22,888	INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=1787716506
(pid=73234) 
(pid=73756) 2022-05-17 16:07:24,788	WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
(ApexTrainer pid=73756) 2022-05-17 16:07:25,274	INFO trainer.py:1727 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(ApexTrainer pid=73756) 2022-05-17 16:07:25,277	WARNING deprecation.py:46 -- DeprecationWarning: `config['learning_starts']` has been deprecated. config['replay_buffer_config']['learning_starts'] This will raise an error in the future!
(ApexTrainer pid=73756) 2022-05-17 16:07:25,277	WARNING deprecation.py:46 -- DeprecationWarning: `config['multiagent']['replay_mode']` has been deprecated. config['replay_buffer_config']['replay_mode'] This will raise an error in the future!
(ApexTrainer pid=73756) 2022-05-17 16:07:25,277	INFO simple_q.py:291 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(ApexTrainer pid=73756) 2022-05-17 16:07:25,277	INFO trainer.py:325 -- Current log_level is ERROR. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(raylet) 2022-05-17 16:07:25,751	INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=1787716506
(pid=73236) 
(RolloutWorker pid=73887) 2022-05-17 16:07:27,944	WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] env seed 3358978638
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] reset 0
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] clock: 2022-05-15T12:00:34.948Z -> 2022-05-15T12:14:06.078Z
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 0
(RolloutWorker pid=73887) [Level 51][misc] [id_callback=140347653122704][on_sub_environment_created] TrainEnv_1
(ApexTrainer pid=73756) [on_trainer_init]
(raylet) 2022-05-17 16:07:39,237	INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=1787716506
(raylet) 2022-05-17 16:07:39,238	INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=20 --runtime-env-hash=1787716506
(raylet) 2022-05-17 16:07:39,241	INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=22 --runtime-env-hash=1787716506
(raylet) 2022-05-17 16:07:39,247	INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=1787716506
(raylet) 2022-05-17 16:07:39,248	INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=21 --runtime-env-hash=1787716506
(pid=73237) 
(pid=73231) 
(pid=73227) 
(pid=73225) 
(pid=73226) 
== Status ==
Current time: 2022-05-17 16:07:22 (running for 00:00:00.14)
Memory usage on this node: 10.5/125.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 7.0/16 CPUs, 1.0/1 GPUs, 0.0/72.91 GiB heap, 0.0/35.24 GiB objects (0.0/1.0 accelerator_type:RTX)
Result logdir: /home/ncarrara/work/myrill/core/main/results/apex19
Number of trials: 1/1 (1 RUNNING)
+--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+
| Trial name   | status   | loc                | env_config/nodes/...   | env_config/nodes/...   |   env_config/nodes/... |   env_config/nodes/... | env_config/nodes/...   | env_config/nodes/...   | env_config/nodes/...   |   env_config/nodes/... | env_config/nodes/...   | env_config/nodes/...   | env_config/nodes/...   |   gamma | model/fcnet_hiddens   |
|--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------|
| APEX         | RUNNING  | 152.81.4.229:73756 | BPSAction              | [0, 0.1, 0.25, _fcc0   |                      0 |                1000000 | True                   | False                  | []                     |                      1 | []                     | True                   | False                  |     0.9 | [64, 32]              |
+--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+


(ApexTrainer pid=73756) 2022-05-17 16:07:41,375	INFO trainable.py:159 -- Trainable.setup took 16.101 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] reset 1
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] clock: 2022-05-11T19:38:44.491Z -> 2022-05-11T19:45:31.902Z
(RolloutWorker pid=73887) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) 2022-05-17 16:07:41,880	WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] env seed 2216427457
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-12T05:01:53.102Z -> 2022-05-12T05:08:27.645Z
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [id_callback=139902999837808][on_sub_environment_created] EvalEnv_1
(RolloutWorker pid=73887) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 0
(MultiAgentPrioritizedReplayBuffer pid=74271) 2022-05-17 16:07:44,760	INFO replay_buffer.py:55 -- Estimated max memory usage for replay buffer is 0.0585 GB (500000.0 batches of size 1, 117 bytes each), available system memory is 134.243328 GB
(MultiAgentPrioritizedReplayBuffer pid=74271) 2022-05-17 16:07:44,760	WARNING multi_agent_prioritized_replay_buffer.py:188 -- Adding batches with column `weights` to this buffer while providing weights as a call argument to the add method results in the column being overwritten.
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 200
(MultiAgentPrioritizedReplayBuffer pid=74272) 2022-05-17 16:07:46,434	INFO replay_buffer.py:55 -- Estimated max memory usage for replay buffer is 0.0585 GB (500000.0 batches of size 1, 117 bytes each), available system memory is 134.243328 GB
(MultiAgentPrioritizedReplayBuffer pid=74272) 2022-05-17 16:07:46,434	WARNING multi_agent_prioritized_replay_buffer.py:188 -- Adding batches with column `weights` to this buffer while providing weights as a call argument to the add method results in the column being overwritten.
(MultiAgentPrioritizedReplayBuffer pid=74276) 2022-05-17 16:07:47,987	INFO replay_buffer.py:55 -- Estimated max memory usage for replay buffer is 0.0585 GB (500000.0 batches of size 1, 117 bytes each), available system memory is 134.243328 GB
(MultiAgentPrioritizedReplayBuffer pid=74276) 2022-05-17 16:07:47,987	WARNING multi_agent_prioritized_replay_buffer.py:188 -- Adding batches with column `weights` to this buffer while providing weights as a call argument to the add method results in the column being overwritten.
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 400
(MultiAgentPrioritizedReplayBuffer pid=74277) 2022-05-17 16:07:49,431	INFO replay_buffer.py:55 -- Estimated max memory usage for replay buffer is 0.0585 GB (500000.0 batches of size 1, 117 bytes each), available system memory is 134.243328 GB
(MultiAgentPrioritizedReplayBuffer pid=74277) 2022-05-17 16:07:49,431	WARNING multi_agent_prioritized_replay_buffer.py:188 -- Adding batches with column `weights` to this buffer while providing weights as a call argument to the add method results in the column being overwritten.
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 600
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 800
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] reset 2
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] clock: 2022-05-14T11:43:36.346Z -> 2022-05-14T12:00:25.576Z
(RolloutWorker pid=73887) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=73887) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 0
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 200
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 400
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 600
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 800
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] reset 3
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] clock: 2022-05-14T14:17:23.018Z -> 2022-05-14T14:31:36.041Z
(RolloutWorker pid=73887) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=73887) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 1
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-15T12:26:44.100Z -> 2022-05-15T12:40:47.363Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 2
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-11T18:42:03.464Z -> 2022-05-11T18:49:25.894Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 3
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-13T03:58:24.705Z -> 2022-05-13T04:14:16.128Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
== Status ==
Current time: 2022-05-17 16:08:26 (running for 00:01:03.71)
Memory usage on this node: 14.7/125.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 7.0/16 CPUs, 1.0/1 GPUs, 0.0/72.91 GiB heap, 0.0/35.24 GiB objects (0.0/1.0 accelerator_type:RTX)
Result logdir: /home/ncarrara/work/myrill/core/main/results/apex19
Number of trials: 1/1 (1 RUNNING)
+--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+--------+------------------+------+----------+------------------------+
| Trial name   | status   | loc                | env_config/nodes/...   | env_config/nodes/...   |   env_config/nodes/... |   env_config/nodes/... | env_config/nodes/...   | env_config/nodes/...   | env_config/nodes/...   |   env_config/nodes/... | env_config/nodes/...   | env_config/nodes/...   | env_config/nodes/...   |   gamma | model/fcnet_hiddens   |   iter |   total time (s) |   ts |   reward |   custom_metrics/up... |
|--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+--------+------------------+------+----------+------------------------|
| APEX         | RUNNING  | 152.81.4.229:73756 | BPSAction              | [0, 0.1, 0.25, _fcc0   |                      0 |                1000000 | True                   | False                  | []                     |                      1 | []                     | True                   | False                  |     0.9 | [64, 32]              |      2 |          24.4319 | 2000 | -6.67196 |             -0.0052229 |
+--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+--------+------------------+------+----------+------------------------+


(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 4
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-13T02:38:16.716Z -> 2022-05-13T02:48:00.773Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 5
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-11T20:57:16.138Z -> 2022-05-11T21:05:41.755Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 6
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-14T14:31:36.736Z -> 2022-05-14T14:46:42.419Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 7
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-15T10:55:08.893Z -> 2022-05-15T11:09:30.729Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 8
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-12T22:32:24.169Z -> 2022-05-12T22:43:54.519Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 9
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-14T06:49:17.779Z -> 2022-05-14T07:13:54.980Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

Edit: By the way, if I change evaluation_interval to 1 and set local_mode=True, the training env gets reset a few times but never even steps:

[Level 51][misc] [seed_sever_seed_generator] main_seed is negative (-1), we are setting a random seed 1691414407
2022-05-17 17:48:18,870	WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.execution.buffers` has been deprecated. Use `ray.rllib.utils.replay_buffers` instead. This will raise an error in the future!
2022-05-17 17:48:18,887	WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
2022-05-17 17:48:19,039	WARNING worker.py:525 -- `ray.get_gpu_ids()` will always return the empty list when called from the driver. This is because Ray does not manage GPU allocations to the driver process.
:task_name:bundle_reservation_check_func
:task_name:bundle_reservation_check_func
:actor_name:ApexTrainer
:actor_name:ApexTrainer
2022-05-17 17:48:19,173	INFO trainer.py:1727 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
2022-05-17 17:48:19,174	WARNING deprecation.py:46 -- DeprecationWarning: `config['learning_starts']` has been deprecated. config['replay_buffer_config']['learning_starts'] This will raise an error in the future!
2022-05-17 17:48:19,174	WARNING deprecation.py:46 -- DeprecationWarning: `config['multiagent']['replay_mode']` has been deprecated. config['replay_buffer_config']['replay_mode'] This will raise an error in the future!
2022-05-17 17:48:19,174	INFO simple_q.py:291 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
2022-05-17 17:48:19,174	INFO trainer.py:325 -- Current log_level is ERROR. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
:actor_name:RolloutWorker
:actor_name:RolloutWorker
[Level 51][misc] Worker 1 is creating the env TrainEnv_1 
[Level 51][misc] [TrainEnv_1] env seed 3480895465
[Level 51][misc] [id_callback=140628712881072][on_sub_environment_created] TrainEnv_1
:actor_name:RolloutWorker
:actor_name:RolloutWorker
[Level 51][misc] Worker 1 is creating the env EvalEnv_1 
[Level 51][misc] [EvalEnv_1] env seed 1264697858
[Level 51][misc] [id_callback=140590319670320][on_sub_environment_created] EvalEnv_1
[Level 51][misc] [on_trainer_init]
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
[Level 51][misc] [TrainEnv_1] reset 0
[Level 51][misc] [TrainEnv_1] clock: 2022-05-12T13:51:58.803Z -> 2022-05-12T14:00:28.448Z
[Level 51][misc] [on_episode_start]
[Level 51][misc] [TrainEnv_1] step 0
[Level 51][misc] [EvalEnv_1] reset 0
[Level 51][misc] [EvalEnv_1] clock: 2022-05-12T12:40:29.438Z -> 2022-05-12T12:48:45.192Z
[Level 51][misc] [on_episode_start]
[Level 51][misc] [EvalEnv_1] step 0
[Level 51][misc] [EvalEnv_1] step 200
[Level 51][misc] [EvalEnv_1] step 400
[Level 51][misc] [EvalEnv_1] step 600
[Level 51][misc] [EvalEnv_1] step 800
[Level 51][misc] [EvalEnv_1] reset 1
[Level 51][misc] [EvalEnv_1] clock: 2022-05-14T11:00:32.625Z -> 2022-05-14T11:14:48.559Z
[ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
[Level 51][misc] [on_episode_start]
[Level 51][misc] [EvalEnv_1] step 0
[Level 51][misc] [EvalEnv_1] step 200
[Level 51][misc] [EvalEnv_1] step 400
[Level 51][misc] [EvalEnv_1] step 600
[Level 51][misc] [EvalEnv_1] step 800
[Level 51][misc] [EvalEnv_1] reset 2
[Level 51][misc] [EvalEnv_1] clock: 2022-05-13T11:33:54.583Z -> 2022-05-13T11:49:05.886Z
[ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems 
[Level 51][misc] [on_episode_start]
[Level 51][misc] [EvalEnv_1] step 0
[Level 51][misc] [EvalEnv_1] step 200
2022-05-17 17:48:59,462	WARNING tune.py:687 -- Stop signal received (e.g. via SIGINT/Ctrl+C), ending Ray Tune run. This will try to checkpoint the experiment state one last time. Press CTRL+C (or send SIGINT/SIGKILL/SIGTERM) to skip. 
2022-05-17 17:48:59,479	WARNING tune.py:687 -- Stop signal received (e.g. via SIGINT/Ctrl+C), ending Ray Tune run. This will try to checkpoint the experiment state one last time. Press CTRL+C (or send SIGINT/SIGKILL/SIGTERM) to skip. 

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

It was a rollout issue: rollout_fragment_length was not high enough to cover a full episode. I don't know why this behavior is expected, though.
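
In case it helps anyone else, here is a minimal sketch of the change as I understand it, assuming the 1000-step episodes from above (batch_mode is just an alternative way to express the same constraint, which I have not verified here):

config:
  # make each sampled fragment span a whole 1000-step episode
  rollout_fragment_length: 1000
  # alternative (untested here): end fragments only on episode boundaries
  # batch_mode: "complete_episodes"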

Hey @Nicolas_Carrara, thanks for posting this. It doesn't seem normal that the evaluation worker won't stop. With evaluation_parallel_to_training=False (your original setting), evaluation should happen (strictly) every 3 training iterations (each iteration being 1000 sample steps according to your min_sample_timesteps_per_reporting setting) and last for exactly one episode.
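
Put differently (just restating the intent of the keys you already have, not new settings), the expected behavior is roughly:

evaluation_interval: 3                    # evaluate after every 3rd training iteration
evaluation_duration: 1                    # ... for exactly one episode
evaluation_duration_unit: "episodes"      # i.e. 1000 steps in your env, then stop
min_sample_timesteps_per_reporting: 1000  # each training iteration covers ~1000 sampled timesteps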

Let me take a quick look and get back to you …

Hi Sven, did you find out what the problem is?

I face the same issue with DQN. My episode size is 1000. I want target_network_update_freq to be 10 steps. For that, I need to set rollout_fragment_length to 10 (otherwise it won't update before the rollout ends). Since my rollout_fragment_length is lower than my episode size, I get the bug where the eval worker takes priority over the training worker.

If I set evaluation_parallel_to_training=True, training doesn't even start; only the evaluation worker runs.
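
For reference, the relevant part of my DQN config looks roughly like this (a sketch of my setup, not a recommendation):

config:
  # episodes last 1000 steps in my env
  rollout_fragment_length: 10             # short fragments so the target net can update mid-episode
  target_network_update_freq: 10          # update the target network every 10 sampled steps
  evaluation_parallel_to_training: False  # with True, training never starts at all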