I am on the latest commit on master ([ci/release] Add CODEOWNERs for Ray release test package (#24774)).
I use APEX-DQN and I want to run one evaluation episode every 3 training iterations, so I configured this:
run: APEX
num_samples: 1
checkpoint_freq: 50
checkpoint_at_end: true
local_dir: "results"
stop:
  timesteps_total: 10_000_000
  training_iteration: 10000
config:
  metrics_num_episodes_for_smoothing: 1
  ignore_worker_failures: True
  num_workers: 1
  num_envs_per_worker: 1
  num_gpus: 1
  gamma:
    grid_search:
      - 0.9
  learning_starts: 0
  min_sample_timesteps_per_reporting: 1000
  min_time_s_per_reporting: 0
  min_train_timesteps_per_reporting: 0
  framework: tf
  model:
    fcnet_hiddens:
      grid_search:
        - [ 64, 32 ]
    fcnet_activation: linear
  evaluation_num_workers: 1
  evaluation_interval: 3
  evaluation_parallel_to_training: False
  evaluation_duration: 1
  evaluation_duration_unit: "episodes"
  evaluation_config:
    explore: False
Note that an episode lasts 1,000 steps and then stops.
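For reference, here is roughly the same experiment expressed directly with ray.tune (a minimal sketch: CartPole-v1 stands in for my custom env, the grid searches are collapsed to single values, and main.yaml plus the override file are not involved):

import ray
from ray import tune

ray.init()

# Minimal Python sketch of the YAML above; "CartPole-v1" is only a
# stand-in for my custom TrainEnv/EvalEnv.
tune.run(
    "APEX",
    num_samples=1,
    checkpoint_freq=50,
    checkpoint_at_end=True,
    local_dir="results",
    stop={"timesteps_total": 10_000_000, "training_iteration": 10000},
    config={
        "env": "CartPole-v1",
        "metrics_num_episodes_for_smoothing": 1,
        "ignore_worker_failures": True,
        "num_workers": 1,
        "num_envs_per_worker": 1,
        "num_gpus": 1,
        "gamma": 0.9,
        "learning_starts": 0,
        "min_sample_timesteps_per_reporting": 1000,
        "min_time_s_per_reporting": 0,
        "min_train_timesteps_per_reporting": 0,
        "framework": "tf",
        "model": {"fcnet_hiddens": [64, 32], "fcnet_activation": "linear"},
        # One evaluation episode every 3 training iterations, greedy policy.
        "evaluation_num_workers": 1,
        "evaluation_interval": 3,
        "evaluation_parallel_to_training": False,
        "evaluation_duration": 1,
        "evaluation_duration_unit": "episodes",
        "evaluation_config": {"explore": False},
    },
)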
The problem: as soon as the second training iteration finishes (i.e. 2 episodes, or 2,000 steps), the evaluation worker starts (as expected) but never stops evaluating: it runs episode after episode. Meanwhile the training worker does keep stepping, but far more slowly than the eval worker. I also tried evaluation_parallel_to_training=True and got exactly the same behavior. In the logs below you can see both TrainEnv_1 and EvalEnv_1 running; at some point EvalEnv_1 floods the logs while TrainEnv_1 advances at a very slow pace. What I expected instead is sketched right after this paragraph; the full logs follow.
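What I expected from evaluation_interval=3 / evaluation_duration=1 is the behavior of this manual loop (again only a sketch, with CartPole-v1 as a stand-in env; ApexTrainer, train() and the "evaluation" result key are the regular Trainer API):

from ray.rllib.agents.dqn.apex import ApexTrainer

# Expected behavior: train() runs normally, and every 3rd iteration the
# result additionally contains metrics from exactly ONE evaluation episode.
trainer = ApexTrainer(config={
    "env": "CartPole-v1",               # stand-in for my custom env
    "num_workers": 1,
    "num_gpus": 0,
    "framework": "tf",
    "evaluation_num_workers": 1,
    "evaluation_interval": 3,           # every 3 training iterations...
    "evaluation_duration": 1,           # ...evaluate for exactly 1 episode
    "evaluation_duration_unit": "episodes",
    "evaluation_config": {"explore": False},
})

for i in range(9):
    result = trainer.train()
    if "evaluation" in result:
        print(i, result["evaluation"]["episode_reward_mean"])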
/home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/work/myrill/core/main/main_ray_tune.py config/final/main.yaml --override_config=config/final/overrides/override_precision7560.yaml --log_level=ERROR --override_main_seed=-1
[Level 51][misc] [seed_sever_seed_generator] main_seed is negative (-1), we are setting a random seed 3946479510
2022-05-17 16:07:22,262 WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.execution.buffers` has been deprecated. Use `ray.rllib.utils.replay_buffers` instead. This will raise an error in the future!
2022-05-17 16:07:22,273 WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
[Level 51][misc] [on_trial_start 0]
(raylet) 2022-05-17 16:07:22,888 INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=1787716506
(pid=73234)
(pid=73756) 2022-05-17 16:07:24,788 WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
(ApexTrainer pid=73756) 2022-05-17 16:07:25,274 INFO trainer.py:1727 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(ApexTrainer pid=73756) 2022-05-17 16:07:25,277 WARNING deprecation.py:46 -- DeprecationWarning: `config['learning_starts']` has been deprecated. config['replay_buffer_config']['learning_starts'] This will raise an error in the future!
(ApexTrainer pid=73756) 2022-05-17 16:07:25,277 WARNING deprecation.py:46 -- DeprecationWarning: `config['multiagent']['replay_mode']` has been deprecated. config['replay_buffer_config']['replay_mode'] This will raise an error in the future!
(ApexTrainer pid=73756) 2022-05-17 16:07:25,277 INFO simple_q.py:291 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(ApexTrainer pid=73756) 2022-05-17 16:07:25,277 INFO trainer.py:325 -- Current log_level is ERROR. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(raylet) 2022-05-17 16:07:25,751 INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=1787716506
(pid=73236)
(RolloutWorker pid=73887) 2022-05-17 16:07:27,944 WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] env seed 3358978638
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] reset 0
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] clock: 2022-05-15T12:00:34.948Z -> 2022-05-15T12:14:06.078Z
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 0
(RolloutWorker pid=73887) [Level 51][misc] [id_callback=140347653122704][on_sub_environment_created] TrainEnv_1
(ApexTrainer pid=73756) [on_trainer_init]
(raylet) 2022-05-17 16:07:39,237 INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=1787716506
(raylet) 2022-05-17 16:07:39,238 INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=20 --runtime-env-hash=1787716506
(raylet) 2022-05-17 16:07:39,241 INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=22 --runtime-env-hash=1787716506
(raylet) 2022-05-17 16:07:39,247 INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=1787716506
(raylet) 2022-05-17 16:07:39,248 INFO context.py:70 -- Exec'ing worker with command: exec /home/ncarrara/anaconda3/envs/myrill/bin/python3.8 /home/ncarrara/anaconda3/envs/myrill/lib/python3.8/site-packages/ray/workers/default_worker.py --node-ip-address=152.81.4.229 --node-manager-port=46613 --object-store-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-17_16-07-19_157126_72966/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=61912 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=152.81.4.229:64538 --redis-password=5241590000000000 --startup-token=21 --runtime-env-hash=1787716506
(pid=73237)
(pid=73231)
(pid=73227)
(pid=73225)
(pid=73226)
== Status ==
Current time: 2022-05-17 16:07:22 (running for 00:00:00.14)
Memory usage on this node: 10.5/125.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 7.0/16 CPUs, 1.0/1 GPUs, 0.0/72.91 GiB heap, 0.0/35.24 GiB objects (0.0/1.0 accelerator_type:RTX)
Result logdir: /home/ncarrara/work/myrill/core/main/results/apex19
Number of trials: 1/1 (1 RUNNING)
+--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+
| Trial name | status | loc | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | gamma | model/fcnet_hiddens |
|--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------|
| APEX | RUNNING | 152.81.4.229:73756 | BPSAction | [0, 0.1, 0.25, _fcc0 | 0 | 1000000 | True | False | [] | 1 | [] | True | False | 0.9 | [64, 32] |
+--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+
(ApexTrainer pid=73756) 2022-05-17 16:07:41,375 INFO trainable.py:159 -- Trainable.setup took 16.101 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] reset 1
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] clock: 2022-05-11T19:38:44.491Z -> 2022-05-11T19:45:31.902Z
(RolloutWorker pid=73887) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) 2022-05-17 16:07:41,880 WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] env seed 2216427457
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-12T05:01:53.102Z -> 2022-05-12T05:08:27.645Z
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [id_callback=139902999837808][on_sub_environment_created] EvalEnv_1
(RolloutWorker pid=73887) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 0
(MultiAgentPrioritizedReplayBuffer pid=74271) 2022-05-17 16:07:44,760 INFO replay_buffer.py:55 -- Estimated max memory usage for replay buffer is 0.0585 GB (500000.0 batches of size 1, 117 bytes each), available system memory is 134.243328 GB
(MultiAgentPrioritizedReplayBuffer pid=74271) 2022-05-17 16:07:44,760 WARNING multi_agent_prioritized_replay_buffer.py:188 -- Adding batches with column `weights` to this buffer while providing weights as a call argument to the add method results in the column being overwritten.
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 200
(MultiAgentPrioritizedReplayBuffer pid=74272) 2022-05-17 16:07:46,434 INFO replay_buffer.py:55 -- Estimated max memory usage for replay buffer is 0.0585 GB (500000.0 batches of size 1, 117 bytes each), available system memory is 134.243328 GB
(MultiAgentPrioritizedReplayBuffer pid=74272) 2022-05-17 16:07:46,434 WARNING multi_agent_prioritized_replay_buffer.py:188 -- Adding batches with column `weights` to this buffer while providing weights as a call argument to the add method results in the column being overwritten.
(MultiAgentPrioritizedReplayBuffer pid=74276) 2022-05-17 16:07:47,987 INFO replay_buffer.py:55 -- Estimated max memory usage for replay buffer is 0.0585 GB (500000.0 batches of size 1, 117 bytes each), available system memory is 134.243328 GB
(MultiAgentPrioritizedReplayBuffer pid=74276) 2022-05-17 16:07:47,987 WARNING multi_agent_prioritized_replay_buffer.py:188 -- Adding batches with column `weights` to this buffer while providing weights as a call argument to the add method results in the column being overwritten.
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 400
(MultiAgentPrioritizedReplayBuffer pid=74277) 2022-05-17 16:07:49,431 INFO replay_buffer.py:55 -- Estimated max memory usage for replay buffer is 0.0585 GB (500000.0 batches of size 1, 117 bytes each), available system memory is 134.243328 GB
(MultiAgentPrioritizedReplayBuffer pid=74277) 2022-05-17 16:07:49,431 WARNING multi_agent_prioritized_replay_buffer.py:188 -- Adding batches with column `weights` to this buffer while providing weights as a call argument to the add method results in the column being overwritten.
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 600
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 800
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] reset 2
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] clock: 2022-05-14T11:43:36.346Z -> 2022-05-14T12:00:25.576Z
(RolloutWorker pid=73887) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=73887) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 0
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 200
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 400
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 600
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 800
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] reset 3
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] clock: 2022-05-14T14:17:23.018Z -> 2022-05-14T14:31:36.041Z
(RolloutWorker pid=73887) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=73887) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 1
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-15T12:26:44.100Z -> 2022-05-15T12:40:47.363Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 2
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-11T18:42:03.464Z -> 2022-05-11T18:49:25.894Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 3
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-13T03:58:24.705Z -> 2022-05-13T04:14:16.128Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
== Status ==
Current time: 2022-05-17 16:08:26 (running for 00:01:03.71)
Memory usage on this node: 14.7/125.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 7.0/16 CPUs, 1.0/1 GPUs, 0.0/72.91 GiB heap, 0.0/35.24 GiB objects (0.0/1.0 accelerator_type:RTX)
Result logdir: /home/ncarrara/work/myrill/core/main/results/apex19
Number of trials: 1/1 (1 RUNNING)
+--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+--------+------------------+------+----------+------------------------+
| Trial name | status | loc | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | env_config/nodes/... | gamma | model/fcnet_hiddens | iter | total time (s) | ts | reward | custom_metrics/up... |
|--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+--------+------------------+------+----------+------------------------|
| APEX | RUNNING | 152.81.4.229:73756 | BPSAction | [0, 0.1, 0.25, _fcc0 | 0 | 1000000 | True | False | [] | 1 | [] | True | False | 0.9 | [64, 32] | 2 | 24.4319 | 2000 | -6.67196 | -0.0052229 |
+--------------+----------+--------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+------------------------+---------+-----------------------+--------+------------------+------+----------+------------------------+
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 4
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-13T02:38:16.716Z -> 2022-05-13T02:48:00.773Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 5
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-11T20:57:16.138Z -> 2022-05-11T21:05:41.755Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 6
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-14T14:31:36.736Z -> 2022-05-14T14:46:42.419Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=73887) [Level 51][misc] [TrainEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 7
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-15T10:55:08.893Z -> 2022-05-15T11:09:30.729Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 8
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-12T22:32:24.169Z -> 2022-05-12T22:43:54.519Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 200
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 400
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 600
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 800
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] reset 9
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] clock: 2022-05-14T06:49:17.779Z -> 2022-05-14T07:13:54.980Z
(RolloutWorker pid=74270) [ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
(RolloutWorker pid=74270) [Level 51][misc] [on_episode_start]
(RolloutWorker pid=74270) [Level 51][misc] [EvalEnv_1] step 0
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
Edit: By the way, if I change evaluation_interval to 1 and set local_mode=True, the training env gets reset a few times but never even steps (logs below; the local-mode call I use is sketched at the very end of this post):
[Level 51][misc] [seed_sever_seed_generator] main_seed is negative (-1), we are setting a random seed 1691414407
2022-05-17 17:48:18,870 WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.execution.buffers` has been deprecated. Use `ray.rllib.utils.replay_buffers` instead. This will raise an error in the future!
2022-05-17 17:48:18,887 WARNING deprecation.py:46 -- DeprecationWarning: `ray.rllib.agents.dqn.dqn.DEFAULT_CONFIG` has been deprecated. Use `ray.rllib.agents.dqn.dqn.DQNConfig(...)` instead. This will raise an error in the future!
2022-05-17 17:48:19,039 WARNING worker.py:525 -- `ray.get_gpu_ids()` will always return the empty list when called from the driver. This is because Ray does not manage GPU allocations to the driver process.
:task_name:bundle_reservation_check_func
:task_name:bundle_reservation_check_func
:actor_name:ApexTrainer
:actor_name:ApexTrainer
2022-05-17 17:48:19,173 INFO trainer.py:1727 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
2022-05-17 17:48:19,174 WARNING deprecation.py:46 -- DeprecationWarning: `config['learning_starts']` has been deprecated. config['replay_buffer_config']['learning_starts'] This will raise an error in the future!
2022-05-17 17:48:19,174 WARNING deprecation.py:46 -- DeprecationWarning: `config['multiagent']['replay_mode']` has been deprecated. config['replay_buffer_config']['replay_mode'] This will raise an error in the future!
2022-05-17 17:48:19,174 INFO simple_q.py:291 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
2022-05-17 17:48:19,174 INFO trainer.py:325 -- Current log_level is ERROR. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
:actor_name:RolloutWorker
:actor_name:RolloutWorker
[Level 51][misc] Worker 1 is creating the env TrainEnv_1
[Level 51][misc] [TrainEnv_1] env seed 3480895465
[Level 51][misc] [id_callback=140628712881072][on_sub_environment_created] TrainEnv_1
:actor_name:RolloutWorker
:actor_name:RolloutWorker
[Level 51][misc] Worker 1 is creating the env EvalEnv_1
[Level 51][misc] [EvalEnv_1] env seed 1264697858
[Level 51][misc] [id_callback=140590319670320][on_sub_environment_created] EvalEnv_1
[Level 51][misc] [on_trainer_init]
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
:actor_name:MultiAgentPrioritizedReplayBuffer
[Level 51][misc] [TrainEnv_1] reset 0
[Level 51][misc] [TrainEnv_1] clock: 2022-05-12T13:51:58.803Z -> 2022-05-12T14:00:28.448Z
[Level 51][misc] [on_episode_start]
[Level 51][misc] [TrainEnv_1] step 0
[Level 51][misc] [EvalEnv_1] reset 0
[Level 51][misc] [EvalEnv_1] clock: 2022-05-12T12:40:29.438Z -> 2022-05-12T12:48:45.192Z
[Level 51][misc] [on_episode_start]
[Level 51][misc] [EvalEnv_1] step 0
[Level 51][misc] [EvalEnv_1] step 200
[Level 51][misc] [EvalEnv_1] step 400
[Level 51][misc] [EvalEnv_1] step 600
[Level 51][misc] [EvalEnv_1] step 800
[Level 51][misc] [EvalEnv_1] reset 1
[Level 51][misc] [EvalEnv_1] clock: 2022-05-14T11:00:32.625Z -> 2022-05-14T11:14:48.559Z
[ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
[Level 51][misc] [on_episode_start]
[Level 51][misc] [EvalEnv_1] step 0
[Level 51][misc] [EvalEnv_1] step 200
[Level 51][misc] [EvalEnv_1] step 400
[Level 51][misc] [EvalEnv_1] step 600
[Level 51][misc] [EvalEnv_1] step 800
[Level 51][misc] [EvalEnv_1] reset 2
[Level 51][misc] [EvalEnv_1] clock: 2022-05-13T11:33:54.583Z -> 2022-05-13T11:49:05.886Z
[ERROR][asyncable] [LocalWorld] you should find a way to close correctly your object, use flag, and threading.event systems
[Level 51][misc] [on_episode_start]
[Level 51][misc] [EvalEnv_1] step 0
[Level 51][misc] [EvalEnv_1] step 200
2022-05-17 17:48:59,462 WARNING tune.py:687 -- Stop signal received (e.g. via SIGINT/Ctrl+C), ending Ray Tune run. This will try to checkpoint the experiment state one last time. Press CTRL+C (or send SIGINT/SIGKILL/SIGTERM) to skip.
2022-05-17 17:48:59,479 WARNING tune.py:687 -- Stop signal received (e.g. via SIGINT/Ctrl+C), ending Ray Tune run. This will try to checkpoint the experiment state one last time. Press CTRL+C (or send SIGINT/SIGKILL/SIGTERM) to skip.
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
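For completeness, the local-mode variant mentioned in the edit is essentially this (a sketch; my real script builds the config from main.yaml plus the override file):

import ray
from ray import tune

# Local mode: everything runs in the driver process; evaluation is
# requested after every single training iteration.
ray.init(local_mode=True)

tune.run(
    "APEX",
    stop={"training_iteration": 10},
    config={
        "env": "CartPole-v1",          # stand-in for my custom env
        "num_workers": 1,
        "num_gpus": 0,
        "framework": "tf",
        "evaluation_num_workers": 1,
        "evaluation_interval": 1,
        "evaluation_duration": 1,
        "evaluation_duration_unit": "episodes",
        "evaluation_config": {"explore": False},
    },
)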