Actor died unexpectedly (GrpcUnavailable: failed to connect to all addresses)

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi!

I began working with Ray on macOS and had no problem running the demo scripts. But after switching to a Windows machine, everything I try to run stops at iter=2.

Here is the output of: rllib train --run=PPO --env=CartPole-v0

2022-06-27 16:52:02,830 ERROR syncer.py:147 -- Log sync requires rsync to be installed.
(PPOTrainer pid=28512) 2022-06-27 16:52:15,575  INFO trainer.py:2332 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(PPOTrainer pid=28512) 2022-06-27 16:52:15,576  INFO ppo.py:414 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(PPOTrainer pid=28512) 2022-06-27 16:52:15,578  INFO trainer.py:903 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
== Status ==
Current time: 2022-06-27 16:52:31 (running for 00:00:28.69)
Memory usage on this node: 17.4/31.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/8 CPUs, 0/1 GPUs, 0.0/10.62 GiB heap, 0.0/5.31 GiB objects
Result logdir: C:\Users\<user>\ray_results\default
Number of trials: 1/1 (1 RUNNING)
+-----------------------------+----------+-----------------+
| Trial name                  | status   | loc             |
|-----------------------------+----------+-----------------|
| PPO_CartPole-v0_b2fc9_00000 | RUNNING  | 127.0.0.1:28512 |
+-----------------------------+----------+-----------------+


(PPOTrainer pid=28512) 2022-06-27 16:52:31,305  INFO trainable.py:159 -- Trainable.setup took 15.731 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
== Status ==
Current time: 2022-06-27 16:52:36 (running for 00:00:33.75)
Memory usage on this node: 17.4/31.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/8 CPUs, 0/1 GPUs, 0.0/10.62 GiB heap, 0.0/5.31 GiB objects
Result logdir: C:\Users\<user>\ray_results\default
Number of trials: 1/1 (1 RUNNING)
+-----------------------------+----------+-----------------+
| Trial name                  | status   | loc             |
|-----------------------------+----------+-----------------|
| PPO_CartPole-v0_b2fc9_00000 | RUNNING  | 127.0.0.1:28512 |
+-----------------------------+----------+-----------------+


(PPOTrainer pid=28512) 2022-06-27 16:52:40,738  WARNING deprecation.py:46 -- DeprecationWarning: `slice` has been deprecated. Use `SampleBatch[start:stop]` instead. This will raise an error in the future!
== Status ==
Current time: 2022-06-27 16:52:57 (running for 00:00:54.50)
Memory usage on this node: 17.4/31.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/8 CPUs, 0/1 GPUs, 0.0/10.62 GiB heap, 0.0/5.31 GiB objects
Result logdir: C:\Users\<user>\ray_results\default
Number of trials: 1/1 (1 RUNNING)
+-----------------------------+----------+-----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+
| Trial name                  | status   | loc             |   iter |   total time (s) |   ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|-----------------------------+----------+-----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------|
| PPO_CartPole-v0_b2fc9_00000 | RUNNING  | 127.0.0.1:28512 |      1 |          20.6817 | 4000 |  22.0276 |                  107 |                    9 |            22.0276 |
+-----------------------------+----------+-----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+


== Status ==
Current time: 2022-06-27 16:53:18 (running for 00:01:15.68)
Memory usage on this node: 17.4/31.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 3.0/8 CPUs, 0/1 GPUs, 0.0/10.62 GiB heap, 0.0/5.31 GiB objects
Result logdir: C:\Users\<user>\ray_results\default
Number of trials: 1/1 (1 RUNNING)
+-----------------------------+----------+-----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+
| Trial name                  | status   | loc             |   iter |   total time (s) |   ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|-----------------------------+----------+-----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------|
| PPO_CartPole-v0_b2fc9_00000 | RUNNING  | 127.0.0.1:28512 |      2 |          41.8296 | 8000 |    40.44 |                  162 |                    9 |              40.44 |
+-----------------------------+----------+-----------------+--------+------------------+------+----------+----------------------+----------------------+--------------------+


(pid=) [2022-06-27 16:53:36,992 E 20452 35824] (raylet.exe) agent_manager.cc:107: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See `dashboard_agent.log` for the root cause.
(PPOTrainer pid=28512) Stack (most recent call first):
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\worker.py", line 364 in get_objects
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\worker.py", line 1825 in get
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\_private\client_mode_hook.py", line 105 in wrapper
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\rllib\execution\rollout_ops.py", line 99 in synchronous_parallel_sample
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\rllib\agents\ppo\ppo.py", line 437 in training_iteration
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\rllib\agents\trainer.py", line 2209 in _exec_plan_or_training_iteration_fn      
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\rllib\agents\trainer.py", line 1214 in step_attempt
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\rllib\agents\trainer.py", line 1112 in step
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\tune\trainable.py", line 360 in train
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\_private\function_manager.py", line 675 in actor_method_executor
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\worker.py", line 451 in main_loop
(PPOTrainer pid=28512)   File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\workers\default_worker.py", line 238 in <module>
(pid=) 2022-06-27 16:53:37,971  INFO context.py:67 -- Exec'ing worker with command: "c:\users\<user>\pycharmprojects\my_venv\scripts\python.exe" c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=56227 --object-store-name=tcp://127.0.0.1:57638 --raylet-name=tcp://127.0.0.1:57535 --redis-address=None --storage=None --temp-dir=C:\Users\<user>\AppData\Local\Temp\ray --metrics-agent-port=65256 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:56492 --redis-password=5241590000000000 --startup-token=8 --runtime-env-hash=-1081166177
(pid=) 2022-06-27 16:53:38,245  INFO context.py:67 -- Exec'ing worker with command: "c:\users\<user>\pycharmprojects\my_venv\scripts\python.exe" c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=56227 --object-store-name=tcp://127.0.0.1:57638 --raylet-name=tcp://127.0.0.1:57535 --redis-address=None --storage=None --temp-dir=C:\Users\<user>\AppData\Local\Temp\ray --metrics-agent-port=65256 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:56492 --redis-password=5241590000000000 --startup-token=9 --runtime-env-hash=-1081166177
(pid=) 2022-06-27 16:53:38,262  INFO context.py:67 -- Exec'ing worker with command: "c:\users\<user>\pycharmprojects\my_venv\scripts\python.exe" c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=56227 --object-store-name=tcp://127.0.0.1:57638 --raylet-name=tcp://127.0.0.1:57535 --redis-address=None --storage=None --temp-dir=C:\Users\<user>\AppData\Local\Temp\ray --metrics-agent-port=65256 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:56492 --redis-password=5241590000000000 --startup-token=10 --runtime-env-hash=-1081166177
(pid=) [2022-06-27 16:53:39,075 E 35032 20356] (gcs_server.exe) gcs_server.cc:283: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
(pid=) [2022-06-27 16:53:39,075 E 35032 20356] (gcs_server.exe) gcs_server.cc:283: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
(pid=) [2022-06-27 16:53:39,075 E 35032 20356] (gcs_server.exe) gcs_server.cc:283: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
2022-06-27 16:53:39,364 ERROR trial_runner.py:886 -- Trial PPO_CartPole-v0_b2fc9_00000: Error processing event.
NoneType: None
== Status ==
Current time: 2022-06-27 16:53:39 (running for 00:01:36.79)
Memory usage on this node: 14.2/31.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/1 GPUs, 0.0/10.62 GiB heap, 0.0/5.31 GiB objects
Result logdir: C:\Users\<user>\ray_results\default
Number of trials: 1/1 (1 ERROR)
+-----------------------------+----------+-----------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+        
| Trial name                  | status   | loc             |   iter |   total time (s) |    ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |        
|-----------------------------+----------+-----------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------|        
| PPO_CartPole-v0_b2fc9_00000 | ERROR    | 127.0.0.1:28512 |      3 |          62.9944 | 12000 |    67.51 |                  200 |                   10 |              67.51 |        
+-----------------------------+----------+-----------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+        
Number of errored trials: 1
+-----------------------------+--------------+---------------------------------------------------------------------------------------------------+
| Trial name                  |   # failures | error file                                                                                        |
|-----------------------------+--------------+---------------------------------------------------------------------------------------------------|
| PPO_CartPole-v0_b2fc9_00000 |            1 | C:\Users\<user>\ray_results\default\PPO_CartPole-v0_b2fc9_00000_0_2022-06-27_16-52-02\error.txt   |
+-----------------------------+--------------+---------------------------------------------------------------------------------------------------+

(pid=) [2022-06-27 16:53:40,084 E 35032 20356] (gcs_server.exe) gcs_server.cc:283: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
(pid=) [2022-06-27 16:53:41,094 E 35032 20356] (gcs_server.exe) gcs_server.cc:283: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
        class_name: PPOTrainer
        actor_id: 3397dd099d7dce7bab0de0f301000000
        pid: 28512
        namespace: e69a768e-6503-489a-88bb-a37de82f2791
        ip: 127.0.0.1
The actor is dead because because all references to the actor were removed.

Traceback (most recent call last):
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\<user>\PycharmProjects\my_venv\Scripts\rllib.exe\__main__.py", line 7, in <module>
  File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\rllib\scripts.py", line 41, in cli
    train.run(options, train_parser)
  File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\rllib\train.py", line 283, in run
    run_experiments(
  File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\tune\tune.py", line 838, in run_experiments
    return run(
  File "c:\users\<user>\pycharmprojects\my_venv\lib\site-packages\ray\tune\tune.py", line 741, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [PPO_CartPole-v0_b2fc9_00000])

The error file whose path is given at the end adds nothing new.

This problem also happens when running custom scripts (which run correctly on macOS). I also tried ray.init(include_dashboard=False) in a custom script, with the same output.

I run it in a dedicated virtual environment created with virtualenv. I tried uninstalling and reinstalling Ray (which apparently includes gRPC), but the problem persists.

Windows 10, Python 3.9.0, ray 1.13.0
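For a report like this, the same environment details (OS, Python version, Ray version) plus the installed gRPC version can be collected with a short standard-library script. This is a minimal sketch; the package list (ray, grpcio, gym) is an assumption chosen to match this thread, and grpcio is the PyPI distribution name for gRPC's Python bindings. Printing sys.executable also shows which interpreter (virtualenv or system) actually ran the script.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("ray", "grpcio", "gym")):
    """Collect OS, Python and package versions for a bug report."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        # Path of the interpreter running this script (venv vs. system Python).
        "executable": sys.executable,
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed for the interpreter that is running this script.
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```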

Is this reproducible, @sven1977?

Update: I created a new virtual environment again with virtualenv (Python 3.9.0) and ran only these commands to install the modules needed for RLlib:

pip install tensorboard
pip install gym==0.15.3
pip install ray
pip install ray[rllib]
pip install dm_tree
pip install pandas
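Because the outcome turned out to depend on which interpreter runs the code, it can help to confirm that the interpreter actually executing a script can see every package installed above. A minimal standard-library sketch; the module list mirrors the pip commands (dm_tree is imported as tree, ray[rllib] as ray.rllib) and is otherwise an assumption:

```python
import importlib.util

# Import names corresponding to the pip packages above.
REQUIRED_MODULES = ("tensorboard", "gym", "ray", "ray.rllib", "tree", "pandas")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that the current interpreter cannot locate."""
    missing = []
    for name in modules:
        try:
            # For a dotted name like "ray.rllib", find_spec imports the parent first.
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Running print(missing_modules()) with the same interpreter that PyCharm uses should print an empty list if the installation is complete.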

But I still get the same error when running rllib train --run=PPO --env=CartPole-v0.
Since the problem appears to come from communication over localhost, I suspected that my PC's connection to the company network, which uses port filtering, was the cause. But I checked the proxy/VPN settings and everything seems OK (I can also run Jupyter Notebook on this PC via localhost).
gRPC is also mentioned in the error messages, so I tried updating and reinstalling it, but I still get the same error. The installed gRPC version is 1.43.0.
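To test the localhost hypothesis more directly than with Jupyter, a process can send a few bytes to itself over TCP on the loopback interface. This minimal sketch uses an OS-assigned port; it only rules out a blanket block on loopback TCP, not filtering of the specific ports Ray uses (for example the GCS address 127.0.0.1:56492 in the log above) or of specific executables such as gcs_server.exe.

```python
import socket


def loopback_roundtrip(host="127.0.0.1", payload=b"ping", timeout=5.0):
    """Send a few bytes to ourselves over loopback TCP.

    Returns True if the payload arrives intact, False if any socket step fails.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind((host, 0))  # port 0: let the OS choose a free port
            server.listen(1)
            server.settimeout(timeout)
            port = server.getsockname()[1]
            with socket.create_connection((host, port), timeout=timeout) as client:
                conn, _ = server.accept()
                with conn:
                    conn.settimeout(timeout)
                    client.sendall(payload)
                    received = b""
                    while len(received) < len(payload):
                        chunk = conn.recv(len(payload) - len(received))
                        if not chunk:  # peer closed before sending everything
                            break
                        received += chunk
    except OSError:  # covers timeouts as well as refused or blocked connections
        return False
    return received == payload


if __name__ == "__main__":
    print("loopback TCP round trip OK:", loopback_roundtrip())
```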

I’m still stuck with this problem, and I don’t know how to get around it.

Found the solution here. This is actually an issue with virtualenv on Windows (I’m using PyCharm); the ‘GrpcUnavailable’ error was just a consequence of the Ray agent failing.
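The raylet message in the log points to dashboard_agent.log for the root cause of the agent failure. A minimal sketch for printing the end of that file, assuming Ray's usual layout <temp-dir>/session_*/logs/dashboard_agent.log. Defaulting the temp dir to tempfile.gettempdir()/ray is also an assumption; on this machine the worker command lines show --temp-dir=C:\Users\<user>\AppData\Local\Temp\ray, so pass that path explicitly if it differs.

```python
import tempfile
from collections import deque
from pathlib import Path


def tail_agent_log(temp_dir, lines=50):
    """Return the last `lines` lines of the newest dashboard_agent.log, or None.

    Assumes Ray's session layout: <temp_dir>/session_*/logs/dashboard_agent.log.
    """
    candidates = list(Path(temp_dir).glob("session_*/logs/dashboard_agent.log"))
    if not candidates:
        return None
    # Several sessions may exist; pick the most recently modified log file.
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    with newest.open(encoding="utf-8", errors="replace") as f:
        # deque(maxlen=...) keeps only the final `lines` lines while streaming.
        return [line.rstrip("\n") for line in deque(f, maxlen=lines)]


if __name__ == "__main__":
    # Assumption: the Ray temp dir is <system temp dir>/ray.
    ray_temp = Path(tempfile.gettempdir()) / "ray"
    for line in tail_agent_log(ray_temp) or ["dashboard_agent.log not found"]:
        print(line)
```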

I solved my problem by selecting my system interpreter (Python 3.9) instead of creating a virtualenv when setting up a new project in PyCharm.
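To confirm which interpreter a script or PyCharm run configuration actually uses after switching, a best-effort check based on the standard sys.prefix / sys.base_prefix convention can be printed at startup. This minimal sketch only detects a virtual environment; it does not address the underlying virtualenv problem on Windows.

```python
import sys


def in_virtualenv():
    """Best-effort check for whether this interpreter runs inside a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment directory,
    # while sys.base_prefix keeps pointing at the base Python installation.
    # Legacy virtualenv (< 20) instead defined sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```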

Thanks for leaving an explanation and solution here!
