How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
The code in my project is shown below.
summed_grad = []
# Launch gradient computation on every worker group (these calls do not block).
gradients_1 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_1]
gradients_2 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_2]
gradients_3 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_3]
gradients_4 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_4]
gradients_5 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_5]
gradients_6 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_6]
gradients_7 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_7]
gradients_8 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_8]
gradients_9 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_9]
gradients_10 = [worker.compute_gradients.remote(current_weights, d_steps) for worker in workers_10]
# Fetch each group's results, sum them, and release the fetched copies.
gradients_1_sum = ray.get(gradients_1)
summed_grad.append(np.sum(gradients_1_sum, axis=0))
del gradients_1_sum
gradients_2_sum = ray.get(gradients_2)
summed_grad.append(np.sum(gradients_2_sum, axis=0))
del gradients_2_sum
gradients_3_sum = ray.get(gradients_3)
summed_grad.append(np.sum(gradients_3_sum, axis=0))
del gradients_3_sum
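The same fetch-and-sum pattern can be sketched as a loop. This is only an illustration, not the exact code in my script: `sum_gradients_per_group` and its `fetch` parameter are hypothetical names, and the stand-in `fake_fetch` replaces `ray.get` so the sketch runs without a cluster.

```python
import numpy as np


def sum_gradients_per_group(ref_groups, fetch):
    """Fetch each group's gradients and sum them element-wise.

    ref_groups: list of lists of object refs, one inner list per worker group.
    fetch: callable that resolves a list of refs to values (ray.get in the
           real script; a hypothetical stand-in is used below).
    """
    summed = []
    for refs in ref_groups:
        grads = fetch(refs)                   # blocks until this group is done
        summed.append(np.sum(grads, axis=0))  # element-wise sum over workers
        del grads                             # drop the fetched copies early
    return summed


def fake_fetch(refs):
    # Stand-in for ray.get (assumption): the "refs" here are already arrays.
    return refs


groups = [
    [np.array([1.0, 1.0, 1.0]), np.array([2.0, 2.0, 2.0])],
    [np.array([0.5, 0.5, 0.5])],
]
result = sum_gradients_per_group(groups, fake_fetch)
print(result)
```

In my script the call would look like `sum_gradients_per_group([gradients_1, gradients_2, gradients_3], ray.get)`.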
But when I run my code, the error shown below occurred. All the workers are on different nodes.
(raylet) agent_manager.cc:134: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See `dashboard_agent.log` for the root cause.
Traceback (most recent call last):
  File "PS_mod_X_20.py", line 470, in <module>
    gradients_2_sum = ray.get(gradients_2)
  File "/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_private/worker.py", line 2269, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_private/worker.py", line 670, in get_objects
    object_refs, self.current_task_id, timeout_ms
  File "python/ray/_raylet.pyx", line 1211, in ray._raylet.CoreWorker.get_objects
  File "python/ray/_raylet.pyx", line 179, in ray._raylet.check_status
ray.exceptions.RaySystemError: System error: No such file or directory
Could you please help me find out why this error occurred? Thanks a lot!