Why can't I get the object with ray.get()?

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

The code in my project is as follows.

    summed_grad = []
    gradients_1 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_1]
    gradients_2 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_2]
    gradients_3 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_3]
    gradients_4 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_4]
    gradients_5 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_5]
    gradients_6 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_6]
    gradients_7 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_7]
    gradients_8 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_8]
    gradients_9 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_9]
    gradients_10 = [worker.compute_gradients.remote(current_weights,d_steps) for worker in workers_10]
    gradients_1_sum = ray.get(gradients_1)
    summed_grad.append(np.sum(gradients_1_sum,axis = 0))
    del(gradients_1_sum)
    gradients_2_sum = ray.get(gradients_2)
    summed_grad.append(np.sum(gradients_2_sum,axis = 0))
    del(gradients_2_sum)
    gradients_3_sum = ray.get(gradients_3)
    summed_grad.append(np.sum(gradients_3_sum,axis = 0))
    del(gradients_3_sum)

But when I run my code, the error below occurs. All the workers are on different nodes.

(raylet) agent_manager.cc:134: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See `dashboard_agent.log` for the root cause.
Traceback (most recent call last):
  File "PS_mod_X_20.py", line 470, in <module>
    gradients_2_sum = ray.get(gradients_2)
  File "/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_private/worker.py", line 2269, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_private/worker.py", line 670, in get_objects
    object_refs, self.current_task_id, timeout_ms
  File "python/ray/_raylet.pyx", line 1211, in ray._raylet.CoreWorker.get_objects
  File "python/ray/_raylet.pyx", line 179, in ray._raylet.check_status
ray.exceptions.RaySystemError: System error: No such file or directory

Could you please help me find out why this error occurred? Thanks a lot!

What are your Ray and grpcio versions?

Hi, I installed Ray version 2.0.0, but I didn’t install grpcio.

grpcio is a required dependency of Ray, so it should have been installed along with it. You can check the version with `pip freeze | grep grpcio`.
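If it is easier to check from inside the same Python environment that runs the job, here is a minimal sketch that does the equivalent of the `pip freeze` command. The helper name `installed_version` is made up for illustration; it relies only on `importlib.metadata` (Python 3.8+) and falls back to `pkg_resources` on older interpreters such as 3.6.

```python
def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    `installed_version` is a hypothetical helper name, not part of Ray.
    """
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for older interpreters (e.g. Python 3.6), via setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


# Report the two packages discussed in this thread.
for name in ("ray", "grpcio"):
    print(name, installed_version(name) or "not installed")
```

Running this on every node is worthwhile, since the workers are on different machines and the environments may not match.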

Thanks! I checked it again and found the grpcio version is 1.43.0.

Can you try 1.49.1 and see if the error still occurs?

OK, I will try it and report back soon. Thanks!


Hi, I installed grpcio version 1.48.2 because it’s the latest version I can install, and the error above disappeared. Thanks very much! Unfortunately, another error occurred. I would be very grateful if you could tell me the possible reason.

absl::ToInt64Seconds(absl::Now() - gcs_last_alive_time_) < ::RayConfig::instance().gcs_rpc_server_reconnect_timeout_s() Failed to connect to GCS within 60 seconds
*** StackTrace Information ***
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(+0xc5d7ba) [0x2b46447ca7ba] ray::operator<<()
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(+0xc5f2c2) [0x2b46447cc2c2] ray::SpdLogMessage::Flush()
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(_ZN3ray6RayLogD1Ev+0x37) [0x2b46447cc5d7] ray::RayLog::~RayLog()
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(+0x71290d) [0x2b464427f90d] ray::rpc::GcsRpcClient::CheckChannelStatus()
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(_ZN5boost4asio6detail12wait_handlerIZN3ray3rpc12GcsRpcClient15SetupCheckTimerEvEUlNS_6system10error_codeEE_NS0_9execution12any_executorIJNS9_12context_as_tIRNS0_17execution_contextEEENS9_6detail8blocking7never_tILi0EEENS9_11prefer_onlyINSG_10possibly_tILi0EEEEENSJ_INSF_16outstanding_work9tracked_tILi0EEEEENSJ_INSN_11untracked_tILi0EEEEENSJ_INSF_12relationship6fork_tILi0EEEEENSJ_INSU_14continuation_tILi0EEEEEEEEE11do_completeEPvPNS1_19scheduler_operationERKS7_m+0x303) [0x2b464427fdb3] boost::asio::detail::wait_handler<>::do_complete()
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(+0xc6e1fb) [0x2b46447db1fb] boost::asio::detail::scheduler::do_run_one()
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(+0xc6f431) [0x2b46447dc431] boost::asio::detail::scheduler::run()
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(+0xc6f6a0) [0x2b46447dc6a0] boost::asio::io_context::run()
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker12RunIOServiceEv+0xcd) [0x2b464416d51d] ray::core::CoreWorker::RunIOService()
/public/home/lifei/xinzk/envs/FBGAN_for_lhl/lib/python3.6/site-packages/ray/_raylet.so(+0xd9b350) [0x2b4644908350] execute_native_thread_routine
/lib64/libpthread.so.0(+0x7dd5) [0x2b3ffea32dd5] start_thread
/lib64/libc.so.6(clone+0x6d) [0x2b3ffed44ead] clone

This means the GCS server (described in the Ray v2 Architecture document) couldn’t be started. Are you deploying Ray in an unconventional environment? Can you tell me a bit more about your setup? Are you using virtualenv or an ARM-based architecture?

Also, it’d be great if you could provide the log from gcs_server.out (see the Logging page of the Ray documentation).
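In case it helps with collecting that, here is a rough sketch for printing the last lines of the relevant logs on the head node. It assumes Ray is using its default temp directory, so the logs sit under `/tmp/ray/session_latest/logs`; if you started Ray with a custom temp directory, adjust the path. The `tail` helper and the `DEFAULT_LOG_DIR` constant are names invented for this example.

```python
from collections import deque
from pathlib import Path

# Assumption: Ray's default temp dir; change this if a custom one was configured.
DEFAULT_LOG_DIR = Path("/tmp/ray/session_latest/logs")


def tail(path, n=50):
    """Return the last `n` lines of a text file as a list of strings."""
    with open(path, errors="replace") as f:
        # deque with maxlen keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    # gcs_server.out is the log requested above; dashboard_agent.log is the one
    # named in the first raylet error message.
    for name in ("gcs_server.out", "gcs_server.err", "dashboard_agent.log"):
        log = DEFAULT_LOG_DIR / name
        print("=====", log, "=====")
        if log.exists():
            print("\n".join(tail(log)))
        else:
            print("(not found on this node)")
```

Pasting the output for `gcs_server.out` here should be enough to start with.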