Couldn't run in remote cluster

I get the error trace below when I try to execute a task on a remote cluster. The same code runs fine on a single Ray node. I have copied all of src and its dependencies to the head and worker nodes.

File "/home/xxx/lib/python3.9/site-packages/ray/remote_function.py", line 129, in _remote_proxy
    return self._remote(args=args, kwargs=kwargs, **self._default_options)
  File "/home/xxx/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 307, in _invocation_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/home/xxx/lib/python3.9/site-packages/ray/remote_function.py", line 247, in _remote
    return client_mode_convert_function(self, args, kwargs, **task_options)
  File "/home/xxx/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 178, in client_mode_convert_function
    return client_func._remote(in_args, in_kwargs, **kwargs)
  File "/home/xxx/lib/python3.9/site-packages/ray/util/client/common.py", line 298, in _remote
    return self.options(**option_args).remote(*args, **kwargs)
  File "/home/xxx/lib/python3.9/site-packages/ray/util/client/common.py", line 581, in remote
    return return_refs(ray.call_remote(self, *args, **kwargs))
  File "/home/xxx/lib/python3.9/site-packages/ray/util/client/api.py", line 100, in call_remote
    return self.worker.call_remote(instance, *args, **kwargs)
  File "/home/xxx/lib/python3.9/site-packages/ray/util/client/worker.py", line 556, in call_remote
    task = instance._prepare_client_task()
  File "/home/xxx/lib/python3.9/site-packages/ray/util/client/common.py", line 587, in _prepare_client_task
    task = self._remote_stub._prepare_client_task()
  File "/home/xxx/lib/python3.9/site-packages/ray/util/client/common.py", line 324, in _prepare_client_task
    self._ensure_ref()
  File "/home/xxx/lib/python3.9/site-packages/ray/util/client/common.py", line 319, in _ensure_ref
    self._ref = ray.worker._put_pickled(
  File "/home/xxx/lib/python3.9/site-packages/ray/util/client/worker.py", line 510, in _put_pickled
    raise cloudpickle.loads(resp.error)
ModuleNotFoundError: No module named 'src'

Trace from ray_client_server_23014.err

2023-07-10 20:23:39,036 INFO server.py:884 -- Starting Ray Client server on 0.0.0.0:23014
2023-07-10 20:23:39,904 INFO logservicer.py:103 -- New logs connection established. Total clients: 1
2023-07-10 20:23:39,906 INFO worker.py:1364 -- Connecting to existing Ray cluster at address: xx.xx.xx.xx:6379...
2023-07-10 20:23:39,913 INFO worker.py:1544 -- Connected to Ray cluster. View the dashboard at xx.xx.xxx.xx:8265
2023-07-10 20:25:02,463 ERROR server.py:545 -- Put failed:
Traceback (most recent call last):
  File "/home/xxx/site-packages/ray/util/client/server/server.py", line 536, in _put_object
    obj = loads_from_client(data, self)
  File "/home/xxx/site-packages/ray/util/client/server/server_pickler.py", line 129, in loads_from_client
    return ClientUnpickler(
ModuleNotFoundError: No module named 'src'
2023-07-10 20:25:37,332 INFO server.py:929 -- 25 idle checks before shutdown.
2023-07-10 20:25:42,342 INFO server.py:929 -- 20 idle checks before shutdown.
2023-07-10 20:25:47,352 INFO server.py:929 -- 15 idle checks before shutdown.
2023-07-10 20:25:52,362 INFO server.py:929 -- 10 idle checks before shutdown.
2023-07-10 20:25:57,372 INFO server.py:929 -- 5 idle checks before shutdown.

python : 3.9.14
ray : 2.3.1

Any help with getting past this would be appreciated.

@BalajiSelvaraj10 Can you provide the task code you are trying to execute, including all the import statements? And when you say you have “copied all src and dependencies in head and worker nodes”, what exactly do you mean?

Are you using RuntimeEnv (https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html?highlight=Runtimeenv)?

Hi @Jules_Damji, thanks for the prompt response. :slightly_smiling_face:
I resolved this issue: I had forgotten to add the new path to the Python path. Sorry for raising it here.
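For anyone who lands here with the same `ModuleNotFoundError`, here is a minimal local sketch of why the missing path matters. It does not use Ray at all; it only shows that a package becomes importable once its parent directory is on `sys.path`. The package name `demo_src` and the file `tasks.py` are hypothetical stand-ins for `src`, created in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout standing in for the real project:
#   <project_root>/demo_src/__init__.py
#   <project_root>/demo_src/tasks.py
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "demo_src"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "tasks.py").write_text("def double(x):\n    return 2 * x\n")


def can_import(name):
    """Return True if `name` can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


# The parent directory is not on sys.path yet, so the import fails.
before = can_import("demo_src.tasks")

# Adding the parent directory of the package makes the import succeed.
sys.path.insert(0, str(project_root))
after = can_import("demo_src.tasks")

print(before, after)
```

On a cluster the same lookup happens inside the Ray processes on each node, so copying the files is not enough by itself; the directory also has to be on the Python path of those processes (for example through `PYTHONPATH` in the environment that launches Ray, which is an assumption about the setup here).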

I have one more question: is there any limit on calling ray.init() to connect to an existing cluster? I have a remote cluster, and my use case is to call ray.init() for each API call. I couldn’t find any limitations in the documentation.

There is no limit, but if you call ray.init() again for the same cluster from a process that is already connected, it fails with an error saying that Ray is already initialized. There is no need to call ray.init() multiple times. Call it only once in your driver script.
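A minimal sketch of the "call it only once" advice for an API server, assuming each request handler calls a small helper instead of calling `ray.init()` directly. The helper name `ensure_ray_connected`, the `ray_module` parameter (there only so the logic can be exercised without a cluster), and the address in the usage note are all hypothetical; `ray.is_initialized()` and `ray.init(address=...)` are the real Ray calls.

```python
import threading

_connect_lock = threading.Lock()


def ensure_ray_connected(address, ray_module=None):
    """Connect to the cluster on first use; later calls are no-ops.

    `ray_module` is a hypothetical hook for passing a stand-in object
    in tests. When it is omitted, the real `ray` package is imported.
    """
    if ray_module is None:
        import ray as ray_module  # imported lazily, only when needed

    # The lock keeps two concurrent requests from both calling init().
    with _connect_lock:
        if not ray_module.is_initialized():
            ray_module.init(address=address)
    return ray_module
```

Each API handler could then start with something like `ray = ensure_ray_connected("ray://<head-node-host>:10001")` (placeholder address) and submit tasks as usual; only the first call in the process actually connects.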