Hi, I'm new to Ray and trying to parallelize my calculation across a cluster, but some of my remote calls fail with `ModuleNotFoundError` and I can't figure out what is actually happening.
- Environment:
** I have a cluster of 4 nodes, one of which is the head. The head node is started with `ray start --head --gcs-server-port=40678 --port=9736` and the worker nodes with `ray start --address='xxxx:9736' --redis-password='xxxxx'`.
** After starting the head and all the workers, I can see every node in the dashboard, so I assume the cluster is working fine.
- In my calculation script, I connect to the cluster with `ray.init(address='xxxx', _redis_password='5241590000000000')` and launch about 100 tasks.
- I run my calculation from the head node, e.g. `python test.py`.
- Errors under different scenarios:
** With the setup above, i.e. whenever worker nodes are attached, tasks scheduled on the worker nodes fail with `ModuleNotFoundError: No module named 'my-own-package'`.
** If I stop all the other workers and keep only the head node, my calculation finishes and uses all the resources available on the head node.
The error message looks like this:
2021-05-19 14:46:18,715 ERROR worker.py:1056 -- Possible unhandled error from worker: ray::price_cash_flows_batch() (pid=96919, ip=10.23.186.153)
  File "python/ray/_raylet.pyx", line 458, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 479, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 349, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RaySystemError: System error: No module named 'bct'
traceback: Traceback (most recent call last):
  File "/root/anaconda3/envs/risk-engine/lib/python3.8/site-packages/ray/serialization.py", line 246, in deserialize_objects
    obj = self._deserialize_object(data, metadata, object_ref)
  File "/root/anaconda3/envs/risk-engine/lib/python3.8/site-packages/ray/serialization.py", line 188, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
  File "/root/anaconda3/envs/risk-engine/lib/python3.8/site-packages/ray/serialization.py", line 166, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
  File "/root/anaconda3/envs/risk-engine/lib/python3.8/site-packages/ray/serialization.py", line 156, in _deserialize_pickle5_data
    obj = pickle.loads(in_band)
ModuleNotFoundError: No module named 'bct'
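The last frames show the failure inside `pickle.loads`, apparently while the worker deserializes a task's arguments. Below is a stdlib-only sketch of that mechanism, assuming the argument is an instance of a class defined in `bct` (`CashFlow` and `load_without_module` are hypothetical names used only for illustration): pickle stores just a reference such as `bct.CashFlow`, not the class's code, so whichever process unpickles the object must be able to import `bct` itself.

```python
import pickle
import sys
import types


class CashFlow:
    """Hypothetical stand-in for a class from the user's package 'bct'."""

    def __init__(self, amount):
        self.amount = amount


# Register an in-memory module named 'bct' that owns CashFlow, mimicking the
# head node, where the real package is importable.
bct = types.ModuleType("bct")
CashFlow.__module__ = "bct"
bct.CashFlow = CashFlow
sys.modules["bct"] = bct

# The pickled bytes contain only the reference "bct" / "CashFlow" plus state.
payload = pickle.dumps(CashFlow(100.0))


def load_without_module(data):
    """Unpickle `data` as if on a node where 'bct' is not importable."""
    saved = sys.modules.pop("bct")  # hide the module, like a worker without it
    try:
        return pickle.loads(data)
    except ModuleNotFoundError as exc:
        return exc  # the same error the Ray worker reports
    finally:
        sys.modules["bct"] = saved  # restore the "head node" state
```

For example, `print(load_without_module(payload))` prints `No module named 'bct'`, while a plain `pickle.loads(payload)` with the module registered succeeds. This is consistent with the head-only run working: there, every process can import `bct`.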
It seems to me that the Python path is not set correctly on the remote nodes, but I don't know what is going wrong or how to fix it, since I'm not using the Ray cluster launcher.
Any ideas?
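One way to test the Python-path hypothesis is to compare, node by node, how each worker's interpreter would resolve `bct`. The sketch below is a hypothetical diagnostic helper (`module_report` is not a Ray API); it uses only the standard library and never actually imports the module, it only asks the import system where `bct` would come from.

```python
import importlib.util
import socket
import sys


def module_report(name):
    """Describe how the current interpreter would resolve the module `name`.

    Meant to be run once per node (e.g. as a Ray task) so the results from
    different hosts can be compared side by side.
    """
    spec = importlib.util.find_spec(name)  # None if `name` cannot be found
    return {
        "host": socket.gethostname(),
        "python": sys.executable,  # which interpreter/conda env is running
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
        "sys_path": list(sys.path),
    }
```

On the cluster it could be wrapped with `ray.remote(module_report)` and launched several times with `.remote("bct")`, collecting the results with `ray.get`. Ray does not guarantee that each copy lands on a different node, so launching more copies than nodes and grouping by `host` is safer. If `found` is `False` on the worker hosts, `bct` presumably needs to be installed (or made importable via `PYTHONPATH`) in the environment that `ray start` was run from on those nodes.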
Thanks,
-BS