Ray k8s cluster, cannot run new task when previous task failed

architkulkarni · June 29, 2022, 6:12pm

@GoingMyWay Thanks for the details. I think I have a better idea now. Can you try ray.init(address=<your address>, runtime_env={"env_vars": {"PYTHONPATH": "home/me/app/myproject/src/"}})?

My guess is that Ray Python processes are running in different directories depending on whether they're started on the head node or on a worker node. If you print sys.path in a Ray task, I suspect it will show different directories in the two cases. Setting the environment variable above guarantees that "home/me/app/myproject/src/" is added to sys.path in every Ray task/actor, so Python will search the src directory for imports.
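To check that guess, something like the following could be run as a script on a node of the cluster. It is only a rough sketch: describe_process and compare_nodes are names made up for this example, address="auto" assumes you are connecting from a machine that is already part of the cluster, and there is no guarantee the tasks get spread over both the head node and the workers, so you may need to raise num_tasks.

```python
import os
import socket
import sys


def describe_process():
    """Report where this Python process runs and where it looks for imports."""
    return {
        "host": socket.gethostname(),
        "cwd": os.getcwd(),
        "sys_path": list(sys.path),
    }


def compare_nodes(address="auto", num_tasks=8):
    """Run describe_process as Ray tasks and print one report per distinct host."""
    import ray  # imported lazily so describe_process can be tried without Ray

    ray.init(address=address)
    remote_describe = ray.remote(describe_process)
    # Launch several tasks so that some are likely to be scheduled on workers.
    reports = ray.get([remote_describe.remote() for _ in range(num_tasks)])

    seen_hosts = set()
    for report in reports:
        if report["host"] in seen_hosts:
            continue
        seen_hosts.add(report["host"])
        print(report["host"], report["cwd"])
        for entry in report["sys_path"]:
            print("   ", entry)
```

Calling compare_nodes() once without runtime_env and once after adding the env_vars setting to ray.init should show whether the src directory turns up in sys.path on every host.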

(In general, the recommended approach is to use the runtime_env "working_dir" option, since that handles both syncing files to the cluster nodes and setting the cwd and PYTHONPATH for you. But since you already have the files synced to every node, you can just use the "env_vars" approach above.)
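For comparison, here is a minimal sketch of the two runtime_env variants side by side. The helper names are invented for illustration and the directory arguments are placeholders you would fill in yourself. With "working_dir", the directory is taken from the machine that calls ray.init, so it should probably point at the directory that directly contains the modules you import.

```python
def env_vars_runtime_env(src_dir):
    """runtime_env for code that is already present at src_dir on every node."""
    return {"env_vars": {"PYTHONPATH": src_dir}}


def working_dir_runtime_env(project_dir):
    """runtime_env that lets Ray sync project_dir from the calling machine."""
    return {"working_dir": project_dir}


def connect(address, runtime_env):
    """Connect to the cluster with the chosen runtime_env."""
    import ray  # imported lazily so the helpers above work without Ray

    return ray.init(address=address, runtime_env=runtime_env)
```

For example, connect(<your address>, working_dir_runtime_env(<your local project directory>)) would be the "working_dir" counterpart of the ray.init call suggested above.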
