Ray k8s cluster, cannot run new task when previous task failed

@GoingMyWay Thanks for the details. I think I have a better idea now. Can you try ray.init(address=<your address>, runtime_env={"env_vars": {"PYTHONPATH": "/home/me/app/myproject/src/"}})? Note the leading slash: a relative PYTHONPATH entry is resolved against each process's working directory, which is exactly what I suspect differs here.

My guess is that Ray's Python processes run in different working directories depending on whether they are started on the head node or on a worker node. If you print sys.path inside a Ray task, I expect it to show different directories in the two cases. Setting the environment variable above puts "/home/me/app/myproject/src/" on sys.path in every Ray task and actor, so Python will search the src directory for imports regardless of the working directory.
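
To check this guess, something like the following could be run from the driver. It is only a rough sketch: build_runtime_env and report_paths are names I made up for illustration, the path is the one from this thread written as an absolute path (an assumption on my part), and "auto" should be replaced with your own cluster address.

```python
# Path from the thread, written as an absolute path (assumption).
SRC_DIR = "/home/me/app/myproject/src/"


def build_runtime_env(src_dir):
    """Return a runtime_env dict that sets PYTHONPATH for every task and actor."""
    return {"env_vars": {"PYTHONPATH": src_dir}}


def report_paths(address="auto", num_tasks=8):
    """Print hostname, cwd, PYTHONPATH and sys.path as seen from inside Ray tasks."""
    import ray  # imported here so build_runtime_env can be used without Ray installed

    ray.init(address=address, runtime_env=build_runtime_env(SRC_DIR))

    @ray.remote
    def report():
        # Imports live inside the task so it does not depend on driver-side modules.
        import os
        import socket
        import sys

        return socket.gethostname(), os.getcwd(), os.environ.get("PYTHONPATH"), list(sys.path)

    # Launch several tasks so that some are likely to land on worker nodes.
    for host, cwd, pythonpath, sys_path in ray.get([report.remote() for _ in range(num_tasks)]):
        print(host, cwd, pythonpath)
        print("  sys.path:", sys_path)
```

Calling report_paths() once with the runtime_env and once with a plain ray.init (no runtime_env) should make it visible whether the head node and the worker nodes really start tasks with a different cwd and sys.path.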

(In general, the recommended approach is the runtime_env "working_dir" option, since it handles both syncing files to the cluster nodes and setting the cwd and PYTHONPATH for you. But since you already have the files synced to every node, the "env_vars" approach above is enough.)
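
For comparison, a minimal sketch of the "working_dir" variant might look like this. The helper names are hypothetical, and the directory is assumed to exist on the machine where the driver runs, since Ray uploads it from there.

```python
import os


def build_working_dir_env(project_dir):
    """Return a runtime_env dict that ships project_dir to the cluster."""
    # Resolve to an absolute path so the result does not depend on the driver's cwd.
    return {"working_dir": os.path.abspath(project_dir)}


def connect_with_working_dir(address, project_dir):
    """Connect to the cluster and return the cwd seen by one Ray task."""
    import ray  # imported here so build_working_dir_env can be used without Ray installed

    ray.init(address=address, runtime_env=build_working_dir_env(project_dir))

    @ray.remote
    def where_am_i():
        import os

        return os.getcwd()

    return ray.get(where_am_i.remote())
```

With this variant the task's cwd should point at Ray's copy of the uploaded directory rather than at the original path, so code that opens files by absolute path under /home/me/... would still read the manually synced copies.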