How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty in completing my task, but I can work around it.
Hi,
I am currently using Ray on a laptop, and I would like to keep persistent data about jobs, so I set up a Redis server to store GCS data. However, I started seeing messages like this when running jobs:
Job supervisor actor could not be scheduled: The actor is not schedulable: The node specified via NodeAffinitySchedulingStrategy doesn't exist any more or is infeasible, and soft=False was specified.
Inspecting the cluster data in the dashboard, I see that previous executions leave head node records in the Cluster tab. It seems that a dead head node is being selected because of the scheduling strategy used (see ray/python/ray/dashboard/modules/job/job_manager.py at commit 75c1469cf03e9e4c32b3f8681223170547b1e397 in ray-project/ray on GitHub).
- Is this plausible? If so, is setting RAY_JOB_ALLOW_DRIVER_ON_WORKER_NODES_ENV_VAR to 1 a correct way to address this?
- Is there any way to set a node’s identity, to signal that the current head node is the same node as in previous runs?
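To make the stale head node records concrete, here is a minimal sketch of how they could be told apart from the live node. It assumes node records shaped like the dictionaries returned by `ray.nodes()`, using only the `NodeID` and `Alive` keys; the helper name `split_nodes_by_liveness` is hypothetical and not part of Ray.

```python
from typing import Dict, List, Tuple


def split_nodes_by_liveness(nodes: List[Dict]) -> Tuple[List[str], List[str]]:
    """Partition node records into (alive_ids, dead_ids).

    Assumption: each record is a dict with "NodeID" and "Alive" keys,
    as in the output of ray.nodes(). A missing "Alive" key is treated
    as dead.
    """
    alive = [n["NodeID"] for n in nodes if n.get("Alive")]
    dead = [n["NodeID"] for n in nodes if not n.get("Alive")]
    return alive, dead


# Usage against a running cluster (requires Ray to be installed):
#   import ray
#   ray.init(address="auto")
#   alive, dead = split_nodes_by_liveness(ray.nodes())
#   print("alive:", alive)
#   print("dead:", dead)
```

If the dead list grows by one entry after every restart, that would be consistent with old head node records persisting in the Redis-backed GCS.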
Thanks a lot in advance,
Javier