We are trying to run a machine learning training task on a Ray cluster, but our head pod has no GPU, while each worker pod has 8 GPUs.
How can we prevent the head pod from doing any computation?
You might want to check the "Resources (CPUs, GPUs)" page of the Ray 0.6.3 documentation.
Thanks for your reply.
Are there other ways to mark the head as NoSchedule? For example, by default the Kubernetes master node is NoSchedule.
I think @Dmitri may have more context to answer this.
The method suggested above was to decorate remote tasks with a GPU requirement:
@ray.remote(num_gpus=1)
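To see why a GPU requirement keeps work off a GPU-less head, here is a toy model of resource-based placement. It is not Ray's scheduler: the node names and CPU counts are hypothetical, and it only checks whether a node's declared resources can cover a task's request, which is roughly what `num_gpus=1` asks Ray to enforce.

```python
from typing import Dict, List

# Hypothetical cluster: a CPU-only head pod and one worker pod with 8 GPUs.
# The CPU counts are made up for illustration.
NODES: Dict[str, Dict[str, float]] = {
    "head": {"CPU": 8, "GPU": 0},
    "worker": {"CPU": 32, "GPU": 8},
}


def feasible_nodes(request: Dict[str, float],
                   nodes: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the nodes whose declared resources cover every amount in `request`.

    Toy model only: Ray's real scheduler also tracks resources already in use.
    """
    return [
        name
        for name, capacity in nodes.items()
        if all(capacity.get(res, 0) >= amount for res, amount in request.items())
    ]


# Roughly what @ray.remote(num_gpus=1) requests (tasks also default to 1 CPU).
gpu_task = {"CPU": 1, "GPU": 1}
print(feasible_nodes(gpu_task, NODES))  # ['worker']
```

Only the worker can satisfy the GPU request, so the task never lands on the head. Note that CPU-only tasks would still be feasible on this head.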
Another method is to declare the head node as having 0 CPUs. Ray tasks implicitly request 1 CPU, so no Ray tasks will run on the head node (unless you explicitly declare a task as requiring num_cpus=0).
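The 0-CPU approach can be illustrated with a similar toy feasibility check, this time with the head declaring 0 CPUs. This is a sketch, not Ray's scheduler; the helper name and the worker's CPU count are hypothetical. A default task (1 CPU) is feasible only on the worker, while a task that explicitly asks for 0 CPUs could still be placed on the head.

```python
from typing import Dict, List

# Hypothetical cluster after starting the head with 0 CPUs
# (e.g. `ray start --head --num-cpus=0`); the worker's CPU count is made up.
NODES: Dict[str, Dict[str, float]] = {
    "head": {"CPU": 0, "GPU": 0},
    "worker": {"CPU": 32, "GPU": 8},
}


def feasible_nodes(request: Dict[str, float],
                   nodes: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the nodes whose declared resources cover every amount in `request`."""
    return [
        name
        for name, capacity in nodes.items()
        if all(capacity.get(res, 0) >= amount for res, amount in request.items())
    ]


default_task = {"CPU": 1}   # Ray tasks request 1 CPU unless told otherwise
zero_cpu_task = {"CPU": 0}  # like a task declared with num_cpus=0

print(feasible_nodes(default_task, NODES))   # ['worker']
print(feasible_nodes(zero_cpu_task, NODES))  # ['head', 'worker']
```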
To declare that the head pod has 0 CPUs, add --num-cpus=0 to the head's ray start command. Alternatively, if you are using the Ray Kubernetes Operator, you can set {"CPU": 0} in the head podType's rayResources field.
See also the discussion here:
Cool, thank you for your detailed response.