There is currently no built-in Ray feature to globally restrict non-GPU tasks to CPU-only nodes, or to make GPU nodes "opt-in" for only GPU-requiring tasks, without annotating every @ray.remote call or using custom resources/labels. The Ray scheduler will place a task on any node with enough available resources, so CPU-only tasks can land on GPU nodes whenever those nodes have free CPUs, unless you explicitly prevent it via resource requirements or by setting num_cpus=0 on the GPU nodes (which, as you noted, is brittle and can break code that expects CPUs) [Ray GitHub Issue #47866] [Ray Discuss: How to prevent scheduling non-GPU tasks to GPU nodes].
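As a sketch of the num_cpus=0 workaround (the head address and GPU count below are placeholders for your cluster):

```shell
# On each GPU node: advertise zero CPUs so the scheduler never places
# CPU-only tasks here. <head-node-ip> and --num-gpus are placeholders.
ray start --address=<head-node-ip>:6379 --num-cpus=0 --num-gpus=4

# Caveat: tasks request num_cpus=1 by default, so even GPU tasks that
# also ask for a CPU will not schedule on this node unless you set
# num_cpus=0 on those tasks as well -- this is the brittleness noted above.
```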
Best practices today are:
1. Use custom resources or node labels to tag CPU nodes and require those resources for CPU tasks; this does require updating all relevant @ray.remote calls.
2. Set num_cpus=0 on GPU nodes, but then you must use a custom resource for all tasks that should land there, which is not ideal.
3. Use placement groups or node affinity for fine-grained control; this also requires code changes.

There is ongoing discussion and feature requests for a more Kubernetes-like taints/tolerations or "opt-in" scheduling model, but it is not yet implemented [Ray GitHub Issue #47866] [Ray Discuss: Heterogeneous clusters]. Would you like more detail or code examples for any of these approaches?
Sources:
- Ray GitHub Issue #47866
- Ray Discuss: How to prevent scheduling non-GPU tasks to GPU nodes
- Ray Discuss: Heterogeneous clusters
Hint: Mention @RunLLM in the post for followups.