How to prevent scheduling non-GPU tasks to GPU nodes

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Context: I have some tasks that require a GPU node, which is started on demand using KubeRay. Once such a task finishes, another task that doesn't require a GPU starts, and it gets scheduled to the currently active GPU node. Ideally, I would like this task to spin up a high-memory non-GPU node, allowing the GPU node to be deallocated. However, Ray prefers to schedule the task to the already available GPU node before it spins down.

Is there a way to define an affinity for tasks that prevents them from being scheduled to GPU nodes? I know it's possible to pin a task to a specific node_id, but that is not what I need, since the node_id is not known a priori, as the nodes scale up and down automatically.
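For reference, this is roughly the node_id pinning approach I mean, using NodeAffinitySchedulingStrategy (a minimal sketch; the node ID here is just the node running the driver, purely for illustration):

```python
import ray
from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy

ray.init()

@ray.remote
def cpu_only_task():
    return "done"

# Pin the task to a specific node via its ID. The ID is only known once the
# node already exists, which is why this doesn't help with autoscaled pools.
node_id = ray.get_runtime_context().get_node_id()
ref = cpu_only_task.options(
    scheduling_strategy=NodeAffinitySchedulingStrategy(node_id=node_id, soft=False)
).remote()
print(ray.get(ref))
```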

Why: The main reason to do this is to save compute resources/money.

@eduardoarnold see Placement Groups: Placement Groups — Ray 2.31.0

@Sam_Chan thanks and sorry for the delay in replying!

I’ve read through the Placement Groups documentation but it still isn’t clear to me how to create a placement group that enforces that tasks are scheduled to non-GPU nodes.

For example, if there is a physical machine with 8 cores and 1 GPU, and I request a PG with {"CPU": 8, "GPU": 0}, what prevents it from being allocated to that GPU machine?
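For concreteness, here is a minimal sketch of such a request with the placement group API (assuming a single CPU-only bundle; as far as I can tell, nothing in the bundle itself excludes GPU nodes):

```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init()

# Reserve a CPU-only bundle. The bundle only asks for CPUs, so it can still
# land on a node that happens to have a GPU as well.
pg = placement_group(bundles=[{"CPU": 8}], strategy="STRICT_PACK")
ray.get(pg.ready())

@ray.remote(num_cpus=8)
def cpu_heavy_task():
    return "done"

ref = cpu_heavy_task.options(
    scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg)
).remote()
print(ray.get(ref))
```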

I believe one solution is to create a virtual resource that is only available on non-GPU machines, and then use it as a requirement (as suggested in Resources). But that would require me to change the cluster configuration, which is not ideal.
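A minimal sketch of what I mean, assuming the non-GPU nodes were started with a custom no_gpu resource (e.g. ray start --resources='{"no_gpu": 1}'):

```python
import ray

ray.init()

# Requesting the custom resource means the task can only be scheduled on
# nodes that advertise it, i.e. the non-GPU nodes in this setup.
@ray.remote(num_cpus=8, resources={"no_gpu": 1})
def cpu_only_task():
    return "done"

print(ray.get(cpu_only_task.remote()))
```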

Please let me know if I am missing something, thanks!

I just realized that adding a custom resource, e.g. no_gpu: 1, to all non-GPU nodes would not work if those nodes are not already active. In my case, I keep the min_replicas of the node pool at 0, so I don't see how Ray would know to spin up this pool to make the custom resource available.

Oh I see, in that case it'd get placed onto the GPU machine, which is not what you want. Can you submit a feature request on GitHub to support this? This is an interesting use case.

Is there a reason you can't use a separate cluster to get that isolation between CPU and GPU resources?

Sorry for the late reply again, for some reason I’m not getting these notifications.

Thanks, I will create a feature request on GitHub.

Thanks for your suggestion. Although I could create a new cluster as you proposed, this would waste resources, as I'd need a new control plane just to get that CPU/GPU node isolation.

For completeness, I have created the GitHub issue here: [Core] Prevent schedulling non-GPU tasks to GPU nodes · Issue #47866 · ray-project/ray · GitHub
