KubeRay: run jobs in specific worker groups

Is there a way to submit jobs only to the nodes that belong to a particular groupName in the YAML configuration of the KubeRay cluster? For instance, I want to submit jobs only to the worker nodes, not the head node. How could I accomplish this in the Python code, i.e. how should I change the parameters of ray.remote? Thank you!

I don’t believe scheduling by groupName is easily doable today. One recommendation I’ve seen is to start the head node with zero CPUs (ray start --num-cpus 0) so that workloads are not scheduled onto it.
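In KubeRay terms, that corresponds to setting num-cpus: "0" in the head group’s rayStartParams. A rough sketch of the relevant part of a RayCluster manifest, assuming the standard RayCluster CRD (the image tag is just an example):

```yaml
# Sketch only: keep the head Pod for cluster management by advertising zero CPUs.
headGroupSpec:
  rayStartParams:
    num-cpus: "0"        # Ray will not place CPU-requiring tasks/actors here
  template:
    spec:
      containers:
        - name: ray-head
          image: rayproject/ray:2.3.1   # example tag; match your workers' Ray version
```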

cc @jjyao who is working on scheduler improvements
cc @Kai-Hsun_Chen for how to do this in kuberay

@cade Thanks for reaching out. This could be an interesting feature to add eventually. Having the ability to use different images in different worker groups, with different resource allocations, would be super useful for splitting worker groups into decoupled roles, especially to prevent memory leaks in the head node in large production systems.

Hi @jhowpd,

If you have two worker groups, one for CPU Pods and one for GPU Pods, you can schedule GPU jobs onto the GPU Pods via Ray resource requirements (e.g. @ray.remote(num_gpus=1)).

If you have multiple worker groups for CPU Pods, it is hard for KubeRay to support this until Ray has related features.
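For example, a minimal Python sketch of the GPU case (the function names and bodies are placeholders, and it assumes the cluster has at least one GPU worker Pod):

```python
import ray

ray.init(address="auto")  # connect to the running KubeRay cluster

# num_gpus=1 means this task can only be scheduled on a node that advertises
# GPU resources, i.e. a Pod from the GPU worker group.
@ray.remote(num_gpus=1)
def gpu_task():
    import os
    return os.environ.get("CUDA_VISIBLE_DEVICES")

# A plain CPU task has no GPU requirement, so it can run on the CPU worker group.
@ray.remote(num_cpus=1)
def cpu_task():
    return "cpu"

print(ray.get([gpu_task.remote(), cpu_task.remote()]))
```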

Thanks @Kai-Hsun_Chen, and congrats on the great work being done in KubeRay. I have been testing on Python 3.10+, and it has been very stable. Good stuff!

@jhowpd Happy to hear that!

The stability of RayCluster is OK. RayJob and RayService are still in alpha. We will focus on stability next quarter and fix known stability issues as much as possible. If you run into any questions in the future, you can post them here, in the Slack channel (#kuberay), or open a GitHub issue.

Posting here in case others run into this.

  1. Currently, you can already use different images per worker group. You need to make sure all images in one cluster use the same Python and Ray versions. I have done preliminary testing of this.
  2. You can assign custom resources (see the Resources page in the Ray 2.3.1 docs) to the worker groups, which could be just labels like “custom_label_1”, “custom_label_2”, etc. Then you can request these resources in your tasks or actors to direct scheduling to the desired worker group. As long as each worker group has one unique custom resource, you can deterministically schedule jobs to specific worker groups; a sketch follows below the list. I have not tested this yet, but I plan to do so soon.
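To make both points concrete, here is a rough sketch of what this could look like. The group names, image tags, and resource labels are made-up examples, and I have not run this exact config, so treat it as a starting point rather than a verified recipe.

```yaml
# Sketch: two CPU worker groups with different images, each advertising a unique
# custom resource so tasks can be pinned to one group. The quoting of the
# "resources" string follows the KubeRay sample configs; double-check it against
# your KubeRay version.
workerGroupSpecs:
  - groupName: cpu-group-a
    replicas: 2
    rayStartParams:
      resources: '"{\"custom_label_1\": 1}"'
    template:
      spec:
        containers:
          - name: ray-worker
            image: my-registry/app-a:2.3.1   # example image
  - groupName: cpu-group-b
    replicas: 2
    rayStartParams:
      resources: '"{\"custom_label_2\": 1}"'
    template:
      spec:
        containers:
          - name: ray-worker
            image: my-registry/app-b:2.3.1   # example image
```

On the application side, requesting a small fraction of a group’s label should pin a task to that group without limiting how many such tasks fit on each Pod:

```python
import ray

ray.init(address="auto")

# Requesting the unique custom resource restricts scheduling to Pods that
# advertise it, i.e. the matching worker group. A tiny fractional amount is
# enough for placement and barely consumes the label.
@ray.remote(resources={"custom_label_1": 0.001})
def on_group_a():
    return "ran on group a"

@ray.remote(resources={"custom_label_2": 0.001})
def on_group_b():
    return "ran on group b"

print(ray.get([on_group_a.remote(), on_group_b.remote()]))
```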

I would love to hear others’ experiences if they have tried these or any other tricks for sharing a Ray cluster in a heterogeneous environment.