Hi, I'm about to deploy my Ray application on K8s, and there are a few questions stuck in my mind.
1. Is it true that, to deploy a Ray app, I would need to deploy one service and one deployment for the head node, and one deployment for the worker nodes?
2. If so, what is the difference between the head node service and the head node deployment?
Which one should I give more resources to so that my app is more performant?
3. If the head node deployment is the more important one, can I have more than one pod for the head node? Should I?
4. If I am about to have 100 actors with one CPU per actor, is it better to have 100 pods with 1 CPU each, or 10 pods with 10 CPUs each? And would a head node with 2 CPUs be capable of handling that many worker actors?
I have been searching for answers for days but sadly couldn't find much information on this. Any help would be greatly appreciated!
The preferred way to launch Ray on K8s is to use the KubeRay operator, which manages Ray clusters as “custom deployments” consisting of Ray pods. (Each K8s pod runs one Ray node.) https://ray-project.github.io/kuberay/
3.
Each Ray cluster has exactly 1 head pod, which fulfills the central control functions. You can have as many Ray worker pods as the workload needs.
4.
Fewer, bigger pods are generally better! Ray pods should be sized to take up entire K8s nodes if possible.
I have been searching for answers for days but sadly couldn't find much information on this. Any help would be greatly appreciated!
We’re working on fleshing out the docs! Do let us know if you have any other questions.
Thanks for the quick reply, I really appreciate it. It's a little awkward to say, but I do have some follow-up questions.
WRT 1:
KubeRay looks great, but is it overkill if I don’t need Ray's autoscaling feature? I am currently planning to deploy a so-called static Ray cluster, with a fixed number of pods in the cluster. Is KubeRay still necessary in this case?
WRT 3:
Thanks for the clear answer. I actually opened an issue on this a few days ago, [Ray Core] Lack of documentation on ray head node · Issue #25958 · ray-project/ray · GitHub. A gentleman there said that ‘the head node will creates multiple processes, … and utilize all those resources’, but I wasn’t certain whether that could mean multiple pods. I am also not sure how big the head node should be. Is there a best practice for the ratio between head node CPUs and actors (with 1 CPU per actor)?
Something like: a 1-CPU head node for 100 actors, a 2-CPU head node for 200 actors?
WRT 4:
By “K8s nodes”, did you mean something like EC2 instances? So if my K8s cluster is made up of a few c5.4xlarge EC2 instances, should each pod have the CPUs and memory of a single c5.4xlarge?
KubeRay can simplify configuration vs. directly setting up K8s deployments.
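For a static cluster, one possible shape of the KubeRay configuration is sketched below as a Python dict printed as JSON (kubectl accepts JSON manifests as well as YAML). This is only a sketch under assumptions: the field names follow the RayCluster custom resource as I understand it, so check them against the KubeRay version you install. The cluster name, worker count, pod sizes, and image tag are all hypothetical placeholders. Setting replicas, minReplicas, and maxReplicas to the same value keeps the worker count fixed.

```python
import json


def static_ray_cluster(name, workers, cpus_per_pod, memory_per_pod, image):
    """Build a RayCluster manifest with a fixed number of worker pods."""

    def pod_template(container_name):
        resources = {"cpu": str(cpus_per_pod), "memory": memory_per_pod}
        return {
            "spec": {
                "containers": [
                    {
                        "name": container_name,
                        "image": image,
                        # Equal requests and limits keep the pod size predictable.
                        "resources": {"requests": resources, "limits": resources},
                    }
                ]
            }
        }

    return {
        # Assumption: older KubeRay releases use ray.io/v1alpha1 instead.
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "headGroupSpec": {
                "rayStartParams": {"dashboard-host": "0.0.0.0"},
                "template": pod_template("ray-head"),
            },
            "workerGroupSpecs": [
                {
                    "groupName": "workers",
                    # Same value three times: the worker count never changes.
                    "replicas": workers,
                    "minReplicas": workers,
                    "maxReplicas": workers,
                    "rayStartParams": {},
                    "template": pod_template("ray-worker"),
                }
            ],
        },
    }


manifest = static_ray_cluster(
    name="static-ray",             # hypothetical cluster name
    workers=7,                     # hypothetical worker count
    cpus_per_pod=15,               # hypothetical pod size
    memory_per_pod="28Gi",         # hypothetical pod size
    image="rayproject/ray:2.9.0",  # assumption: use the Ray image you already run
)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with kubectl would hand the pod management to the operator, so no separate head and worker deployments have to be written by hand.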
The docs are indeed vague on the issue of optimal resource allocation. The overall recommendation is experimentation: run some benchmarks to make sure your workload is using resources effectively. The head Ray pod is a single pod running many processes (scheduling and control components, as well as potentially Ray workloads).
Yes, typically a Kubernetes node corresponds to a cloud VM. Each pod should have the CPUs and memory of the entire instance, leaving a bit of room for system daemons and other processes that run on the K8s node even when it hosts no user workloads.
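To make the sizing concrete, here is a rough calculation for the c5.4xlarge case (16 vCPUs, 32 GiB of memory). It is a sketch only: the reserved amounts (1 CPU and 4 GiB per node) are assumed placeholder values, not a recommendation, and the helper names are made up for illustration. Check what your nodes actually report as allocatable before settling on numbers.

```python
import math


def pod_size(node_cpus, node_memory_gib, reserved_cpus=1, reserved_memory_gib=4):
    """Resources for one Ray pod that fills a node minus a reserved slice.

    The reserved defaults are assumptions for illustration only.
    """
    return node_cpus - reserved_cpus, node_memory_gib - reserved_memory_gib


def pods_needed(actors, cpus_per_actor, cpus_per_pod):
    """Pods required to host the actors; an actor cannot span two pods."""
    actors_per_pod = cpus_per_pod // cpus_per_actor
    return math.ceil(actors / actors_per_pod)


# c5.4xlarge: 16 vCPUs, 32 GiB of memory.
cpus, memory_gib = pod_size(node_cpus=16, node_memory_gib=32)
print(cpus, memory_gib)            # 15 28
print(pods_needed(100, 1, cpus))   # 7
```

Under these assumptions, 100 one-CPU actors fit on 7 worker pods of 15 CPUs each, one pod per instance, rather than on 100 one-CPU pods.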