Some questions about Ray on Kubernetes

Hello, I just wanted to clarify a few things about the usage of Ray clusters on K8s.

  1. If there’s only one worker type, would you recommend having multiple worker pods on the same K8s node, or is it preferable to set things up so that only one worker pod gets scheduled on each K8s node? For example, is it better to have 2 pods with 7GB and 2 vCPUs each on a 15GB, 4 vCPU machine, or 1 pod with 15GB and 4 vCPUs on the same machine type?
  2. How many Ray clusters is the Ray operator able to manage? If we use a cluster-scoped Ray operator and deploy Ray in n namespaces, at what value of n (roughly – 10, 50, 100, etc.) would the operator start facing issues? Assume each Ray cluster is actively in use and can scale from 1–50 pods. And does scaling happen concurrently when the operator is dealing with that many Ray clusters at once?
  3. This is more of a general Ray cluster question: how many resources should the head node be assigned if we ensure that no user task gets scheduled on it by setting rayResources to zero? In other words, what could cause heavy memory or CPU usage on the head node? Heavy scaling activity? Lots of data in the object store? The concern is that head node resource usage might shoot up as more worker pods get added to the Ray cluster, so a head node resource allocation that works during testing might break in production with a high number of worker pods.
  4. Considering that losing the Ray head node causes the entire Ray cluster to restart, it seems the head alone might be better off scheduled on an on-demand node (not spot). This is probably a huge stretch, but is there any possibility of setting up a fault-tolerant head node? If not, is something like that on Ray’s roadmap? (Leader election, etc.)

@Dmitri @Alex could you help out here?

  1. It’s better to have one Ray pod per node.
  2. I’d actually recommend running namespace-scoped operators with at most 10 Ray clusters per namespace.
  3. You should annotate the head node as being a 0 CPU Ray node for large Ray clusters.
  4. I believe having an HA head node is on the roadmap for Ray, @Alex knows more about that.
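To sketch points 1 and 3 together, here’s what a legacy Ray operator `RayCluster` manifest along those lines could look like. All names and sizes below are illustrative (not from this thread), and the exact CRD fields may differ by Ray version, so treat this as an assumption-laden sketch rather than a tested config:

```yaml
# Illustrative RayCluster snippet for the (legacy) Ray operator.
apiVersion: cluster.ray.io/v1
kind: RayCluster
metadata:
  name: example-cluster
spec:
  maxWorkers: 50
  podTypes:
    - name: head-node
      # Point 3: advertise 0 CPUs so the Ray scheduler places no user tasks here.
      rayResources: {"CPU": 0}
      podConfig:
        spec:
          containers:
            - name: ray-node
              resources:
                requests: {cpu: 2, memory: 8Gi}
                limits: {cpu: 2, memory: 8Gi}
    - name: worker-node
      minWorkers: 1
      maxWorkers: 50
      podConfig:
        spec:
          containers:
            - name: ray-node
              # Point 1: size one worker pod to (nearly) fill a 15GB / 4 vCPU node,
              # so only one Ray pod lands on each K8s node.
              resources:
                requests: {cpu: 4, memory: 14Gi}
                limits: {cpu: 4, memory: 14Gi}
```

Note the worker pod requests slightly less memory than the node total to leave headroom for kubelet and system daemons.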

Totally agree with everything Dmitri said, and will also point you to the deployment guide for more details: Ray Deployment Guide — Ray v1.8.0

There’s work being done to make the head node fault tolerant and to improve fault tolerance in general. For now, if you’re looking for fault tolerance, I’d recommend doing it at the application/library level. For example, Serve supports fault-tolerant deployments, and Tune and Ray Train support checkpoint-based fault tolerance.
