K8s nodes running empty


I’m using the k8s cluster launcher on AWS to run a Tune job on 36-core instances. The workers are sized as:

        cpu: 4000m
        memory: 2Gi
        memory: 3Gi

The Tune job is using random search and requests resources_per_trial={"cpu": 4}.

However, on the Ray dashboard I see that no more than 1 worker is actually running per node. In the example below there is one running worker, while 2 are idle with plenty of available resources.

Am I missing some configuration? Thanks in advance

Hey, the IDLE workers are misleading: they belong to a worker pool that is used by Ray tasks (but not Ray actors). You should be able to ignore those.

Thanks for the reply. In this example, if the node has 36 cores and each worker takes 4, I would expect to see close to 9 workers all running at 400% each on that node, correct? Each node in the cluster behaves the same: they never fill up and never run more than one worker at full capacity.

Besides the misleading IDLE workers, I would like to see better use of the available resources.

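Ray assigns each “worker node” 4 CPUs, because that is what the worker pod requests (cpu: 4000m). Note that “Ray node” == “kube pod”, not the underlying 36-core instance. Since each trial also requests 4 CPUs, only one trial fits per pod, so you’ll only see 1 worker at 400% on each Ray node.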
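
To make the arithmetic concrete, here is a rough sketch in plain Python (no Ray calls). The helper name `concurrent_trials` and the constants are made up for illustration; the numbers come from the worker spec and `resources_per_trial` quoted above. It assumes Ray only schedules against the CPUs the pod requests, and it ignores any CPU reserved for Kubernetes system components, so the pod count is an upper bound.

```python
def concurrent_trials(cpus_visible_to_ray: float, cpus_per_trial: float) -> int:
    """Number of trials that fit at once on one Ray node (i.e. one pod)."""
    return int(cpus_visible_to_ray // cpus_per_trial)


POD_CPU = 4          # worker pod spec: cpu: 4000m
INSTANCE_CPU = 36    # cores on the underlying AWS instance
CPUS_PER_TRIAL = 4   # resources_per_trial={"cpu": 4}

# One Ray node == one pod, so Ray sees only the pod's 4 CPUs.
per_pod = concurrent_trials(POD_CPU, CPUS_PER_TRIAL)

# What was expected if Ray could see the whole instance.
per_instance = concurrent_trials(INSTANCE_CPU, CPUS_PER_TRIAL)

# Upper bound on 4-CPU pods per instance (assumes no system overhead).
max_pods_per_instance = INSTANCE_CPU // POD_CPU

print(per_pod, per_instance, max_pods_per_instance)
```

Under these assumptions, filling a 36-core instance would take either several 4-CPU worker pods scheduled onto it, or a worker pod with a larger CPU request so that more than one trial fits per pod.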