Worker nodes not utilized

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

After following the "RLlib in 60 Seconds" guide, I don't see any worker nodes being utilized on my Kubernetes Ray cluster; only the head node does any work. If I change the config to num_workers=0 and remote_worker_envs=True, all of the CPUs fire up, but I'm not sure that is what I want. My cluster consists of 5 nodes. I would like to scale one of the examples across the whole cluster for testing purposes and see how long it takes to train. Am I missing something?
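For reference, the two setups described above can be sketched as dict-style RLlib configs. This is only an illustration: the key names are RLlib's, but the values are made up, and `requested_cpus` is a hypothetical helper that assumes one CPU for the driver and one per rollout worker unless overridden. It does not count CPUs used by remote environments.

```python
# Dict-style RLlib settings. Key names are RLlib's; the values are illustrative.
single_process_config = {
    "num_workers": 0,            # no separate rollout workers
    "remote_worker_envs": True,  # the setting mentioned in the question
}

distributed_config = {
    "num_workers": 8,            # hypothetical value: eight rollout worker actors
    "num_cpus_per_worker": 1,
    "num_cpus_for_driver": 1,
}


def requested_cpus(config):
    """Rough CPU request: the driver plus each rollout worker.

    Hypothetical helper for back-of-the-envelope sizing; it ignores
    any CPUs consumed by remote environments.
    """
    workers = config.get("num_workers", 0)
    per_worker = config.get("num_cpus_per_worker", 1)
    driver = config.get("num_cpus_for_driver", 1)
    return driver + workers * per_worker


print(requested_cpus(single_process_config))  # 1
print(requested_cpus(distributed_config))     # 9
```

If the requested CPU count fits on the head node alone, it is plausible that nothing gets scheduled on the other nodes, so a small num_workers may be one reason only the head node looks busy.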

If I manually connect to Ray and run the object transfer example, that works across the nodes, so I think this is some RLlib config setting I am unaware of.

Hi @cloudhaxor,
Have you connected the other nodes to your Ray head node? Starting Ray inside a cluster does not, by itself, tell Ray which nodes are available to be utilized. Have you already followed the Kubernetes guide?
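One way to check this is to ask Ray which nodes it currently knows about. The sketch below assumes it runs on a pod that is already part of the cluster, so that `ray.init(address="auto")` can find the head node. `summarize_nodes` and `report_cluster` are hypothetical helper names; `ray.nodes()` is the Ray call that returns one dict per node.

```python
def summarize_nodes(nodes):
    """Return (alive_node_count, total_cpus) for a list shaped like ray.nodes() output."""
    alive = [n for n in nodes if n.get("Alive")]
    total_cpus = sum(n.get("Resources", {}).get("CPU", 0) for n in alive)
    return len(alive), total_cpus


def report_cluster():
    """Connect to the running cluster and print what Ray can see."""
    import ray  # imported here so the helper above works without Ray installed

    ray.init(address="auto")  # assumes a Ray cluster is already running
    alive, cpus = summarize_nodes(ray.nodes())
    print(f"alive nodes: {alive}, total CPUs: {cpus}")
```

Calling `report_cluster()` on a healthy 5-node cluster should report 5 alive nodes. If it reports only 1, the workers have not joined the head node, and no RLlib setting will spread the work.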