Workers per node

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Content from documentation: Configuring Ray — Ray 2.9.3
“Port numbers are how Ray disambiguates input and output to and from multiple workers on a single node. Each worker will take input and give output on a single port number. Thus, for example, by default, there is a maximum of 10,000 workers on each node, irrespective of number of CPUs.”

Is it possible to span upto 10000 workers?., Consider I have 32 CPU cores and in normal each worker need one core for execution, so at a time only 32 tasks can execute.

Wanted to know

  1. What is mentioned as workers here in documentation?
  2. How many tasks can I hold in queue for next executions?

Can we configure above things while starting ray cluster.? Wanted to know for what purpose ray uses these 10000 ports.

10001 in ray head node will be the listening port from driver program and then, on which port Head node will communicate with other nodes to schedule tasks. I want to know about all Inbound ports in all nodes of ray cluster.

You can set these values with min-worker-port and max-worker-port.

In this case, a worker != a core. You can assign fractional CPU resources (i.e. cpus: 0.25) and you can assign arbitrary resources (i.e. random_resource: 1). So you may have many tasks that use arbitrary and fractional resources, each task needs a worker. If you have sufficient arbitrary and fractional resources, you could run 10000 workers.

But to be honest, in my experience, I highly doubt Ray would handle this kind of situation very well. Especially if they are all trying to get and put object refs into cluster memory.