Ports for workers

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Suppose my system has 32 CPUs, so it can run 32 workers at a time. Is it fine to open only 32 ports for workers?

Why does the port configuration section of the Ray documentation say to assign 10000 ports for workers? Is each worker mapped to one port when it is scheduled? If I spawn 1000 tasks, will 1000 ports be occupied by 1000 workers, with those 1000 workers executing based on core availability? Please correct me if my understanding is wrong.

TIA 🙂
@jjyao @Jules_Damji

Yea, each worker uses one port. You may have more workers than CPUs since some workers might be blocked or idle.
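
For anyone sizing the range: the worker port range is set with the `--min-worker-port` and `--max-worker-port` flags on `ray start`. Below is a rough sketch of computing a narrower range from the CPU count plus some headroom for blocked or idle workers. The helper name `worker_port_flags`, the headroom of 8, and the starting port 10002 are assumptions for illustration, not values Ray requires.

```python
def worker_port_flags(num_cpus, headroom=0, min_port=10002):
    """Hypothetical helper: build `ray start` flags for a contiguous worker port range.

    num_cpus  -- number of workers expected to run at once (one port each)
    headroom  -- extra ports for workers that are blocked or idle (assumed value)
    min_port  -- first port of the range (assumed example value)
    """
    if num_cpus < 1 or headroom < 0:
        raise ValueError("num_cpus must be >= 1 and headroom must be >= 0")
    num_ports = num_cpus + headroom
    max_port = min_port + num_ports - 1  # range is inclusive on both ends
    if max_port > 65535:
        raise ValueError("port range exceeds the valid TCP port space")
    return [f"--min-worker-port={min_port}", f"--max-worker-port={max_port}"]


# 32 CPUs plus 8 spare ports gives a 40-port range.
flags = worker_port_flags(num_cpus=32, headroom=8)
command = "ray start --head " + " ".join(flags)
print(command)
```

How much headroom is enough depends on how many of your workers can be blocked or idle at once, so treat the number as something to tune rather than a recommendation.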

Hi @jjyao ,

Consider this scenario: I have opened 100 ports for workers and I start 120 tasks. 100 tasks will occupy the 100 worker ports, so where will the remaining 20 tasks be mapped?

Is there a queue or pool mechanism behind this? If so, how many tasks can that pool or queue hold, and what factors determine that?

I saw a similar question here where @sangcho replied that more workers are needed to complete the currently running tasks, and that subsequent tasks are scheduled as jobs and execute when free ports and resources are available. Suppose I’m running 100 tasks and each task depends on another task in a new worker, so 200 tasks have to run. In this case 200 workers are enough, right?

Kindly clarify these points. They are unclear to me and I couldn’t find anything about them in the documentation.

@Jules_Damji @architkulkarni @Dmitri @sangcho

You got it - Ray will automatically queue and schedule those additional tasks as worker ports free up.

For your second question, if you need all 200 tasks to execute in parallel, then yes, you would need 200 workers.

However, it depends on the timing of your first 100 tasks: if they are serially dependent, then you’ll have an inefficient, underutilized cluster, since the 100 tasks finish their own logic before spawning their child tasks.

Hi @Sam_Chan, Thanks for your response.

Can you explain the queue mechanism behind this? I need to open only the necessary ports in our production environment. The security team will not allow me to open the 10000 ports mentioned in the Ray documentation.

My cluster has 128 cores in total, so is it enough to open only 128 ports if my tasks do not depend on each other? Also, how many tasks can the queue hold?

@Dmitri @architkulkarni

My cluster has 128 cores in total, so is it enough to open only 128 ports if my tasks do not depend on each other?

Yea, that should be enough, assuming each of your tasks uses 1 CPU. When no resources (e.g. CPUs) are available, tasks are queued until a previous task finishes and frees up a CPU. The queue is unbounded.
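
To picture the queueing, here is a small analogy using only Python's standard library. It is not Ray's scheduler, just a fixed pool of slots with an unbounded queue in front of it: 12 tasks are submitted to 4 slots, at most 4 run at once, and the rest wait their turn. The names `SLOTS`, `NUM_TASKS`, and `task` are made up for this sketch.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

SLOTS = 4        # stands in for the number of CPUs / worker ports (assumed value)
NUM_TASKS = 12   # more tasks than slots, so some must wait in the queue

lock = threading.Lock()
running = 0      # tasks executing right now
peak = 0         # highest number of tasks seen executing at once


def task(i):
    """Pretend unit of work that records how many tasks run concurrently."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)  # simulate work while holding a slot
    with lock:
        running -= 1
    return i * i


# submit() never blocks: extra tasks sit in the pool's unbounded internal queue
# and start as soon as a slot frees up.
with ThreadPoolExecutor(max_workers=SLOTS) as pool:
    futures = [pool.submit(task, i) for i in range(NUM_TASKS)]
    results = [f.result() for f in futures]

print(f"completed {len(results)} tasks, peak concurrency {peak}")
```

All 12 tasks complete even though only 4 slots exist, which is the same shape as submitting more 1-CPU tasks than you have CPUs.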