How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I’m using ray.remote() to launch a number of parallel tasks, and I’m finding that the rate of task completion is unexpectedly bottlenecked. I noticed that dashboard/agent.py is constantly at or near 100% CPU. Does that indicate that Ray internals are the limiting factor? If so, is there a way to eliminate unnecessary work internal to Ray? For example, I don’t need the dashboard functionality and just need maximum throughput. Any suggestions welcome, thanks!
I think Ray currently always starts the dashboard. The dashboard should not take 100% CPU, so something weird may be happening. Btw, how large is your job?
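For what it's worth, `ray.init()` does accept an `include_dashboard` argument, so it may be worth trying to turn the dashboard off explicitly. Below is a minimal sketch, assuming a local single-node setup. The helper names `ray_init_kwargs` and `start_ray` are hypothetical, made up for illustration. Nothing in this thread confirms that this setting also stops the `dashboard/agent.py` process, so treat it as an experiment rather than a fix.

```python
def ray_init_kwargs(disable_dashboard: bool = True) -> dict:
    """Build keyword arguments for ray.init() (hypothetical helper)."""
    kwargs = {}
    if disable_dashboard:
        # include_dashboard is a ray.init() parameter; False asks Ray
        # not to start the dashboard.
        kwargs["include_dashboard"] = False
    return kwargs


def start_ray(disable_dashboard: bool = True):
    """Start Ray locally with the arguments built above (hypothetical helper)."""
    # Imported lazily so ray_init_kwargs() can be used without Ray installed.
    import ray

    return ray.init(**ray_init_kwargs(disable_dashboard))
```

After calling `start_ray()`, check whether the agent process is still present and still busy before concluding anything about throughput.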
What is a good way to dig into Ray to investigate?
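One possible starting point is to confirm which Ray-related process is actually burning CPU, and get its PID, before digging further. The sketch below uses only the standard library and assumes a Unix-like host where `ps -eo pid,pcpu,args` is available. The function names `parse_ps` and `ray_processes_by_cpu` are hypothetical, and matching on the substring "ray" in the command line is a rough heuristic, not anything Ray provides.

```python
import subprocess


def parse_ps(output: str, needle: str = "ray"):
    """Parse `ps -eo pid,pcpu,args` output into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, cpu, args = parts
        if needle not in args:
            continue
        try:
            rows.append((int(pid), float(cpu), args))
        except ValueError:
            # Skip rows whose numeric columns cannot be parsed.
            continue
    # Busiest process first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


def ray_processes_by_cpu(needle: str = "ray"):
    """List running processes whose command line contains `needle`, by CPU usage."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps(result.stdout, needle)
```

Calling `ray_processes_by_cpu()[:10]` gives the ten busiest matching processes. With the PID of the agent in hand, a sampling profiler such as py-spy (`py-spy dump --pid <PID>` or `py-spy top --pid <PID>`) can show what that process is doing, assuming py-spy is installed and you have permission to attach to the process.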
In our job, about 500-1000 tasks are running at any given time. However, they are mostly blocked on external service requests, so very few are actually executing on the Ray host machine at any given time.
Hi @0939013, if you can make an easy reproduction, I can take a look at the issue. Thanks!
I'm going to necro this post exactly a year later, but it's the only resource I can find…
I'm here with the same issue, but sadly also without a minimal reproducer.
OP, did you eventually resolve the issue?
I bring pictures!
I have a bunch of Ray remote functions running as well, but they seem to run very slowly, or at least are not really using any resources. Meanwhile, one physical core is at 100% all the time, and btop tells me it's the Ray dashboard agent.
Could it be that it is blocking the rest?
Why would that happen?
The dashboard is not switched on, btw…