Is dashboard/agent.py supposed to be at 100% CPU?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I’m using ray.remote() to launch a number of parallel tasks, and I’m finding that the rate of task completion is unexpectedly bottlenecked. I noticed that dashboard/agent.py is constantly at or near 100% CPU. Does that indicate that Ray internals are the limiting factor? If so, is there a way to eliminate unnecessary work internal to Ray? For example, I don’t need the dashboard functionality; I just need maximum throughput. Any suggestions welcome, thanks!

I think Ray currently always starts the dashboard. The dashboard should not take 100% CPU, so something weird may be happening. By the way, how large is your job?

What is a good way to dig into Ray to investigate?

In our job, about 500-1000 tasks are running at any given time. However, they are mostly blocked on external service requests, so very few are actually executing on the Ray host machine at any given time.
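One possible starting point for digging in, offered as a rough sketch rather than a Ray-endorsed procedure: confirm which process is busy and grab its PID so a profiler can be attached. The helper names below (`find_processes`, `snapshot`) are made up for illustration, and the sketch assumes a Linux or macOS host where `ps -eo pid,pcpu,args` is available.

```python
import subprocess
from typing import List, Tuple


def find_processes(ps_output: str, needle: str) -> List[Tuple[int, float, str]]:
    """Return (pid, cpu_percent, command) for each ps row whose command contains needle."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, %cpu, then the full command line
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        if needle in command:
            # Some locales print a decimal comma, so normalize before parsing.
            matches.append((int(pid), float(cpu.replace(",", ".")), command))
    return matches


def snapshot(needle: str = "dashboard/agent.py") -> List[Tuple[int, float, str]]:
    """Run ps once and return the rows that look like the Ray dashboard agent."""
    # Note: on Linux, ps typically reports CPU averaged over the process
    # lifetime, so `top` may give a better instantaneous reading.
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_processes(result.stdout, needle)
```

With the PID from `snapshot()`, a sampling profiler such as py-spy (assuming it is installed) can show where the agent spends its time, for example `py-spy top --pid <PID>` or `py-spy dump --pid <PID>`.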

Hi @0939013, if you can make an easy reproduction, I can take a look at the issue. Thanks!
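A reproduction for the workload described above might look roughly like the sketch below. It is only an approximation: `simulated_request` and `run_repro` are hypothetical names, `time.sleep` stands in for the real external service calls, and the task count, delay, and fractional `num_cpus` value are guesses rather than the original job's settings. It also passes `include_dashboard=False` to `ray.init`; whether that stops dashboard/agent.py from running is not confirmed here.

```python
import time


def simulated_request(delay_s: float) -> float:
    """Stand-in for a blocking external service call (hypothetical)."""
    start = time.monotonic()
    time.sleep(delay_s)  # the task is idle here, like a task waiting on I/O
    return time.monotonic() - start


def run_repro(num_tasks: int = 500, delay_s: float = 5.0) -> float:
    """Launch many mostly-idle Ray tasks and return completed tasks per second."""
    import ray  # imported here so simulated_request can be tried without Ray

    # Ask Ray not to start the dashboard UI for this run.
    ray.init(include_dashboard=False)

    # A small fractional CPU request lets many idle tasks share each core
    # (assumption: the original job oversubscribes CPUs in a similar way).
    remote_request = ray.remote(num_cpus=0.01)(simulated_request)

    start = time.monotonic()
    ray.get([remote_request.remote(delay_s) for _ in range(num_tasks)])
    elapsed = time.monotonic() - start

    ray.shutdown()
    return num_tasks / elapsed
```

Calling `run_repro()` while watching `top` on the host should show whether dashboard/agent.py climbs toward 100% CPU under this load. If it does not, the difference between this sketch and the real job (task payload size, number of nodes, Ray version) is probably where to look next.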