- None: Just asking a question out of curiosity
I am running a Ray remote task with num_cpus=1, and I notice in the dashboard that many tasks show CPU usage above 100%, sometimes above 300%.
I would like to understand why that is, and:
- Is it okay, or is there any downside to it exceeding 100%?
- What could be done to keep it at or below 100%? (It doesn’t matter if it’s not a bad thing, though.)
@ckapoor can you include a script that sends the CPU into the stratosphere?
Also, please include a snapshot of the CPU utilization on the dashboard.
@rickyyx @sangcho This should be curiously intriguing if we can reproduce it.
@ckapoor today Ray resources are logical. This means Ray doesn’t prevent a task that requests 1 logical CPU from actually using more physical CPUs, and from Ray’s point of view it’s totally OK to use more than claimed.
At the moment Ray doesn’t provide ways to limit task usage. For more information on Ray’s logical resources you can refer to Resources — Ray 2.3.0
Hello @Jules_Damji ,
Unfortunately, I cannot attach the script or share code, but essentially it’s a Python script that runs pandas and numpy operations on the data.
I have attached the picture of the task CPU%.
FYI: the solution to the above issue on Ubuntu machines is to set the environment variable OMP_NUM_THREADS=1.
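For readers who want to try the same fix from inside a Python script, here is a minimal sketch. It assumes the variable is set before numpy or pandas is first imported in the worker process, since native thread pools usually read it at load time. The helper name `limit_native_threads` is hypothetical, and the extra `MKL_NUM_THREADS` and `OPENBLAS_NUM_THREADS` variables are an assumption for numpy builds that use those backends; the thread itself only mentions `OMP_NUM_THREADS`.

```python
import os

# OMP_NUM_THREADS is the variable named in this thread. The other two are an
# assumption for numpy builds backed by MKL or OpenBLAS.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_native_threads(n: int = 1) -> dict:
    """Set the native thread-pool limits and return the values that were set."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


# Call this first, then import numpy / pandas so they see the limits.
limit_native_threads(1)
# import numpy as np
```

Exporting the variable in the shell before starting the workers should have the same effect and avoids any import-order concern.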
I have attached a screenshot for a similar job.
However, the Ray team can continue investigating if it helps.
I have a follow-up question, though, mainly because of a gap in my understanding.
The underlying EC2 machine shows 90% CPU utilization for hours, but the above dashboard shows 99%.
What is the dashboard metric actually capturing?
@rickyyx @sangcho Why the discrepancy between what the dashboard shows and what the actual EC2 machine shows? @ckapoor for the EC2 number, are you using Ubuntu utilities like htop or top to see the load?
As far as I know, and @rickyyx can correct me, we get CPU stats from /proc/stat programmatically.
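To make the /proc/stat idea concrete, here is a rough sketch of how a machine-wide utilization figure can be derived from two samples of the aggregate `cpu` line on Linux. This is not Ray's actual collection code, which isn't shown in this thread; the function names are hypothetical. The figure is averaged over all cores, so it tops out at 100% no matter how many cores are busy.

```python
import time
from typing import Tuple


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (total_jiffies, idle_jiffies) from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # user nice system idle iowait irq softirq steal; guest time is already
    # counted inside user/nice, so the trailing guest fields are skipped.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    return sum(values), idle


def cpu_utilization(before: str, after: str) -> float:
    """Percent of non-idle time between two samples, averaged over all cores."""
    total_0, idle_0 = parse_cpu_line(before)
    total_1, idle_1 = parse_cpu_line(after)
    total_delta = total_1 - total_0
    if total_delta <= 0:
        return 0.0
    return 100.0 * (total_delta - (idle_1 - idle_0)) / total_delta


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(1.0)
    print(f"machine-wide CPU: {cpu_utilization(first, read_cpu_line()):.1f}%")
```

Different tools sample over different windows and may treat iowait or steal time differently, which could be one reason two views of the same machine disagree by a few points.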
With respect to the OMP_NUM_THREADS environment variable, we document the behavior here
@Jules_Damji I am using a much simpler EC2 console.
Please see the attached pic:
Hi @ckapoor, the unit for per-process CPU usage is a “single CPU”, meaning that if a process uses more than 1 CPU (e.g., multithreading controlled by OMP_NUM_THREADS, or native code that uses multithreading under the hood), it can exceed 100%. That means a single worker (a single Python program) is using more than 1 CPU.
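A small standard-library sketch can illustrate the "single CPU" unit. It divides the CPU time consumed by one process by the wall-clock time elapsed. The worker threads hash large buffers with `hashlib`, which releases the GIL for large inputs, so they stand in for the multithreaded native code inside numpy. The function names are hypothetical, and the exact number depends on how many cores are free.

```python
import hashlib
import threading
import time


def busy_hash(seconds: float) -> None:
    """Hash a 1 MiB buffer in a loop; hashlib releases the GIL while hashing."""
    data = b"x" * (1 << 20)
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        hashlib.sha256(data).digest()


def process_cpu_percent(num_threads: int, seconds: float = 0.5) -> float:
    """CPU time used by this process divided by wall time, where 100 = one core."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()  # user + system time summed over all threads
    threads = [
        threading.Thread(target=busy_hash, args=(seconds,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.monotonic() - wall_start
    return 100.0 * cpu_used / wall_used


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} thread(s): {process_cpu_percent(n):.0f}% of one CPU")
```

On a machine with several idle cores, the multi-thread runs would be expected to report well above 100%, which is the same effect as a num_cpus=1 task showing 300% in the dashboard.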