Incorrect resource identification

asm582 · May 17, 2021, 9:53pm

Hello,

I am starting a Ray head node, but I do not want the head node to see the GPU, so I start it with:

ray start --head --num-cpus 1 --num-gpus 0

I still see that the head node comes up with a GPU detected:

1 node(s) with resources: {'node:10.187.57.141': 1.0, 'memory': 354943174042.0, 'object_store_memory': 156404217446.0, 'CPU': 1.0, 'accelerator_type:V100': 1.0}

Is there a way to stop Ray from auto-detecting resources?

sangcho · May 17, 2021, 10:00pm

Are you using the autoscaler for this? I think you are seeing this issue: Autoscaler does not respect --num-cpus argument to `ray start` · Issue #13270 · ray-project/ray · GitHub

cc @Ameer_Haj_Ali

asm582 · May 17, 2021, 10:03pm

The setup that I have is: I start the head node and have 3 worker nodes connect to it.
I am not sure whether the autoscaler is used in my current setup. The GPU is being auto-detected even though I do not want the head node to detect it. I do see CPU set to 1 in my setup.

sangcho · May 18, 2021, 4:09am

@Dmitri any guess why this happens? Can you help diagnose this issue?

Dmitri · May 18, 2021, 4:33am

The “GPU” resource was correctly overridden to 0, but the “accelerator type” resource, which is meant to aid in scheduling particular tasks onto machines with specific types of NVIDIA GPUs, was not removed.

This is a bug. The behavior is confusing, but harmless.
Will file an issue tomorrow.

Also, there is an API to override resource autodetection. I will look that up and get back to you tomorrow.
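
For anyone who wants to spot this condition in their own output, here is a minimal Python sketch. It only inspects a plain dictionary shaped like the one printed earlier in this thread. The helper name `stray_accelerator_keys` is hypothetical and not part of Ray. In a live cluster a similar dictionary could presumably be read from `ray.cluster_resources()`, which is not exercised here.

```python
def stray_accelerator_keys(resources):
    """Return accelerator_type:* keys on a node that reports no GPU resource.

    `resources` is a plain dict of resource name -> amount, shaped like the
    dictionary Ray printed for the head node. Hypothetical helper, not Ray API.
    """
    # A node with a positive GPU count is expected to carry the annotation.
    if resources.get("GPU", 0) > 0:
        return []
    return sorted(k for k in resources if k.startswith("accelerator_type:"))


# Resource dictionary copied from the head node output earlier in the thread.
head_node = {
    "node:10.187.57.141": 1.0,
    "memory": 354943174042.0,
    "object_store_memory": 156404217446.0,
    "CPU": 1.0,
    "accelerator_type:V100": 1.0,
}

print(stray_accelerator_keys(head_node))  # ['accelerator_type:V100']
```

An empty list would mean the node either has GPUs or carries no accelerator annotation.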

Ameer_Haj_Ali · May 18, 2021, 8:30am

I think this is a good first issue for @mwtian to work on as well.

Dmitri · May 18, 2021, 1:53pm

Accelerator annotation issue tracked here: [core] Zero-gpu node shouldn't be marked with accelerator_type resource. · Issue #15878 · ray-project/ray · GitHub
Thanks for pointing this out!

There’s no way to turn off resource detection.
However, you can use the `ray start` command line arguments --memory, --num-cpus, --num-gpus, and --resources to override autodetected resources:
https://docs.ray.io/en/master/package-ref.html#the-ray-command-line-api
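
As a rough illustration of combining those overrides, the sketch below assembles a `ray start` command line in Python. The function name `build_ray_start_command` is hypothetical. It assumes that --resources takes a JSON string and that --memory takes a byte count. It only builds and prints the command and does not launch Ray.

```python
import json
import shlex


def build_ray_start_command(num_cpus=None, num_gpus=None, memory=None,
                            resources=None, head=True):
    """Build an argument list for `ray start` with explicit resource overrides.

    Hypothetical helper: any value left as None is omitted, so Ray would
    fall back to autodetection for that resource.
    """
    cmd = ["ray", "start"]
    if head:
        cmd.append("--head")
    if num_cpus is not None:
        cmd += ["--num-cpus", str(num_cpus)]
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]
    if memory is not None:
        # Assumption: memory is given in bytes.
        cmd += ["--memory", str(memory)]
    if resources is not None:
        # Assumption: custom resources are passed as a JSON object string.
        cmd += ["--resources", json.dumps(resources)]
    return cmd


# Reproduces the command from the original question.
cmd = build_ray_start_command(num_cpus=1, num_gpus=0)
print(shlex.join(cmd))  # ray start --head --num-cpus 1 --num-gpus 0
```

The returned list could be passed to `subprocess.run` on a machine where Ray is installed. Note that, per the bug discussed above, --num-gpus 0 may still leave the accelerator_type annotation in place on affected versions.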
