My cluster has 7 GPUs and 28 CPUs, and I started Ray Train with num_workers=6, trainer_resources={"CPU": 4}, resources_per_worker={"CPU": 4, "GPU": 1}. Why am I getting a "resource request cannot be scheduled" warning?

tonygracious · June 16, 2024, 5:14pm

I am getting the following warning:

"(autoscaler +18m43s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster."

My Ray cluster has the following settings:

```python
from ray.util.spark import setup_ray_cluster

setup_ray_cluster(
    num_worker_nodes=6,
    num_cpus_worker_node=4,
    num_gpus_worker_node=1,
    num_cpus_head_node=4,
    num_gpus_head_node=1,
)
```

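and the Ray Train trainer uses the following ScalingConfig:

```python
from ray.train import ScalingConfig

scaling_config = ScalingConfig(
    num_workers=6,
    trainer_resources={"CPU": 4},
    use_gpu=True,
    resources_per_worker={"CPU": 4, "GPU": 1},
)
```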
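A quick way to see why the warning appears is to add up what this configuration reserves. The sketch below is plain Python, not a Ray API call. It assumes the cluster is 7 nodes (6 workers plus the head) with 4 CPUs and 1 GPU each, and that Ray Train reserves `trainer_resources` for the training coordinator plus `resources_per_worker` for each of the `num_workers` workers. The helper names `add_resources` and `total_request` are hypothetical.

```python
# Back-of-the-envelope check of the resources this ScalingConfig reserves.
# Assumption: the training coordinator reserves trainer_resources and each of
# the num_workers workers reserves resources_per_worker.

def add_resources(a, b):
    """Return a new dict with the per-key sum of two resource dicts."""
    out = dict(a)
    for key, amount in b.items():
        out[key] = out.get(key, 0) + amount
    return out


def total_request(num_workers, trainer_resources, resources_per_worker):
    """Total resources reserved by the coordinator plus all workers."""
    total = dict(trainer_resources)
    for _ in range(num_workers):
        total = add_resources(total, resources_per_worker)
    return total


# Assumed cluster: 6 worker nodes + 1 head node, each with 4 CPUs and 1 GPU.
cluster = {"CPU": 7 * 4, "GPU": 7 * 1}

request = total_request(
    num_workers=6,
    trainer_resources={"CPU": 4},
    resources_per_worker={"CPU": 4, "GPU": 1},
)
free = {key: cluster[key] - request.get(key, 0) for key in cluster}

print("requested:", request)  # requested: {'CPU': 28, 'GPU': 6}
print("free:", free)          # free: {'CPU': 0, 'GPU': 1}
```

Under these assumptions the training run alone reserves all 28 CPUs and leaves only one idle GPU, so any other task or actor asking for `{'CPU': 1.0}` (the exact request in the warning) would have no CPU left to run on.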

How can I solve this?

Also, where does the Ray Train training coordinator run? How is the head node different from the node where the training coordinator runs?

Sam_Chan · July 23, 2024, 5:58am

What do you mean by “training coordinator”?

tonygracious · July 23, 2024, 2:01pm

https://docs.ray.io/en/latest/train/api/doc/ray.train.ScalingConfig.html

The training coordinator is mentioned in the docs.
