Multi-GPU usage on multiple VMs | Ray cluster on multi-VM instances

I want to try an LLM, for example flan-ul2, on two A10 GPU VMs provided by AWS. Each VM has 4 GPUs, so my Ray cluster has 8 GPUs in total. I have already created the Ray cluster by running the following commands:

on head node:
ray start --head

on worker node:
ray start --address="<head-node-ip>:<port>"

Now, in my code I have a Python class that I want to deploy, and I want to use 6 GPUs for this task, shared between the worker and head nodes. How can I proceed?

Any leads would be appreciated.


@Shobhit_Agarwal Here is a good example of how you can use Ray Serve and Ray to serve an LLM model. For that model we use 16GB GPUs. We allocate one GPU per replica, so 6 replicas will use 6 GPUs.
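As a sketch of the pattern described above (the class name `FlanDeployment` and its body are placeholders, not the actual example code), each replica requests one GPU via `ray_actor_options`, so the total GPU demand is replicas times GPUs per replica:

```python
# Sketch: one GPU per replica, six replicas. With Ray installed, the
# deployment would be declared roughly as follows (placeholder names):
#
#   from ray import serve
#
#   @serve.deployment(num_replicas=6, ray_actor_options={"num_gpus": 1})
#   class FlanDeployment:
#       def __init__(self):
#           ...  # load the model onto this replica's GPU
#
# Each replica asks Ray for one GPU, so the cluster-wide GPU demand is:
options = {"num_replicas": 6, "ray_actor_options": {"num_gpus": 1}}
total_gpus = options["num_replicas"] * options["ray_actor_options"]["num_gpus"]
print(total_gpus)  # prints 6
```

Note that this gives you 6 independent copies of the model, one per GPU, rather than one model spread across 6 GPUs.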

@Jules_Damji, really appreciate the quick response. But if I set num_replicas=6 and num_gpus=1, that means I am making 6 copies of the model and each copy is utilising 1 GPU; please correct me if I am wrong.

The problem is that I can't use a single GPU for the LLM; I need at least 5 or 6 GPUs to serve the flan-ul2 model, since it is huge. So after creating the cluster, I set num_gpus=6 and num_replicas=1 in my deployment class, but I get an error saying that no resource can accommodate num_gpus=6. Any leads would be helpful.


@Jules_Damji, I have a scenario where I create a Ray cluster with 2 VMs, each having 4 GPUs. How can I make my Ray Serve deployment utilize 4 GPUs from the first instance and 1 GPU from the other? Is there a workaround for this?

I created a cluster with
ray start --head on the head node,
and ray start --address="<head-node-ip>:<port>" on the worker node,

and assigned num_gpus=5 in the @serve.deployment class, but I am still getting the error below:
no available node types can fulfill resource request {'GPU': 5.0},

even when I see resources available: {"GPU": 8.0}
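This error is consistent with how Ray schedules resources: num_gpus is a per-actor request, and a single actor must fit entirely on one node, so the cluster-wide total of 8 GPUs does not help. A minimal illustration of that fit check (illustrative only, not Ray's actual scheduler code):

```python
def fits_on_some_node(request_gpus: float, nodes_gpus: list) -> bool:
    """Illustration of per-node placement: a single actor's GPU request
    must fit entirely on one node; summing across nodes doesn't help."""
    return any(request_gpus <= capacity for capacity in nodes_gpus)

cluster = [4, 4]  # two VMs with 4 GPUs each, 8 GPUs in total
print(fits_on_some_node(5, cluster))  # prints False
print(fits_on_some_node(4, cluster))  # prints True
```

So {'GPU': 5.0} fails on two 4-GPU nodes even though {"GPU": 8.0} is available cluster-wide.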

I hope there is a workaround for this.
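One possible direction is Ray's placement-group API, which reserves resource bundles where each bundle fits on a single node. This is a sketch under that assumption (the Serve-side wiring, i.e. attaching the deployment's actors to the group, is omitted); the real API call is shown in the comments:

```python
# Sketch (assumption: Ray 2.x core placement-group API): reserve 4 GPUs
# on one node and 1 GPU on another, then run the model's actors inside
# those bundles.
#
#   import ray
#   from ray.util.placement_group import placement_group
#
#   ray.init(address="auto")
#   pg = placement_group(bundles=[{"GPU": 4}, {"GPU": 1}], strategy="PACK")
#   ray.get(pg.ready())  # blocks until both bundles are reserved
#
# Each bundle must fit on one node, which is why [{"GPU": 4}, {"GPU": 1}]
# is schedulable on two 4-GPU nodes while a single {"GPU": 5} bundle is not.
bundles = [{"GPU": 4}, {"GPU": 1}]
node_capacity = 4
assert all(b["GPU"] <= node_capacity for b in bundles)
print(sum(b["GPU"] for b in bundles))  # prints 5
```

Even with the reservation in place, note that the model code itself must be written to shard or parallelize across the 4+1 GPUs; reserving the GPUs alone does not split a single model across nodes.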