Model replication with multiple GPU deployments

I have 2 CUDA GPU resources on my server that is running Ray. If I deploy a single model replica with {"num_gpus": 1}, things are fine. If I set the number of replicas to 2, I get a "RuntimeError: CUDA out of memory." error when Ray tries to deploy the second replica. How are people deploying multiple replicas across multiple GPUs? Any sample code/gist or suggestions to try would be appreciated.

Hi @puntime_error, when you set the replicas to 2 and keep num_gpus at 1, there should be two identical processes created, each taking one GPU with its own CUDA_VISIBLE_DEVICES.
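For reference, here is a minimal sketch of that setup (the deployment class and its body are placeholders, not your actual model): two replicas, each requesting one GPU, so Ray gives each replica process a different CUDA_VISIBLE_DEVICES.

```python
from ray import serve

@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 1})
class MyModel:
    def __init__(self):
        import torch
        # Each replica sees exactly one GPU, so "cuda" inside this process
        # maps to a different physical device per replica.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    async def __call__(self, request):
        return {"device": self.device}

serve.run(MyModel.bind())
```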

Can you take a look at nvidia-smi before and after and make sure there are no other processes running on the two GPUs?
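If nvidia-smi looks clean, a quick check you could drop into each replica's __init__ (just a debugging snippet, not required code) is to log what the process actually sees:

```python
import os
import torch

# Each replica should report a different CUDA_VISIBLE_DEVICES value
# and a visible GPU count of 1.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("Visible GPU count:", torch.cuda.device_count())
```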

@simon-mo @puntime_error I am unable to use multiple GPUs while serving with Ray.
I have raised an issue here: Issue on page /serve/getting_started.html · Issue #27905 · ray-project/ray · GitHub
Can you please help?

Hi @Sujit_Kumar, to narrow down the issue, do the basic GPU examples at GPU Support — Ray 3.0.0.dev0 work for you? You can try them on Ray 2.0.0rc1 (pip install "ray[serve, default]==2.0.0rc1").
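Something along these lines is enough to verify GPU scheduling works at all, independent of Serve (assuming a 2-GPU node; the function name is just for illustration):

```python
import os
import ray

ray.init()

@ray.remote(num_gpus=1)
def which_gpu():
    # Ray sets CUDA_VISIBLE_DEVICES for each task based on the GPU it assigned.
    return os.environ.get("CUDA_VISIBLE_DEVICES")

# With 2 GPUs available, the two tasks should report different device IDs.
print(ray.get([which_gpu.remote(), which_gpu.remote()]))
```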

@Sujit_Kumar you may also have to tell the transformers library to use the GPU; see How to make transformers examples use GPU? · Issue #2704 · huggingface/transformers · GitHub for an example and the surrounding context.
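For instance, something like this (the task and model here are just illustrative) moves the model onto the GPU; inside a Serve replica, device 0 is whichever GPU Ray assigned to that replica:

```python
from transformers import pipeline

# device=0 places the pipeline on cuda:0 within this process.
classifier = pipeline("sentiment-analysis", device=0)
print(classifier("Ray Serve replicas can each use their own GPU."))
```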