Serve multiple replicas of the same model on the same GPU

islam_almersawi May 23, 2024, 5:30am 1

Is it possible to deploy multiple replicas of the same model on the same GPU, or must each replica be allocated its own GPU?
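For context, here is a minimal sketch of what sharing a GPU could look like, assuming the question is about Ray Serve (the post does not say so explicitly). Ray lets an actor request a fractional GPU through `num_gpus`, so two replicas that each request 0.5 can be scheduled onto one device. The helper name `replica_options` is hypothetical and only builds the keyword arguments. Note that the fraction is a scheduling hint: as far as I know Ray does not enforce GPU memory limits, so all replicas still have to fit in the device's memory together.

```python
def replica_options(replicas_per_gpu: int, gpus_available: int = 1) -> dict:
    """Build deployment keyword arguments so that several replicas share each GPU.

    Hypothetical helper for illustration; it does not talk to a cluster.
    """
    if replicas_per_gpu < 1 or gpus_available < 1:
        raise ValueError("replicas_per_gpu and gpus_available must be >= 1")
    return {
        # Total replicas across all GPUs.
        "num_replicas": replicas_per_gpu * gpus_available,
        # Each replica asks for a fraction of one GPU.
        "ray_actor_options": {"num_gpus": 1 / replicas_per_gpu},
    }


opts = replica_options(replicas_per_gpu=2)
print(opts)

# With Ray installed, the options could be applied like this (not run here):
#   from ray import serve
#
#   @serve.deployment(**opts)
#   class Model:
#       ...
```

Whether this is worthwhile depends on the model: replicas on one GPU compete for the same compute and memory, so it mostly helps when a single replica leaves the device underused.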
