I’m wondering whether NVIDIA MPS (Multi-Process Service) can be used to improve GPU utilization for inference.
Say we have a Ray Serve deployment. If the request load is low at runtime, is it possible to co-schedule another deployment on the same GPU and leverage MPS for higher utilization?
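For context, my understanding is that MPS has to be enabled on the node before the GPU processes start. A rough sketch of the standard NVIDIA MPS workflow (not Ray-specific; device 0 is just an example):

```shell
# Expose the GPU(s) to be shared to the MPS daemon (device 0 as an example).
export CUDA_VISIBLE_DEVICES=0

# Start the MPS control daemon; CUDA processes launched afterwards on this
# node will transparently share the GPU through MPS.
nvidia-cuda-mps-control -d

# ... start Ray / the Serve deployments here ...

# Shut the daemon down when done.
echo quit | nvidia-cuda-mps-control
```

On the Ray side, I assume the co-scheduling itself would be expressed with fractional GPUs, e.g. `ray_actor_options={"num_gpus": 0.5}` on each deployment, so that two replicas land on the same device.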
Hi @valiantljk, I am also interested in using MPS. Curious whether you made any progress on it? Thanks!