Not sure how num_replicas works

tim · March 1, 2021, 2:22pm

Hello there,

I am serving PyTorch models using Ray Serve. I have 2 GPUs and I want to serve only one model for the moment.

I tried scaling out my model to 2 replicas, using

config = {"num_replicas": 2}
client.create_backend("resnet18:v0", ImageModel, ray_actor_options={"num_cpus": 6, "num_gpus": 1}, config=config)

This works. When I look at nvidia-smi, however, I can see that each GPU reaches at most 12% utilization. So I was wondering: why not set num_replicas to 4 and “num_gpus” to 0.5? Unfortunately, that does not work.

Can you explain why? (Note: I am using a single-node cluster for the moment.)

Edit: I am testing this configuration by sending about 1000 HTTP requests to the server at the same time.

Thank you in advance

architkulkarni · March 1, 2021, 6:05pm

Hi @tim,

Thanks for reporting this. Could you please share more details about the error you’re running into? I think what you’re describing should work. One guess is maybe you don’t have 24 CPUs available on your machine so there aren’t enough resources for 4 replicas.
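The guess above comes down to simple arithmetic: each replica reserves its own `num_cpus` and `num_gpus`, so 4 replicas at 6 CPUs each need 24 CPUs. A minimal sketch of that check in plain Python follows. The helper `replicas_fit` is hypothetical (it is not part of Ray), and the machine size of 15 CPUs and 2 GPUs is taken from figures mentioned elsewhere in this thread.

```python
def replicas_fit(num_replicas, per_replica, available):
    """Return the resources that fall short, as {name: (needed, available)}.

    An empty dict means every requested resource fits.
    """
    shortfalls = {}
    for name, amount in per_replica.items():
        needed = num_replicas * amount
        have = available.get(name, 0)
        if needed > have:
            shortfalls[name] = (needed, have)
    return shortfalls


# Assumed machine, based on numbers quoted in the thread.
available = {"num_cpus": 15, "num_gpus": 2}

# 4 replicas x 6 CPUs = 24 CPUs: more than the machine has.
print(replicas_fit(4, {"num_cpus": 6, "num_gpus": 0.5}, available))

# 4 replicas x 3 CPUs = 12 CPUs and 4 x 0.5 = 2 GPUs: nothing falls short.
print(replicas_fit(4, {"num_cpus": 3, "num_gpus": 0.5}, available))
```

This only models the per-replica requests; any CPUs that Ray Serve reserves for its own actors would reduce the available count further.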

tim · March 2, 2021, 1:05pm

Hi,

Thank you very much for your help. It was just that: I assigned too many CPUs to each worker. Sorry about that.

Have a nice day,

Regards

tim · March 2, 2021, 1:49pm

Actually, I have another issue: I am not sure how to allocate CPUs/threads to my workers.

First of all, when I run htop I can see 15 CPUs, but when I use only one replica and set num_cpus to 15, it does not work. The only configuration that works is 13 CPUs, and I don’t know why. It was suggested that I use ray.nodes() to investigate, but doing that I can clearly see that Ray finds 15 CPUs.

Second of all, I don’t know what value to use for OMP_NUM_THREADS in the following command: OMP_NUM_THREADS=16 ray start --head

It is said here that “to avoid performance degradation with many workers” it should be set to 1, but I have one worker and 1 to 3 replicas. Should I increase the number of threads then?

Thank you for your help

architkulkarni · March 2, 2021, 4:59pm

Hi Tim, sorry about that. The reason you can only allocate 13 CPUs is that two CPUs are already being used internally by Ray Serve: one for the Serve Controller actor and one for the HTTP proxy actor. We should probably make this clearer somewhere. In general, you can see what’s using your CPUs by looking at the Ray Dashboard.
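The accounting in this reply can be written out as a short sketch. The constant and both helper names are hypothetical, chosen for illustration. The figure of two reserved CPUs comes from the reply itself and may differ in other Ray Serve versions or configurations.

```python
# CPUs Ray Serve keeps for itself, per the reply above:
# one for the Serve Controller actor, one for the HTTP proxy actor.
SERVE_RESERVED_CPUS = 2


def cpus_left_for_replicas(total_cpus, reserved=SERVE_RESERVED_CPUS):
    """CPUs that replicas can actually claim after Serve's own actors."""
    return max(total_cpus - reserved, 0)


def max_replicas(total_cpus, cpus_per_replica, reserved=SERVE_RESERVED_CPUS):
    """Largest replica count whose combined num_cpus still fits."""
    return int(cpus_left_for_replicas(total_cpus, reserved) // cpus_per_replica)


print(cpus_left_for_replicas(15))  # 13, the largest num_cpus that worked
print(max_replicas(15, 6))         # 2
print(max_replicas(15, 3))         # 4
```

With 15 CPUs detected, a single replica can request at most 13, which matches the behavior reported earlier in the thread.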

I’m not sure what the best answer to the OMP_NUM_THREADS question is. @simon-mo, do you know?
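The thread leaves the OMP_NUM_THREADS question open. One common heuristic, offered here as an assumption and not as Ray's recommendation, is to give each replica as many OpenMP threads as the CPUs it reserves, so that replicas times threads does not exceed the available cores. The helper names below are hypothetical. The environment variable generally has to be set in the replica's process before the OpenMP-backed library starts its thread pool.

```python
import os


def omp_threads_for_replica(num_cpus_per_replica):
    """Heuristic (an assumption): one OpenMP thread per reserved CPU, minimum 1."""
    return max(1, int(num_cpus_per_replica))


def apply_thread_limit(num_cpus_per_replica, env=os.environ):
    """Write OMP_NUM_THREADS into `env` and return the value that was set."""
    threads = omp_threads_for_replica(num_cpus_per_replica)
    env["OMP_NUM_THREADS"] = str(threads)
    return threads


# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = {}
apply_thread_limit(6, demo_env)
print(demo_env)  # {'OMP_NUM_THREADS': '6'}
```

Whether this beats the default depends on the model and the request load, so it is worth measuring throughput with a few values before settling on one.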

tim · March 4, 2021, 4:48pm

Thank you for your answer, it is very clear.
