Hello! I’m checking both htop and the Ray Dashboard to monitor my VM resources, and I found that ray::ServeReplica:my_module:MyDeployment.handle_request_with_rejection is using a lot of RAM and CPU capacity. I’m using Serve to deploy some models.
I can’t find many references to this process and what it means, but searching for handle_request_with_rejection in the Ray codebase I found that it’s related to max_ongoing_requests. See here.
I’m not using autoscaling, since this replica holds a PyTorch model on the GPU, so there’s just a single replica for this deployment. Should I change max_ongoing_requests? I’m using Ray 2.10.
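For context, the deployment is set up roughly like this. This is a simplified sketch, not my real code: the model body is a placeholder, and I’ve kept the names from above (my_module / MyDeployment):

```python
from ray import serve
import torch


@serve.deployment(
    num_replicas=1,                     # no autoscaling: one replica owns the GPU
    ray_actor_options={"num_gpus": 1},  # the replica holds the PyTorch model
    # max_ongoing_requests=...          # currently left at the default; this is
    #                                   # the knob I'm asking about
)
class MyDeployment:
    def __init__(self):
        # placeholder for the real model load
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = torch.nn.Linear(8, 1).to(device)

    async def __call__(self, request) -> str:
        # placeholder for the real inference handler
        return "ok"


app = MyDeployment.bind()
```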
How severely does this issue affect your experience of using Ray?
Medium: It contributes to significant difficulty in completing my task, but I can work around it.
Hi @Augusto_Maillo, this is totally normal; it’s just the name of the actor method that Serve uses when executing a request. It will call your handler method or the FastAPI app.
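For illustration, here is a minimal sketch of the two handler styles that the replica ultimately dispatches into (the class names here are made up for the example, not from your code):

```python
from fastapi import FastAPI
from ray import serve

fastapi_app = FastAPI()


@serve.deployment
class PlainHandler:
    async def __call__(self, request) -> str:
        # plain-handler style: Serve invokes __call__ for each request
        return "ok"


@serve.deployment
@serve.ingress(fastapi_app)
class FastAPIHandler:
    @fastapi_app.get("/")
    def root(self) -> dict:
        # FastAPI-ingress style: Serve routes the request into the wrapped app
        return {"status": "ok"}
```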
Isn’t it a problem? Could increasing the actor queue size help me?
Edit: This actor runs a Torch model on the GPU, so it is naturally heavy. It’s hard for me to tell which load comes from model execution and which comes from misusing Ray.
If they are torch threads, shouldn’t they have the same PID?
Edit: My bad. The PID column in htop is not always the process ID. With the TGID column I can see that they are threads of the same process. Thank you for the help.
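For anyone else confused by this, here’s a quick generic-Python check (nothing Ray-specific) that threads share one process ID while their native thread IDs differ:

```python
# Threads live inside one process: os.getpid() is identical for all of them,
# while threading.get_native_id() (Python 3.8+) differs per thread. In htop,
# these per-thread IDs show up in the "PID" column unless you compare TGIDs.
import os
import threading


def report() -> None:
    print(f"PID={os.getpid()} native TID={threading.get_native_id()}")


threads = [threading.Thread(target=report) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```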