What is going on behind "handle_request_with_rejection" calls?

Hello! I’m checking both htop and the Ray Dashboard to monitor my VM resources, and I found that ray::ServeReplica:my_module:MyDeployment.handle_request_with_rejection is using a lot of RAM and CPU. I’m using Serve to deploy some models.

I can’t find many references about this process or what it means, but searching for handle_request_with_rejection in the Ray codebase I found that it’s related to max_ongoing_requests. See here.

I’m not using autoscaling since this replica holds a PyTorch model on the GPU, so there’s just a single replica for this deployment. Should I change max_ongoing_requests? I’m using Ray 2.10.
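
For reference, my setup looks roughly like this (a simplified sketch, not my actual code; names and values are illustrative, and depending on your exact Serve version the option may still be spelled max_concurrent_queries):

```python
from ray import serve

# Simplified sketch of the setup: a single-replica GPU deployment with an
# explicit cap on in-flight requests. Requests beyond max_ongoing_requests
# are queued rather than executed concurrently on the replica.
@serve.deployment(
    num_replicas=1,                     # no autoscaling: one GPU-bound replica
    max_ongoing_requests=5,             # requests allowed in flight per replica
    ray_actor_options={"num_gpus": 1},  # the replica holds the PyTorch model on a GPU
)
class MyDeployment:
    def __init__(self):
        # load the torch model onto the GPU here
        ...

    async def __call__(self, request):
        # model inference happens here
        return {"ok": True}

app = MyDeployment.bind()
# serve.run(app)
```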

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.

Hi @Augusto_Maillo, this is totally normal; it’s just the name of the actor method that Serve uses to execute a request. It will call your handler method or your FastAPI app.
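
For example, with a FastAPI ingress like this minimal sketch (illustrative names, not your app), the route handler is what handle_request_with_rejection ends up invoking for each request:

```python
from fastapi import FastAPI
from ray import serve

fastapi_app = FastAPI()

# Minimal sketch of a FastAPI ingress deployment; Serve routes each
# incoming request through the replica's handle_request_with_rejection
# actor method, which then calls the matching FastAPI route handler.
@serve.deployment
@serve.ingress(fastapi_app)
class MyIngress:
    @fastapi_app.get("/predict")
    def predict(self) -> dict:
        return {"ok": True}

app = MyIngress.bind()
# serve.run(app)
```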

In this scenario, if multiple handle_request_with_rejection calls show up, does that indicate my actor is heavily used?

Checking htop on my machine and sorting by memory usage, I see this:

Isn’t that a problem? Could increasing the actor queue size help me?

Edit: This actor runs a torch model on the GPU, so it is naturally heavy. It’s difficult for me to tell which load comes from model execution and which from using Ray badly.
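
If it helps, here is roughly how one could check the split from inside the replica (a sketch; it assumes psutil is installed in the replica’s environment):

```python
import os

import psutil  # assumption: psutil is available in the replica's environment
import torch

# Rough sketch: compare the whole process footprint against what torch
# itself accounts for (e.g. run this in __init__ or a debug endpoint).
proc = psutil.Process(os.getpid())
# Everything in the process: Ray worker, torch runtime, model, buffers.
print(f"process RSS: {proc.memory_info().rss / 1e6:.0f} MB")

if torch.cuda.is_available():
    # Tensors currently held on the GPU (model weights, activations, ...).
    print(f"CUDA allocated: {torch.cuda.memory_allocated() / 1e6:.0f} MB")
    # Memory torch's caching allocator has reserved from the driver.
    print(f"CUDA reserved:  {torch.cuda.memory_reserved() / 1e6:.0f} MB")
```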

I think what you’re seeing here are many torch threads; they just all share the process title that Ray sets.
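
If you want to confirm or bound that, here’s a sketch, assuming the extra htop rows are torch’s CPU worker threads:

```python
import torch

# Sketch, assuming the extra htop rows are torch's CPU worker threads.
# They inherit the Ray-set process title, so each one shows up as
# ray::ServeReplica:...handle_request_with_rejection in htop.
print("intra-op threads:", torch.get_num_threads())

# Optionally cap them (call this early, before heavy torch work):
torch.set_num_threads(4)          # intra-op parallelism (e.g. matmul kernels)
torch.set_num_interop_threads(4)  # inter-op parallelism; must be set before
                                  # torch runs any inter-op parallel work
```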

If they are torch threads, shouldn’t they all have the same PID?
Edit: My bad. The PID column in htop is not always the process ID; for thread rows it shows the thread ID. With the TGID column I can see that they are threads of the same process. Thank you for the help.
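
For anyone else who hits this, a tiny Linux-only sketch that shows the PID/TGID relationship htop is displaying:

```python
import os
import threading
import time

def worker():
    time.sleep(10)

# Spawn a few threads; with thread display enabled, htop shows one row
# per thread, and its PID column holds each thread's TID.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

# On Linux, every thread of a process shares the same thread group ID
# (TGID), which equals the process PID; the per-thread IDs are listed
# under /proc/<pid>/task.
pid = os.getpid()
print("process PID (== TGID of every thread):", pid)
print("thread TIDs:", sorted(os.listdir(f"/proc/{pid}/task"), key=int))

for t in threads:
    t.join()
```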