Specifying resources using Ray Serve

Hi! I’m a Ray newbie.

I’m looking for recommendations on how to build a very simple pipeline with Ray (not model serving per se, but somewhat similar):

  1. a single node
  2. node has 128 vCPUs
  3. I’d like to build an HTTP API that processes CPU-heavy requests with a function (wrapped/parallelized as a Ray actor). In total, I’d like to cap the number of simultaneous workers at 256 to max out CPU usage.

Is it possible to do this with Ray Serve alone (without fastapi/uvicorn, relying on Ray to control the maximum parallelism)? How can I specify Ray resources using Ray Serve? (E.g. I’d like to specify that each actor takes up to 0.5 CPU, and that there are 128 CPUs in total. I found Resource Allocation — Ray 2.46.0, but it seems to lack a complete example.) Is there a complete code example of such a basic pattern anywhere?

Thanks!


E.g. one can think of an application like GitHub - project-numina/kimina-lean-server: Kimina Lean server, which is a server that accepts HTTP requests for verification of programs in the Lean programming language. This requires a pool of workers, each running an instance of the Lean compiler (which may crash occasionally). Would Ray / Ray Serve be a good framework for solving such a task?

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.

Hi, thanks for your interest in Ray Serve! Yes, it’s certainly possible to build this with Ray Serve. Take a look at this example, where resources are specified per deployment: Serve a Stable Diffusion Model — Ray 2.46.0
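
For the CPU-only case in the question, a minimal sketch could look like the one below. It is not taken from the linked example. The names `cpu_heavy_task`, `replica_count`, `Worker` and `build_app` are hypothetical placeholders, and the body of `cpu_heavy_task` is only a stand-in for the real work. The sketch assumes a recent Ray version in which `serve.deployment` accepts `num_replicas`, `ray_actor_options` and `max_ongoing_requests`.

```python
import math


def cpu_heavy_task(n: int) -> int:
    """Hypothetical stand-in for the real CPU-bound work."""
    return sum(i * i for i in range(n))


def replica_count(total_cpus: float, cpus_per_replica: float) -> int:
    """Number of replicas that fit into the logical CPU budget."""
    return math.floor(total_cpus / cpus_per_replica)


class Worker:
    """One replica = one Ray actor handling HTTP requests."""

    async def __call__(self, request):
        # Over HTTP, Serve passes a Starlette Request object.
        payload = await request.json()
        return {"result": cpu_heavy_task(int(payload["n"]))}


def build_app(total_cpus: float = 128, cpus_per_replica: float = 0.5):
    # Imported here so the helpers above can be used without Ray installed.
    from ray import serve

    deployment = serve.deployment(
        # 128 / 0.5 = 256 replicas with the defaults.
        num_replicas=replica_count(total_cpus, cpus_per_replica),
        # Logical CPUs reserved per replica for scheduling purposes.
        ray_actor_options={"num_cpus": cpus_per_replica},
        # Each replica works on one request at a time.
        max_ongoing_requests=1,
    )(Worker)
    return deployment.bind()


if __name__ == "__main__":
    import ray
    from ray import serve

    # Declare 128 logical CPUs on this single node.
    ray.init(num_cpus=128)
    # Serves on http://127.0.0.1:8000/ by default; Ctrl-C to stop.
    serve.run(build_app(), blocking=True)
```

With this running, a POST request to http://127.0.0.1:8000/ with a JSON body such as {"n": 1000} should be routed to one of the replicas. Note that `num_cpus` is a logical resource that Ray uses for scheduling. It does not cap how much CPU a replica actually uses, so 0.5 here mainly controls how many replicas fit on the node.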