Hi! I’m a Ray newbie.
I’m looking for recommendations on how to build a very simple pipeline with Ray (not model serving per se, but somewhat similar):
- a single node
- node has 128 vCPUs
- I’d like to build an HTTP API that processes CPU-heavy requests with a function (wrapped/parallelized as a Ray actor). In total, I’d like to cap the number of simultaneous workers at 256, to max out CPU usage.
Is it possible to do this with just Ray Serve (without fastapi/uvicorn, relying on Ray to control the maximum simultaneous parallelism)? How can I specify Ray resources when using Ray Serve? E.g. I’d like to specify that each actor takes up to 0.5 of a CPU resource, and that there are 128 CPU resources in total. I found Resource Allocation — Ray 2.46.0, but it seems to lack a complete example. Is there a complete code example of such a basic pattern anywhere?
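To make the question concrete, here is a rough sketch of what I imagine the setup would look like. I’m assuming that `num_replicas`, `ray_actor_options={"num_cpus": ...}` and `max_ongoing_requests` on `@serve.deployment` are the right knobs for this. `heavy_compute` and `replica_count` are hypothetical placeholders of mine, not Ray APIs. I have not checked whether Serve’s own controller/proxy actors need spare CPU on top of the 256 x 0.5.

```python
import math

TOTAL_CPUS = 128        # vCPUs on the single node
CPUS_PER_REPLICA = 0.5  # fractional CPU reserved per worker (assumption from the question)


def replica_count(total_cpus: float, cpus_per_replica: float) -> int:
    """Hypothetical helper: how many replicas fit into the CPU budget."""
    return math.floor(total_cpus / cpus_per_replica)


def heavy_compute(payload: dict) -> int:
    """Hypothetical stand-in for the real CPU-heavy function."""
    return sum(i * i for i in range(int(payload.get("n", 0))))


def build_app():
    # Imported lazily so the helpers above can be used without Ray installed.
    from ray import serve
    from starlette.requests import Request

    @serve.deployment(
        num_replicas=replica_count(TOTAL_CPUS, CPUS_PER_REPLICA),
        ray_actor_options={"num_cpus": CPUS_PER_REPLICA},
        max_ongoing_requests=1,  # one request at a time per replica
    )
    class Worker:
        async def __call__(self, request: Request) -> dict:
            payload = await request.json()
            return {"result": heavy_compute(payload)}

    return Worker.bind()


if __name__ == "__main__":
    from ray import serve

    # Starts Serve's built-in HTTP proxy and blocks until interrupted.
    serve.run(build_app(), blocking=True)
```

If this is roughly right, the whole thing would run as a plain Python script and accept POST requests with a JSON body on Serve’s default HTTP port, with no separate fastapi/uvicorn process.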
Thanks!
E.g. one can think of an application like GitHub - project-numina/kimina-lean-server: Kimina Lean server, which is a server that accepts HTTP requests to verify programs written in the Lean programming language. This requires a pool of workers, each running an instance of the Lean compiler (which may crash occasionally). Would Ray / Ray Serve be a good framework for such a task?
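What I imagine each worker in the pool doing is roughly the following: hold one long-lived child process and respawn it if it has died. This is only a plain-Python sketch of the pattern, with no Ray in it. `RestartingWorker` and `CHILD_CMD` are hypothetical names, and the child here is a tiny Python echo process standing in for the Lean compiler.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-lived compiler process:
# reads one line at a time and answers with the upper-cased line.
CHILD_CMD = [
    sys.executable, "-u", "-c",
    "import sys\nfor line in sys.stdin:\n    print(line.strip().upper(), flush=True)",
]


class RestartingWorker:
    """Owns one child process and respawns it if it has exited."""

    def __init__(self, cmd):
        self.cmd = cmd
        self.restarts = 0
        self.proc = self._spawn()

    def _spawn(self):
        return subprocess.Popen(
            self.cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def request(self, line: str) -> str:
        # poll() returns an exit code once the child has exited or crashed.
        if self.proc.poll() is not None:
            self.proc = self._spawn()
            self.restarts += 1
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def close(self):
        self.proc.kill()
        self.proc.wait()
```

This only notices a crash that happened between requests. A child that dies in the middle of a request would show up as an empty reply or a broken pipe and would need extra handling. My question is whether such an object can simply live inside a Ray actor / Serve replica, or whether Ray offers something better for this.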
1. Severity of the issue: (select one)
None: I’m just curious or want clarification.