I’m working on a Ray Data pipeline where I need to use a pool of Ray actors (specifically LLMs, each requiring a GPU) in multiple, non-adjacent processing steps. My pipeline looks like this:
- LLM Generation: Use the LLM pool to generate some initial data.
- Calculation: Perform calculations on the generated data without needing the LLMs.
- LLM Generation: Use the same LLM pool to generate further data based on the results of Step 2.
The standard `map_batches` approach, which works well when the LLMs are only used in a single step, doesn’t seem directly applicable here. I need a way to maintain and reuse the LLM actor pool across these separated steps.
(Merging all the steps into a single `map_batches` call seems to hurt performance, presumably because it reduces the level of concurrency.)
My current proposed solution is to launch the LLM pool with Ray Serve. This would let me treat the LLMs as a service and call them from the different stages of the Ray Data pipeline.
Is there any better way?