Replica schedule policy in compositions of models

TracebaK · March 29, 2024, 7:45am

How severe does this issue affect your experience of using Ray?

Low: It annoys or frustrates me for a moment.

Hi, I’m using Ray Serve to build my application. I have a pipeline of model deployments (models A, B, and C) in one application, and each request runs through them in the order ‘A → B → C’. Is there a way to schedule the pipeline as a whole? For example, with two nodes in the cluster, I don’t want all model A replicas scheduled on node 1 and all model B replicas on node 2, because every request would then spend extra time transferring data over the network. In other words, I want some data locality. I noticed the ‘placement_group_bundles’ argument in the serve.deployment API, but in my tests it only applies to the actors created by that one deployment; it doesn’t help with model composition setups like the one in Deploy Compositions of Models — Ray 2.10.0. Is there a way to control the replica scheduling policy in this scenario? Thanks!

shrekris · March 29, 2024, 5:37pm

This doesn’t currently happen out of the box. Could you submit this as a feature request on GitHub, so we can track it?
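
Until something like that exists, one possible workaround is to fuse the stages into a single deployment so that A, B, and C always run in the same replica process. The sketch below is only an illustration under assumptions: ModelA, ModelB, ModelC, and FusedPipeline are hypothetical stand-ins, not code from this thread, and the Serve wiring is left in comments because it needs a running Ray cluster. The trade-off is that the three models can no longer be scaled independently.

```python
# Hypothetical stand-ins for models A, B and C from the question.
class ModelA:
    def __call__(self, x: int) -> int:
        return x + 1


class ModelB:
    def __call__(self, x: int) -> int:
        return x * 2


class ModelC:
    def __call__(self, x: int) -> int:
        return x - 3


class FusedPipeline:
    """Runs A -> B -> C in one process, so intermediate
    results are passed in memory instead of over the network."""

    def __init__(self) -> None:
        self.stages = [ModelA(), ModelB(), ModelC()]

    def __call__(self, x: int) -> int:
        # Feed each stage's output into the next stage, in order.
        for stage in self.stages:
            x = stage(x)
        return x


# Assumed Serve wiring (not executed here; requires Ray and a cluster):
#   from ray import serve
#   app = serve.deployment(FusedPipeline).bind()
#   handle = serve.run(app)
```

Each replica of the fused deployment holds all three models, so a request never leaves the node between stages; replicas are still placed by Serve's normal scheduling.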
