Replica schedule policy in compositions of models

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hi, I’m using Ray Serve to build my application. The application is a pipeline of model deployments (models A, B, and C), and each request runs through them in the order ‘A → B → C’. Is there a way to schedule the pipeline as a whole? For example, with two nodes in the cluster, I don’t want all model A replicas placed on node 1 and all model B replicas placed on node 2, because every request would then spend extra time transferring data over the network between stages. In other words, I want some data locality. I noticed the ‘placement_group_bundles’ argument in the serve.deployment API, but in my tests it only applies to actors created by the deployment itself; it has no effect across deployments in model composition examples such as the “Deploy Compositions of Models” guide (Ray 2.10.0). Is there a way to control the replica scheduling policy in this scenario? Thanks!

Ray Serve doesn’t currently do this out of the box. Could you submit it as a feature request on GitHub so we can track it?
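Until something like that exists, one possible workaround (a sketch, not a Ray Serve co-scheduling feature) is to fuse the stages into a single deployment, so that every replica runs A, B, and C in the same process and no intermediate data crosses the network. The classes ModelA, ModelB, ModelC, FusedPipeline and the helper build_app below are hypothetical names with toy arithmetic standing in for real models. Only serve.deployment and .bind() are Ray Serve calls.

```python
# Toy stand-ins for the three models; real models would load weights here.
class ModelA:
    def __call__(self, x: int) -> int:
        return x + 1


class ModelB:
    def __call__(self, x: int) -> int:
        return x * 2


class ModelC:
    def __call__(self, x: int) -> int:
        return x - 3


class FusedPipeline:
    """Runs A -> B -> C inside one process, so stages always share a node."""

    def __init__(self) -> None:
        self.stages = [ModelA(), ModelB(), ModelC()]

    def __call__(self, x: int) -> int:
        # Each stage consumes the previous stage's output in memory.
        for stage in self.stages:
            x = stage(x)
        return x


def build_app(num_replicas: int = 2):
    """Hypothetical helper: wrap the fused pipeline as one Serve deployment."""
    # Imported lazily so the plain classes above can be tested without Ray.
    from ray import serve

    deployment = serve.deployment(num_replicas=num_replicas)(FusedPipeline)
    return deployment.bind()
```

With Ray installed, `handle = serve.run(build_app())` followed by `handle.remote(3).result()` should give the same value as calling `FusedPipeline()(3)` directly. The trade-off is that A, B, and C can no longer be scaled or given resources independently, so this only fits pipelines whose stages have similar throughput.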