Hi, I have multiple deployments and followed the guide to compose them with .bind(). In the config file, I gave each GPU deployment a fractional share of a GPU.
Here is my config file:

applications:
- name: application
  route_prefix: /
  import_path: src.server:serve_app
  runtime_env: {}
  deployments:
  - name: Main
    ray_actor_options:
      num_cpus: 4
      num_gpus: 0
  - name: binding1
    ray_actor_options:
      num_cpus: 1
      num_gpus: 0.33
  - name: binding2
    ray_actor_options:
      num_cpus: 1
      num_gpus: 0.33
  - name: binding3
    ray_actor_options:
      num_cpus: 1
      num_gpus: 0.33
And the deployments are bound together in code like this:
app = Main.bind(binding1.bind(), binding2.bind(), binding3.bind())
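The GPU numbers in this config can be checked with a little arithmetic: one replica each of binding1, binding2 and binding3 adds up to 0.99 of a GPU, so at most three 0.33 replicas fit on one GPU. A small stdlib-only sketch of that arithmetic, with the shares copied from the config (the helper names are hypothetical, not Ray APIs):

```python
from fractions import Fraction
from math import ceil, floor

# num_gpus values from the config; Fraction avoids float rounding in 0.33 * 3.
GPU_SHARES = {
    "Main": Fraction(0),
    "binding1": Fraction("0.33"),
    "binding2": Fraction("0.33"),
    "binding3": Fraction("0.33"),
}


def gpu_demand(num_replicas: int) -> Fraction:
    """Total GPUs requested when every deployment runs num_replicas replicas."""
    return num_replicas * sum(GPU_SHARES.values())


def min_gpus(num_replicas: int) -> int:
    """Lower bound on physical GPUs: a fractional request must fit on a
    single GPU, so real packing can need more than this."""
    return ceil(gpu_demand(num_replicas))


def replicas_per_gpu(name: str) -> int:
    """How many replicas of one GPU deployment can share a single GPU."""
    return floor(1 / GPU_SHARES[name])


print(gpu_demand(1), min_gpus(2), replicas_per_gpu("binding1"))  # prints: 99/100 2 3
```

The totals only say how many GPUs are needed; they say nothing about which replicas end up sharing a GPU, which is exactly the part this question is about.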
The request flow is: send_request -> Main -> binding1 -> binding2 -> binding3 -> done.
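The flow above is a strict chain: each request passes through the three stages in order, each stage consuming the previous one's output. A minimal plain-Python sketch of that chain, with no Ray involved; the stage functions and this Main class are hypothetical stand-ins for the real deployments and models:

```python
from typing import Callable, List


# Hypothetical stand-ins for the three models: each appends its name to the payload.
def binding1(x: str) -> str:
    return x + "->binding1"


def binding2(x: str) -> str:
    return x + "->binding2"


def binding3(x: str) -> str:
    return x + "->binding3"


class Main:
    """Calls the stages strictly in order: Main -> binding1 -> binding2 -> binding3."""

    def __init__(self, stages: List[Callable[[str], str]]) -> None:
        self.stages = stages

    def __call__(self, request: str) -> str:
        result = request + "->Main"
        for stage in self.stages:
            result = stage(result)  # each stage consumes the previous stage's output
        return result + "->done"


app = Main([binding1, binding2, binding3])
print(app("send_request"))  # prints: send_request->Main->binding1->binding2->binding3->done
```

Because the stages run back to back for a single request, keeping all three inside one object means the whole request stays wherever that object lives.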
The problem is that when I scale the whole application, Ray places all the binding1 replicas on the same GPU (and likewise for binding2 and binding3), which hurts performance.
What I want is one complete pipeline (binding1, binding2 and binding3) on one GPU and another complete pipeline on another GPU.
How can I specify this?
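The two layouts can be contrasted with a toy placement model. This is a hypothetical sketch, NOT Ray's scheduler: first_fit_by_stage places replicas stage by stage on the first GPU with room (which reproduces the "all binding1s on one GPU" symptom), while pipeline_per_gpu is the desired layout with one full pipeline per GPU. Shares are kept in integer hundredths of a GPU to avoid float rounding.

```python
from typing import Dict, List

STAGES = ["binding1", "binding2", "binding3"]
SHARE = 33      # num_gpus: 0.33 from the config, in hundredths of a GPU
CAPACITY = 100  # one whole GPU, in hundredths


def first_fit_by_stage(replicas: int) -> Dict[int, List[str]]:
    """Toy model (not Ray's real scheduler): place replicas stage by stage,
    each on the first GPU that still has room."""
    layout: Dict[int, List[str]] = {}
    used: List[int] = []  # share already taken on each GPU
    for stage in STAGES:
        for r in range(replicas):
            for gpu, load in enumerate(used):
                if load + SHARE <= CAPACITY:
                    break
            else:  # no existing GPU has room, so open a new one
                gpu = len(used)
                used.append(0)
            used[gpu] += SHARE
            layout.setdefault(gpu, []).append(f"{stage}#{r}")
    return layout


def pipeline_per_gpu(replicas: int) -> Dict[int, List[str]]:
    """Desired layout: replica i of every stage lives on GPU i."""
    return {gpu: [f"{stage}#{gpu}" for stage in STAGES] for gpu in range(replicas)}


print(first_fit_by_stage(3))
print(pipeline_per_gpu(3))
```

Both layouts fill three GPUs to 0.99 each, so the resource requests alone cannot tell Ray which one you want. Also, as far as I know, Serve routes a handle call to any replica of the target deployment, so even with the second placement a request from binding1 on one GPU could land on a binding2 replica on another. One commonly used way to get a whole pipeline per GPU is to move binding1, binding2 and binding3 into a single deployment that requests num_gpus: 1 and scale that deployment with num_replicas; this assumes all three models fit in one GPU's memory together.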