Sharing resources across bound deployments in a multi-GPU cluster

Hi, I have multiple deployments and followed the guide to bind them together. In the config file, I partitioned the GPUs among the deployments.

Here is my config file:

```yaml
applications:
  - name: application
    route_prefix: /
    import_path: src.server:serve_app
    runtime_env: {}
    deployments:
      - name: Main
        ray_actor_options:
          num_cpus: 4
          num_gpus: 0
      - name: binding1
        ray_actor_options:
          num_cpus: 1
          num_gpus: 0.33
      - name: binding2
        ray_actor_options:
          num_cpus: 1
          num_gpus: 0.33
      - name: binding3
        ray_actor_options:
          num_cpus: 1
          num_gpus: 0.33
```

and the app is composed as:

```python
app = Main.bind(binding1.bind(), binding2.bind(), binding3.bind())
```

My model workflow is: send_request -> Main -> binding1 -> binding2 -> binding3 -> done

The problem is that when I scale the whole application, Ray allocates all the binding1 replicas to the same GPU (and likewise for binding2 and binding3), which hurts performance.
What I want instead is one complete set (binding1 + binding2 + binding3) co-located on one GPU, and the next complete set on another GPU.
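If I understand the scheduling right, the packing works out like this. Here is a toy first-fit sketch (plain Python, no Ray; `pack` is just my illustration, not a Ray API) showing why three 0.33-GPU requests of the same deployment end up on one device: 3 × 0.33 = 0.99 still fits under 1.0 GPU of capacity.

```python
# Toy first-fit packing of fractional GPU requests.
# Assumption (mine, for illustration): the scheduler places each
# replica on the first GPU that still has enough free capacity.
def pack(requests, gpus, capacity=1.0, eps=1e-9):
    """Return the GPU index chosen for each fractional request."""
    free = [capacity] * gpus
    placement = []
    for r in requests:
        for g in range(gpus):
            if free[g] + eps >= r:  # enough room left on this GPU
                free[g] -= r
                placement.append(g)
                break
        else:
            placement.append(None)  # no GPU has capacity left
    return placement

# Scaling binding1 to three replicas of 0.33 GPU on a 2-GPU node:
print(pack([0.33, 0.33, 0.33], gpus=2))  # -> [0, 0, 0]: all on GPU 0
```

So the fractional requests alone don't express "keep one binding1 + binding2 + binding3 set together"; the scheduler is free to stack same-deployment replicas onto one GPU.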

How can I specify this?