Sharing resources across bound deployments in a multi-GPU cluster

Hi, I have multiple deployments and followed the guide to bind them together. In the config file, I partitioned the GPUs among the deployments.

Here is my config file:

```yaml
applications:
  - name: application
    route_prefix: /
    import_path: src.server:serve_app
    runtime_env: {}
    deployments:
      - name: Main
        ray_actor_options:
          num_cpus: 4
          num_gpus: 0
      - name: binding1
        ray_actor_options:
          num_cpus: 1
          num_gpus: 0.33
      - name: binding2
        ray_actor_options:
          num_cpus: 1
          num_gpus: 0.33
      - name: binding3
        ray_actor_options:
          num_cpus: 1
          num_gpus: 0.33
```

and the app is composed as:

```python
app = Main.bind(binding1.bind(), binding2.bind(), binding3.bind())
```

My model workflow is: send_request -> Main -> binding1 -> binding2 -> binding3 -> done

The problem is that when I scale the whole application, Ray allocates all the binding1 replicas to the same GPU (and likewise for binding2 and binding3), which hurts performance.
What I want instead is one complete set (binding1 + binding2 + binding3) co-located on one GPU, and the next complete set on another GPU.
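If I understand the scheduling right, the packing works out like this. Here is a toy first-fit sketch (plain Python, no Ray; `pack` is just my illustration, not a Ray API) showing why three 0.33-GPU requests of the same deployment end up on one device: 3 × 0.33 = 0.99 still fits under 1.0 GPU of capacity.

```python
# Toy first-fit packing of fractional GPU requests.
# Assumption (mine, for illustration): the scheduler places each
# replica on the first GPU that still has enough free capacity.
def pack(requests, gpus, capacity=1.0, eps=1e-9):
    """Return the GPU index chosen for each fractional request."""
    free = [capacity] * gpus
    placement = []
    for r in requests:
        for g in range(gpus):
            if free[g] + eps >= r:  # enough room left on this GPU
                free[g] -= r
                placement.append(g)
                break
        else:
            placement.append(None)  # no GPU has capacity left
    return placement

# Scaling binding1 to three replicas of 0.33 GPU on a 2-GPU node:
print(pack([0.33, 0.33, 0.33], gpus=2))  # -> [0, 0, 0]: all on GPU 0
```

So the fractional requests alone don't express "keep one binding1 + binding2 + binding3 set together"; the scheduler is free to stack same-deployment replicas onto one GPU.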

How can I specify this?