Worker nodes are IDLE

Hi there, thanks for the great work!

I have a task that runs quick inference of an ML model on the GPU. For this I set min_workers: 20 for my worker node. After Ray initializes all the worker nodes, almost all of them stay idle (usually only 2-4 out of the 20 are not idle).

What seems to be the problem?
I also noticed that the head node is doing most of the workload; how can I do proper load balancing across all my nodes?
Thanks.

Here is my config:

cluster_name: gpucluster
max_workers: 100
upscaling_speed: 2.0
idle_timeout_minutes: 10
docker:
   image: "rayproject/ray:latest-gpu"
   container_name: "ray_container"

provider:
    type: gcp
    region: ...
    availability_zone: ...
    project_id: ...
auth:
    ssh_user: ray
available_node_types:
    head_node:
        min_workers: 0
        max_workers: 0
        resources: {"CPU": 4, "GPU": 1}
        node_config:
            machineType: n1-highmem-4
            tags:
              - items: ["allow-all"]
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                  diskSizeGb: 100
                  sourceImage: projects/deeplearning-platform-release/global/images/family/common-cu113
            guestAccelerators:
              - acceleratorType: .../nvidia-tesla-p100
                acceleratorCount: 1
            metadata:
              items:
                - key: install-nvidia-driver
                  value: "True"
            scheduling:
              - onHostMaintenance: "terminate"
              - automaticRestart: true
    worker_node:
        min_workers: 20
        resources: {"CPU": 4, "GPU": 1}
        node_config:
            machineType: n1-highmem-4
            tags:
              - items: ["allow-all"]
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                  diskSizeGb: 100
                  sourceImage: projects/deeplearning-platform-release/global/images/family/common-cu113
            guestAccelerators:
              - acceleratorType: .../nvidia-tesla-p100
                acceleratorCount: 1
            metadata:
              items:
                - key: install-nvidia-driver
                  value: "True"
            scheduling:
              - preemptible: false
              - onHostMaintenance: "terminate"
              - automaticRestart: true

head_node_type: head_node

Hey @mataney,

Would you be able to share what your Python script looks like? How are you spawning your models? Logically, if you are running actors or tasks that each require 1 GPU in parallel, they should be scheduled on different nodes (since each node has a single GPU resource).
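
For reference, here is a minimal sketch of the pattern I mean, with a placeholder run_inference task and made-up batch data (not taken from your script): because each task requests num_gpus=1 and every node exposes exactly one GPU, the scheduler can only place one such task per node, so 20 concurrent tasks should fan out across the 20 workers.

import ray

# Connect to the running cluster (run this from a node in the cluster).
ray.init(address="auto")

# Hypothetical inference task: requesting 1 GPU per task means at most one
# copy can run on any given node, since each node advertises a single GPU.
@ray.remote(num_gpus=1)
def run_inference(batch):
    # Stand-in for loading your model and running it over the batch.
    return len(batch)

# 20 dummy batches -> 20 GPU tasks that should spread across the workers.
batches = [list(range(100)) for _ in range(20)]
results = ray.get([run_inference.remote(b) for b in batches])
print(results)

If your tasks are not annotated with num_gpus (or all the work happens inside the driver process), Ray is free to run them wherever CPUs are available, which could explain both the idle workers and the head node doing most of the work.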