[Clusters] [Core] Head node max_workers is not respected

Hi, it doesn’t matter what I pass as the head node’s max_workers; Ray always uses all the CPUs available in the VM. Did I misunderstand this attribute, or is there a bug?
You can reproduce it with the following cluster config and script:

cluster_name: matan
max_workers: 10
provider:
    type: gcp
    region: us-west1
    availability_zone: us-west1-a
    project_id: ai2-israel
auth:
    ssh_user: ray
available_node_types:
    head_node:
        min_workers: 0
        max_workers: 1
        resources: {"CPU": 4}
        node_config:
            machineType: n1-highmem-4
            tags:
              - items: ["allow-all"]
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                  diskSizeGb: 50
                  sourceImage: projects/deeplearning-platform-release/global/images/family/common-cpu-debian-9
    worker_node:
        min_workers: 0
        resources: {"CPU": 2}
        node_config:
            machineType: n1-highmem-2
            tags:
              - items: ["allow-all"]
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                  diskSizeGb: 50
                  sourceImage: projects/deeplearning-platform-release/global/images/family/common-cpu-debian-9
            scheduling:
              - preemptible: false
head_node_type: head_node

# Command to start ray on the head node. You don't need to change this.
head_start_ray_commands:
    - ray stop
    - >-
      ulimit -n 65536;
      ray start
      --head
      --port=6379
      --object-manager-port=8076
      --autoscaling-config=~/ray_bootstrap_config.yaml

# Command to start ray on worker nodes. You don't need to change this.
worker_start_ray_commands:
    - ray stop
    - >-
      ulimit -n 65536;
      ray start
      --address=$RAY_HEAD_IP:6379
      --object-manager-port=8076

import logging
from time import sleep

import ray

logger = logging.getLogger(__file__)

@ray.remote(num_cpus=1.0, max_restarts=-1, max_task_retries=-1)
class Reproduce(object):
    def run(self):
        # Sleep ~11 seconds so the actor holds its CPU long enough
        # to observe where it is scheduled.
        for _ in range(11):
            sleep(1)
        return 4

ray.init(address="auto")

# Launch 40 actors; each requests 1 CPU, so with only the head node up
# (CPU: 4) the scheduler can place at most 4 of them at a time.
arr = []
for _ in range(40):
    c = Reproduce.remote()
    arr.append(c.run.remote())

print(ray.get(arr))

In this example, the head node runs the “run” method on 4 CPUs at once instead of only on a single one.
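
To see where the actors actually land, here is a small variation of the repro that has each actor report the IP of the node it runs on (a sketch; I believe ray.util.get_node_ip_address() is the helper for this, but your Ray version may differ):

import ray
from time import sleep

@ray.remote(num_cpus=1.0)
class WhereAmI(object):
    def run(self):
        sleep(1)
        # Report which node this actor copy actually landed on.
        return ray.util.get_node_ip_address()

ray.init(address="auto")
# Four actors fit exactly into the head node's advertised CPU: 4.
ips = ray.get([WhereAmI.remote().run.remote() for _ in range(4)])
print(ips)  # with only the head node up, all four IPs are the head's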

Here max_workers refers to the number of nodes, not the number of CPUs. As for why it looks like the process is running on 4 CPUs, I think the process may be getting rotated among different CPUs by the OS scheduler, so each one is used 25% of the time. cc @Alex, did I get this right, or do you have any thoughts on what’s happening here?
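
One way to check what the scheduler actually sees is to print the cluster’s resource totals from a driver, e.g. with ray.cluster_resources() (a quick sketch; with only the head node up, this should reflect the CPU: 4 set on the head node type above):

import ray

ray.init(address="auto")
# Totals advertised by every node currently in the cluster.
print(ray.cluster_resources())    # expect {'CPU': 4.0, ...} head-only
# Whatever is not currently claimed by running tasks or actors.
print(ray.available_resources())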

I agree that max_workers only refers to the number of nodes, not processes. I think it’s just running 4 copies of the Reproduce actor on the head node at a time, because the head node type advertises "CPU": 4 and each actor requests 1 CPU.
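
If the goal is to keep these actors off the head node entirely, one common approach is to advertise zero CPUs for the head node type, so the scheduler never places CPU-requesting tasks or actors there. A sketch based on the config above, changing only the resources line:

available_node_types:
    head_node:
        min_workers: 0
        max_workers: 1
        # Advertise no CPUs: actors with num_cpus >= 1 will only be
        # scheduled on worker nodes.
        resources: {"CPU": 0}
        node_config:
            machineType: n1-highmem-4
            # ... rest unchanged ...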