Ray Tune does not use resources of the second node

I’ve created a cluster manually by running ray start --head --port=6379 on the head node (machine-1) and ray start --address='<address>:6379' on the second node (machine-2).

Then I start tuning with code that looks like the snippet below (some parts are omitted):
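One way to confirm that both machines actually joined the cluster is to list the nodes Ray knows about. The snippet below is only a minimal sketch: `summarize_nodes` and `live_cluster_summary` are hypothetical helper names, and it assumes the script is run on a machine that is already part of the cluster so that `ray.init(address="auto")` can attach to it.

```python
def summarize_nodes(nodes):
    """Map each live node's address to its CPU/GPU totals.

    `nodes` is expected to look like the list returned by ray.nodes().
    """
    summary = {}
    for node in nodes:
        if not node.get("Alive"):
            continue  # skip nodes that have left the cluster
        resources = node.get("Resources", {})
        summary[node["NodeManagerAddress"]] = {
            "CPU": resources.get("CPU", 0),
            "GPU": resources.get("GPU", 0),
        }
    return summary


def live_cluster_summary():
    """Attach to the running cluster and summarize its live nodes."""
    import ray  # imported lazily so the helper above works without Ray

    ray.init(address="auto")  # assumption: a cluster is already running
    return summarize_nodes(ray.nodes())
```

If calling `live_cluster_summary()` returns only one address, machine-2 never registered with the head node, which would explain why nothing is scheduled there.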

trainable_with_gpu = tune.with_resources(
    ...,  # trainable omitted
    {'GPU': 4, 'CPU': 20},
)
tuner = tune.Tuner(
    ...  # arguments omitted
)
results = tuner.fit()

Here I’ve specified that each trial should use 4 GPUs and 20 CPUs. machine-1 has 8 GPUs and 40 CPUs, and machine-2 has 4 GPUs and 64 CPUs. From my understanding, the limiting factor here would be the GPUs, so no more than three trials would run in parallel (two on machine-1 and one on machine-2). Is that correct?
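To make that arithmetic explicit, here is a rough sketch of the calculation. It assumes that a trial's whole resource request must fit on a single node; `trials_that_fit` is a hypothetical helper, not part of Ray.

```python
# Per-node resources and per-trial request, as stated above.
nodes = {
    "machine-1": {"GPU": 8, "CPU": 40},
    "machine-2": {"GPU": 4, "CPU": 64},
}
per_trial = {"GPU": 4, "CPU": 20}


def trials_that_fit(node, request):
    """Number of trials one node can host: the scarcest resource decides."""
    return min(node[name] // amount for name, amount in request.items())


capacity = {name: trials_that_fit(res, per_trial) for name, res in nodes.items()}
total = sum(capacity.values())

print(capacity)  # {'machine-1': 2, 'machine-2': 1}
print(total)     # 3
```

Under that assumption, machine-1 is limited equally by GPUs and CPUs (two trials), while machine-2 is limited by its GPUs (one trial).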

Below is some evidence supporting my claim that resources from the second node are not being utilized.

The screenshot below shows the output of nvidia-smi on machine-1 (left) and machine-2 (right). As you can see, no job is scheduled on machine-2.

What am I missing?


(Due to the new-user restriction of one media item per post, I have to reply to my own post with additional evidence.)

In addition, the screenshot below shows the logs of the experiment itself. They show that two processes are running. All the <address> values point to machine-1.