Ray cluster is not creating workers?

After digging a bit, I was able to find the issue. Just add the default Ray service account to the node_config of the workers:

serviceAccounts:
  - email: "ray-autoscaler-sa-v1@xxx.iam.gserviceaccount.com"
    scopes: "https://www.googleapis.com/auth/cloud-platform"
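
For anyone unsure where that fragment goes, here is a minimal sketch of a worker node type in the cluster YAML, assuming the `available_node_types` layout used in Ray's GCP example config. The node type name `ray_worker_default`, the worker counts, and the machine type are placeholders, not values from this thread. `scopes` is written as a list here because the Compute Engine API defines it as a list of strings; that is a change from the snippet above.

```yaml
available_node_types:
  ray_worker_default:              # placeholder node type name
    min_workers: 1                 # placeholder
    max_workers: 2                 # placeholder
    resources: {"CPU": 2}
    node_config:
      machineType: n1-standard-2   # placeholder machine type
      # Give the workers the same service account the autoscaler uses
      serviceAccounts:
        - email: "ray-autoscaler-sa-v1@xxx.iam.gserviceaccount.com"
          scopes:
            - "https://www.googleapis.com/auth/cloud-platform"
```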

@Igor glad it worked! How do you manage the GPU driver install? I’ve been unable to install the drivers with the COS extension command; there’s a bug filed for it in Google’s issue tracker (as well as in their GitHub project): https://issuetracker.google.com/issues/164134488

Do you use cos-extensions install gpu to install the drivers on a COS image?

Hi, I just added this to the initialization_commands:

timeout 300 bash -c "
      cos-extensions install gpu
      sudo mount --bind /var/lib/nvidia /var/lib/nvidia
      sudo mount -o remount,exec /var/lib/nvidia"
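
As a rough sketch of how this might look in the cluster YAML, assuming the top-level `initialization_commands` list and a literal block scalar (`|`) so the line breaks inside the quoted `bash -c` string are preserved:

```yaml
initialization_commands:
  # One list entry; the whole multi-line string is passed as a single command.
  - |
    timeout 300 bash -c "
      cos-extensions install gpu
      sudo mount --bind /var/lib/nvidia /var/lib/nvidia
      sudo mount -o remount,exec /var/lib/nvidia"
```

The commands are the same ones shown above; only the surrounding YAML is an assumption.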

Then I was able to use the GPU from the container.
