I cannot complete the Ray cluster bring-up, which blocks my team's ability to evaluate the training runs. Therefore, this issue has HIGH severity.
After the necessary updates to the Terraform variables, the ray-on-gke cluster is created with two node pools (default-pool and gpu-pool), but the nvidia-driver-installer fails with the following error:
Error: kube-system/nvidia-driver-installer failed to create kubernetes rest client for update of resource: Get "http://localhost/api?timeout=32s": dial tcp [::1]:80: connect: connection refused
│
│ with module.kubernetes.kubectl_manifest.nvidia_driver_installer[0],
│ on modules/kubernetes/kubernetes.tf line 19, in resource "kubectl_manifest" "nvidia_driver_installer":
│ 19: resource "kubectl_manifest" "nvidia_driver_installer" {