Use a customized scheduler instead of the default-scheduler

The Ray head node can't run; it fails with this error:
Warning Failed 5m4s kubelet Error: failed to start container "ray-node": Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: unknown device id: no-gpu-has-9MiB-to-run\\\\n\\\"\"": unknown

After some searching, I believe it is because of a CUDA version mismatch.
The CUDA version on my server is 11.4:

nvidia-smi
Tue Jul 19 15:04:11 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4

and I don't know how to find a Ray Docker image with CUDA 11.4.

The newest on Docker Hub is rayproject/ray:nightly-py38-cu113.

Can anybody help with this?

Hey @biello! How are you starting the Ray cluster?

Just using Helm: helm -n ray install example-cluster --create-namespace ./ray

Emm… it's not about CUDA. I need to use a custom scheduler instead of the default-scheduler for the Ray head and worker pods in my environment. Is there a way to do so?

Found it in crd.yaml, thank you.


Hi @biello, could you share the config in the CRD you used to set a custom scheduler? Hoping what you found can help others in the future :slight_smile:

Yeah, of course. The file deploy/charts/ray/crds/cluster_crd.yaml has a "schedulerName" definition; the path is spec.podTypes[i].podConfig.spec.schedulerName. Just set this value to the name of your custom scheduler and it will work. The same goes for other pod spec fields.
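
For future readers, here is a minimal sketch of what that part of a RayCluster manifest can look like. Only the spec.podTypes[i].podConfig.spec.schedulerName path comes from the chart's cluster_crd.yaml; the apiVersion, pod type name, and the scheduler name my-custom-scheduler are illustrative placeholders and will differ per deployment:

apiVersion: cluster.ray.io/v1          # assumed API group of the legacy Ray operator
kind: RayCluster
metadata:
  name: example-cluster
spec:
  podTypes:
    - name: ray-head-type              # placeholder pod type name
      podConfig:
        spec:
          # Ask Kubernetes to schedule this pod with the custom scheduler
          # instead of the default-scheduler.
          schedulerName: my-custom-scheduler   # placeholder scheduler name
          containers:
            - name: ray-node
              image: rayproject/ray:nightly-py38-cu113

The same schedulerName field can be set on each worker podType entry so that worker pods are also placed by the custom scheduler.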
