Use a customized scheduler instead of the default-scheduler

The Ray head node can't run; it fails with this error:
Warning Failed 5m4s kubelet Error: failed to start container "ray-node": Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: unknown device id: no-gpu-has-9MiB-to-run\\\\n\\\"\"": unknown

After some searching, I believe it is because of a CUDA version mismatch.
The CUDA version on my server is 11.4:

nvidia-smi
Tue Jul 19 15:04:11 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4

and I don't know how to find a Ray Docker image with CUDA 11.4.

The newest on Docker Hub is rayproject/ray:nightly-py38-cu113.

Can anybody help with this?

Hey @biello! How are you starting the Ray cluster?

Just using Helm: helm -n ray install example-cluster --create-namespace ./ray

Emm… it's not about CUDA. I need to use a custom scheduler instead of the default-scheduler for the Ray head and worker pods in my environment. Is there a way to do so?

Found it in crd.yaml, thank you.


Hi @biello, could you share the config in the CRD you used to set a custom scheduler? Hoping what you found can help others in the future :slight_smile:

Yeah, of course. The file deploy/charts/ray/crds/cluster_crd.yaml has a "schedulerName" definition; the path is spec.podTypes[i].podConfig.spec.schedulerName. Just set this value to the name of your custom scheduler and it will work. The same goes for other pod spec fields.
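
For future readers, here is a minimal sketch of what that part of a RayCluster manifest can look like. Only the spec.podTypes[i].podConfig.spec.schedulerName path comes from the chart's cluster_crd.yaml; the apiVersion, pod type name, and the scheduler name my-custom-scheduler are illustrative placeholders and will differ per deployment:

apiVersion: cluster.ray.io/v1          # assumed API group of the legacy Ray operator
kind: RayCluster
metadata:
  name: example-cluster
spec:
  podTypes:
    - name: ray-head-type              # placeholder pod type name
      podConfig:
        spec:
          # Ask Kubernetes to schedule this pod with the custom scheduler
          # instead of the default-scheduler.
          schedulerName: my-custom-scheduler   # placeholder scheduler name
          containers:
            - name: ray-node
              image: rayproject/ray:nightly-py38-cu113

The same schedulerName field can be set on each worker podType entry so that worker pods are also placed by the custom scheduler.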
