I’m trying to connect to my Ray cluster from another pod in the same Kubernetes cluster as described here.
When I try to connect to the Ray head service from my script with `ray.init("ray://<cluster-name>-ray-head:10001")`, I get the following error:
```
Traceback (most recent call last):
  File "/home/sabri/code/domino/scratch/sabri/09-01_train_slices_gqa.py", line 6, in <module>
    ray.init("ray://ray-t4-1-cluster-ray-head:10001")
  File "/home/common/envs/conda/envs/domino/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 62, in wrapper
    return func(*args, **kwargs)
  File "/home/common/envs/conda/envs/domino/lib/python3.8/site-packages/ray/worker.py", line 718, in init
    redis_address, _, _ = services.validate_redis_address(address)
  File "/home/common/envs/conda/envs/domino/lib/python3.8/site-packages/ray/_private/services.py", line 362, in validate_redis_address
    redis_address = address_to_ip(address)
  File "/home/common/envs/conda/envs/domino/lib/python3.8/site-packages/ray/_private/services.py", line 394, in address_to_ip
    ip_address = socket.gethostbyname(address_parts)
socket.gaierror: [Errno -2] Name or service not known
```
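Since the failure surfaces in `socket.gethostbyname`, one way to narrow it down is to run the same lookup by hand from inside the Job pod. The sketch below is a hypothetical diagnostic (the helpers `split_ray_address` and `resolve_host` are illustrative names, not part of Ray): it pulls the host and port out of a `ray://` address using only the standard library, then attempts the DNS lookup and returns `None` instead of raising.

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_ray_address(address: str) -> Tuple[str, int]:
    """Split a Ray Client address like "ray://host:10001" into (host, port)."""
    parsed = urlparse(address)
    if parsed.scheme != "ray" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"not a ray://host:port address: {address!r}")
    return parsed.hostname, parsed.port


def resolve_host(host: str) -> Optional[str]:
    """Return the IPv4 address for host, or None if the lookup fails.

    Uses socket.gethostbyname, the same call that raises
    socket.gaierror at the bottom of the traceback.
    """
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None
```

Run from inside the pod, `resolve_host(split_ray_address("ray://ray-t4-1-cluster-ray-head:10001")[0])` separates two cases: if it returns an IP, the Service name resolves and the failure more likely comes from how the installed Ray version handles the `ray://` string (the traceback shows it being passed to `validate_redis_address`); if it returns `None`, the Service name itself does not resolve from that pod.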
The job manifest I’m using looks like:
```yaml
# Job to submit a Ray program from a pod outside a running Ray cluster.
apiVersion: batch/v1
kind: Job
metadata:
  name: ray-test-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ray
          image: rayproject/ray:latest-py38
          imagePullPolicy: Always
          command: [ "/bin/bash", "-c", "--" ]
          args:
            - "source /pd/sabri/ray-startup.sh"
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
          volumeMounts:
            - name: pv-1 # replace this with the name of the persistent volume you want to mount
              mountPath: /pd # this will mount the volume pv-1 at /pd
            - name: dshm
              mountPath: /dev/shm
      volumes:
        - name: pv-1 # replace this with the name of the persistent volume you want to mount
          persistentVolumeClaim:
            claimName: pvc-1 # replace this with the name of the persistent volume claim
        - name: dshm
          emptyDir:
            medium: Memory
```
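The Job manifest sets no `metadata.namespace`, so the Job runs in whichever namespace it is applied to. A short Service name such as `ray-t4-1-cluster-ray-head` normally resolves only from pods in the Service's own namespace; from a different namespace the Service has to be addressed as `<service>.<namespace>.svc.<cluster-domain>`. The sketch below (a hypothetical helper, `ray_client_address`) builds either form of the `ray://` address. The default cluster domain `cluster.local` and the example namespace `ray` are assumptions to replace with the cluster's real values.

```python
from typing import Optional


def ray_client_address(service: str,
                       namespace: Optional[str] = None,
                       port: int = 10001,
                       cluster_domain: str = "cluster.local") -> str:
    """Build a ray:// address for a Ray head Service.

    Without a namespace, the short Service name is used; it resolves only
    from pods in the same namespace as the Service. With a namespace, the
    fully qualified <service>.<namespace>.svc.<cluster_domain> name is used,
    which resolves from any namespace. cluster_domain defaults to the
    common "cluster.local" (an assumption; clusters can configure it).
    """
    if namespace is None:
        host = service
    else:
        host = f"{service}.{namespace}.svc.{cluster_domain}"
    return f"ray://{host}:{port}"
```

For example, `ray_client_address("ray-t4-1-cluster-ray-head", namespace="ray")` gives `"ray://ray-t4-1-cluster-ray-head.ray.svc.cluster.local:10001"`, which could be passed to `ray.init` in place of the short name.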
What might be causing this error when trying to connect?