How to use RayJob with custom Python interpreter?

JamesTann · July 9, 2024, 2:08pm

I am trying to deploy Ray to Kubernetes using a custom Docker image. I cannot use any Ray base Docker images for my use case, so I need to install Ray in my own image. This is made more complicated by the fact that I am using a custom Python interpreter wrapped in a shell script. I am able to run ray using this pattern in my container:

/path/to/python.sh /path/to/python/bin/ray start|stop|etc.

My question is how I can use this pattern with a RayJob. I found that I can set the command for my worker and head nodes directly:

containers:
  - name: head
    image: my-custom-image
    command:
      [
        "/path/to/python.sh",
        "/path/to/python/bin/ray",
        "start",
        "--head",
      ]
...
containers:
  - name: worker
    image: my-custom-image
    command:
        [
            "/path/to/python.sh",
            "/path/to/python/bin/ray",
            "start",
            "--address='$(RAY_HEAD_SERVICE_HOST):6379'",
        ]
    lifecycle:
        preStop:
            exec:
                command:
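                    # sh -c takes the whole command as a single string;
                    # extra list items would become $0, $1 instead of arguments.
                    [
                        "/bin/sh",
                        "-c",
                        "/path/to/python.sh /path/to/python/bin/ray stop",
                    ]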

With this I am able to start the cluster, but when I inspect the logs for my head node, I see this:

2024-07-09 13:34:39,326 INFO scripts.py:767 -- Local node IP: 10.0.16.61
2024-07-09 13:34:43,261 SUCC scripts.py:804 -- --------------------
2024-07-09 13:34:43,261 SUCC scripts.py:805 -- Ray runtime started.
2024-07-09 13:34:43,261 SUCC scripts.py:806 -- --------------------
2024-07-09 13:34:43,261 INFO scripts.py:808 -- Next steps
2024-07-09 13:34:43,261 INFO scripts.py:811 -- To add another node to this Ray cluster, run
2024-07-09 13:34:43,261 INFO scripts.py:814 --   ray start --address='10.0.16.61:6379'
2024-07-09 13:34:43,261 INFO scripts.py:823 -- To connect to this Ray cluster:
2024-07-09 13:34:43,262 INFO scripts.py:825 -- import ray
2024-07-09 13:34:43,262 INFO scripts.py:826 -- ray.init()
2024-07-09 13:34:43,262 INFO scripts.py:838 -- To submit a Ray job using the Ray Jobs CLI:
2024-07-09 13:34:43,262 INFO scripts.py:839 --   RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python my_script.py
2024-07-09 13:34:43,262 INFO scripts.py:848 -- See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html 
2024-07-09 13:34:43,262 INFO scripts.py:852 -- for more information on submitting Ray jobs to the Ray cluster.
2024-07-09 13:34:43,262 INFO scripts.py:857 -- To terminate the Ray runtime, run
2024-07-09 13:34:43,262 INFO scripts.py:858 --   ray stop
2024-07-09 13:34:43,262 INFO scripts.py:861 -- To view the status of the cluster, use
2024-07-09 13:34:43,262 INFO scripts.py:862 --   ray status
2024-07-09 13:34:43,262 INFO scripts.py:866 -- To monitor and debug Ray, view the dashboard at 
2024-07-09 13:34:43,262 INFO scripts.py:867 --   127.0.0.1:8265
2024-07-09 13:34:43,262 INFO scripts.py:874 -- If connection to the dashboard fails, check your firewall settings and network configuration.
/bin/bash: line 1: ray: command not found

The last line causes the container to fail. My guess is that I have set the start and stop commands correctly but there are other processes running that also need to access the ray executable and cannot find it. Is there a way I can set the path in the RayJob configuration? Or do I need to modify my base Docker image in some way?
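One way to test that guess is to check, from a shell inside the head container, whether a bare `ray` resolves through `$PATH`. The snippet below is only a sketch: `check_on_path` is a hypothetical helper name, and it assumes a POSIX shell is available in the image.

```shell
#!/bin/sh
# Report whether a bare command name resolves through the current $PATH.
# check_on_path is a hypothetical helper, not part of Ray or Kubernetes.
check_on_path() {
    if resolved=$(command -v "$1" 2>/dev/null); then
        echo "$1: found at $resolved"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Example: run inside the head container, e.g. from a `kubectl exec` shell.
echo "PATH is: $PATH"
check_on_path ray || echo "a bare 'ray' call would fail with 'command not found'"
```

If the helper reports that `ray` is not found, any process that calls `ray` by name rather than by full path will hit the same error as the last log line.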

JamesTann · July 12, 2024, 3:25pm

I ended up solving this with a pretty trivial fix: I added my custom Python bin directory (the one that contains the ray executable) to the $PATH variable. With that in place, I don’t need to override any commands manually.
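The effect of that fix can be sketched in plain shell. Everything here is a stand-in: a temporary directory plays the role of /path/to/python/bin, and the `ray` inside it is a stub script rather than the real executable. Where `PATH` is actually set (in the image build or in the container's environment) is not stated in the post, so that part is left open.

```shell
#!/bin/sh
# Stand-in for /path/to/python/bin: a temp directory holding a stub `ray`.
# (Assumption for illustration only; the real directory comes from the image.)
bin_dir=$(mktemp -d)
trap 'rm -rf "$bin_dir"' EXIT

printf '#!/bin/sh\necho "stub ray $*"\n' > "$bin_dir/ray"
chmod +x "$bin_dir/ray"

# Prepend the directory so that a bare `ray` resolves to it.
PATH="$bin_dir:$PATH"
export PATH

# Any process that inherits this PATH can now call `ray` by name.
command -v ray
ray status
```

Because `PATH` is exported, child processes inherit it, which is why nothing else needs to be overridden once the directory is on the path.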
