[raycluster, k8s] How to interact with ray cluster from inside a k8s pod

Hi Ray team, I have a question.

We run one hyperparameter tuning job per ray cluster in k8s. Currently, we want to launch a raycluster in k8s, then launch a k8s job that submits the tuning job to the raycluster, and finally uninstall the raycluster from the k8s cluster once ray.tune.run() is done.

Regarding how to interact with raycluster from inside a k8s job, I found out that there are two ways:

  1. In the code, use ray.init("ray://<head-node-ip>:10001") (the Ray Client port), then run python my_ray_tune_script.py in the pod. The ray.tune.run() call then executes on the remote raycluster.

    • However, I noticed that only the ray.tune.run() part runs on the raycluster; the other parts of the code still run inside the k8s job pod (e.g. if I save a file, it is saved in the pod where I run python xxxx.py). Is it possible to have everything run inside the remote raycluster?
  2. Use ray job submit from inside a k8s job.
    Based on Deploying on Kubernetes — Ray 1.13.0, it seems this is only recommended for use from a local machine. Is my understanding correct?

Which way do you recommend? Or is there a better way?

I’d recommend using ray job submit from inside the k8s job, which is your 2nd option. The docs you link to show how to run it from your local machine for development, but it is perfectly fine to run it from inside the k8s job.
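As a rough sketch of what the k8s job's entrypoint could look like, the snippet below builds and runs a ray job submit command from Python. The service name raycluster-head-svc is a made-up placeholder, and 8265 is assumed to be the dashboard port your head node exposes (it is the default); adjust both to your deployment.

```python
import shlex
import subprocess


def build_submit_command(head_host, entrypoint, working_dir=".", port=8265):
    """Build the `ray job submit` command line for a given head node."""
    # Job submission goes through the dashboard's HTTP endpoint.
    address = f"http://{head_host}:{port}"
    return [
        "ray", "job", "submit",
        "--address", address,
        "--working-dir", working_dir,  # uploaded to the cluster with the job
        "--",
        *shlex.split(entrypoint),
    ]


def submit(head_host, entrypoint):
    """Run the submission; without --no-wait the CLI tails logs until the job ends."""
    cmd = build_submit_command(head_host, entrypoint)
    return subprocess.run(cmd).returncode


# Example call (hypothetical service name, not executed here):
# submit("raycluster-head-svc", "python my_ray_tune_script.py")
```

With this approach the whole script, including any file writes, runs on the cluster rather than in the job pod. I would not rely on the CLI's exit code alone to decide whether the job succeeded before tearing the cluster down; checking the job status explicitly is safer.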

Otherwise (with option 1), you need to carefully craft your code so that everything runs on the Ray cluster by putting it into tasks. It’s simpler to run the script on the head node via ray job submit.
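For comparison, here is a minimal sketch of what "putting it into tasks" might look like with option 1. It assumes the Ray 1.13-era Tune API (tune.run, tune.report); the trainable, the search space, and the output path are invented for illustration.

```python
def client_address(head_host, port=10001):
    """Ray Client addresses use the ray:// scheme (10001 is the default port)."""
    return f"ray://{head_host}:{port}"


def run_on_cluster(head_host):
    # Imported lazily so client_address() can be used without Ray installed.
    import ray

    ray.init(client_address(head_host))

    # num_cpus=0 so this wrapper task does not hold a CPU the trials need.
    @ray.remote(num_cpus=0)
    def driver():
        # Everything in this function runs on a cluster node, not in the pod.
        from ray import tune

        def trainable(config):  # toy objective, for illustration only
            tune.report(score=config["x"] ** 2)

        analysis = tune.run(trainable, config={"x": tune.grid_search([1, 2, 3])})
        best = analysis.get_best_config(metric="score", mode="min")
        # Written on whichever node ran this task (hypothetical path).
        with open("/tmp/best_config.txt", "w") as f:
            f.write(str(best))
        return best

    return ray.get(driver.remote())
```

Note that the file ends up on whichever node Ray schedules the task on, which is not necessarily the head node, so this needs more care than submitting the script as a job.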

Ray Job submission from a K8s job, followed by cluster teardown, is a perfectly natural pattern.

Speaking of KubeRay, we’re working on supporting exactly this pattern with a “K8s-native” (declarative, custom-resource-based) interface.
The draft PR is here!

Thanks. Do you know when this rayJob helm chart will be available for users to try out?

I’m not sure. @harryge00 @simon-mo might have a better sense of the timeline.