Hi Ray team, I have a question.
We run one hyperparameter tuning job per Ray cluster on k8s. The plan is to launch a RayCluster in k8s, launch a k8s job that submits the tuning job to that RayCluster, and then uninstall the RayCluster once ray.tune.run() is done.
For interacting with the RayCluster from inside a k8s job, I found two ways:
- In the code, use `ray.init(address="ray://<head-node-ip>:10001")`, then run `python my_ray_tune_script.py` in the pod. The `ray.tune.run()` call then executes on the remote RayCluster.
  - However, I noticed that only the `ray.tune.run()` part runs on the RayCluster; the rest of the code still runs inside the k8s job pod (e.g. if I save a file, it is saved in the pod where I ran `python xxxx.py`). Is it possible to have everything run inside the remote RayCluster?
- Use `ray job submit` from inside a k8s job.
  According to the "Deploying on Kubernetes" page of the Ray 1.13.0 docs, this seems to be recommended only from a local machine. Is my understanding correct?
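For concreteness, here is a minimal sketch of the two options as I understand them. The service name `raycluster-head-svc`, the file path `/tmp/result.txt`, and the helper function names are hypothetical placeholders; port 8265 is assumed to be the default dashboard port that serves job submission. The second function uses the Python SDK (`JobSubmissionClient`) as a stand-in for the `ray job submit` CLI.

```python
"""Sketch of two ways a k8s job pod could drive a remote Ray cluster."""

# Hypothetical values: replace with the head service of your own RayCluster.
HEAD_HOST = "raycluster-head-svc"
CLIENT_PORT = 10001     # Ray Client server port
DASHBOARD_PORT = 8265   # assumed default dashboard port (job submission)


def client_address(host: str = HEAD_HOST, port: int = CLIENT_PORT) -> str:
    """Address passed to ray.init() when connecting through Ray Client."""
    return f"ray://{host}:{port}"


def jobs_address(host: str = HEAD_HOST, port: int = DASHBOARD_PORT) -> str:
    """HTTP address of the job submission server on the head node."""
    return f"http://{host}:{port}"


def run_with_ray_client() -> str:
    """Option 1: the driver stays in this pod; only Ray tasks run remotely."""
    import ray  # imported lazily so the helpers above work without Ray

    ray.init(address=client_address())

    @ray.remote
    def save_on_cluster(text: str) -> str:
        # This body executes on a cluster node, so the file is written
        # there rather than in the k8s job pod.
        path = "/tmp/result.txt"  # hypothetical path
        with open(path, "w") as f:
            f.write(text)
        return path

    return ray.get(save_on_cluster.remote("done"))


def run_with_job_submission(
    entrypoint: str = "python my_ray_tune_script.py",
) -> str:
    """Option 2: ship the whole script to the cluster as a Ray job."""
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(jobs_address())
    # working_dir uploads the current directory so the script exists remotely.
    return client.submit_job(
        entrypoint=entrypoint,
        runtime_env={"working_dir": "./"},
    )
```

In the k8s job, the container command would call one of the two functions. With the first, any code outside a remote task or `ray.tune.run()` still runs in the pod; with the second, the whole script runs on the cluster and the function only returns the job ID.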
Which way do you recommend? Or is there a better way?