Python API for creating Ray autoscaling cluster?


I was wondering if there’s a way to, or if there’s a plan to, supporting Python APIs for spinning up multi-node autoscaling Ray clusters.

I’m looking for something like…

cluster_info = ray.create_cluster("cluster.yaml") # call blocks until cluster is instantiated
head_node_ip = cluster_info["head_node_ip"]
ray.init(address="auto") # connect to cluster created in this script 

# do things with Ray API so that compute happens on the distributed Ray cluster

The following works:

  • In a python subprocess, run ray.scripts.scripts.start with the arguments as if you call ray start --head in the commandline. Add the --block flag to make it blocking in the sub process.
  • In the main process, wait for the ray_current_cluster file to appear in the temp directory. It contains the ip and port to connect to ray. The temp directory can be specified when you call ray start --head above.
  • Read the ray_current_cluster file and then call ray.init(address=...).
  • Run your workflow.
  • Call ray.shutdown() in the main process, then kill the subprocess.