Hi, we currently deploy our Ray cluster by SSHing into every node and running `ray start`. Is it possible to use the autoscaler to manage this cluster and dynamically adjust its resources?
But is there any way to scale the cluster up or down as needed, i.e. to add or remove items from $list_of_node_ips at run-time depending on the workload?
I could, for example, supply a function that allocates or deallocates a given resource one node at a time, using a cluster management tool (like SLURM or SGE).
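To make the idea concrete, here is a rough sketch of what I have in mind. Nothing in it is Ray API: `OneNodeAtATimeScaler`, `allocate`, `deallocate`, and `pending_tasks` are hypothetical names, and the callbacks just stand in for whatever SLURM/SGE calls would actually reserve or release a node.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OneNodeAtATimeScaler:
    """Hypothetical helper (not part of Ray): changes the node list by at most one node per step."""

    allocate: Callable[[], str]        # user-supplied: reserve a node, return its IP
    deallocate: Callable[[str], None]  # user-supplied: release the node with this IP
    min_nodes: int = 1
    max_nodes: int = 4
    node_ips: List[str] = field(default_factory=list)

    def step(self, pending_tasks: int) -> str:
        """Apply one scaling decision based on the current backlog."""
        # Backlog and spare capacity: add a single node.
        if pending_tasks > 0 and len(self.node_ips) < self.max_nodes:
            self.node_ips.append(self.allocate())
            return "up"
        # No backlog and more nodes than the minimum: release the newest node.
        if pending_tasks == 0 and len(self.node_ips) > self.min_nodes:
            self.deallocate(self.node_ips.pop())
            return "down"
        return "hold"


def demo():
    """Drive the scaler with fake callbacks in place of a real scheduler."""
    suffixes = iter(range(10, 100))
    released = []
    scaler = OneNodeAtATimeScaler(
        allocate=lambda: f"10.0.0.{next(suffixes)}",  # made-up IPs
        deallocate=released.append,
        min_nodes=1,
        max_nodes=2,
    )
    actions = [scaler.step(pending) for pending in (5, 5, 5, 0, 0)]
    return actions, scaler.node_ips, released
```

In a real setup, `allocate` would presumably also have to run `ray start` on the new node so it joins the cluster, and `pending_tasks` would have to come from whatever load signal the autoscaler exposes. Both of those parts are left out here.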
Once this is merged, proper scaling up and down of on-prem clusters should be possible: