Hi, we currently deploy our Ray cluster by SSHing into every node and running `ray start`. Is it possible to use the autoscaler to manage this cluster and adjust its resources dynamically?
It looks like Ray on-prem is what I need: Launching Cloud Clusters — Ray v2.0.0.dev0
But is there any way to scale the cluster up or down as needed, that is, to add or remove items from $list_of_node_ips at run time depending on the workload?
For example, I could supply a function that allocates or deallocates a given resource through a cluster management tool (such as SLURM or SGE), one node at a time.
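
To make the idea concrete, here is a rough sketch of what such a pair of functions might look like. Everything in it is hypothetical: `rescale`, `allocate`, `deallocate`, and `FakePool` are names made up for illustration and are not part of Ray's API. In a real setup the two callbacks would presumably wrap SLURM/SGE job submission and cancellation; here an in-memory pool stands in for the scheduler.

```python
from typing import Callable, List


def rescale(node_ips: List[str],
            target: int,
            allocate: Callable[[], str],
            deallocate: Callable[[str], None]) -> List[str]:
    """Grow or shrink the node list to `target` entries, one node at a time.

    `allocate` and `deallocate` are hypothetical user-supplied callbacks;
    they are NOT part of Ray.
    """
    if target < 0:
        raise ValueError("target must be non-negative")
    nodes = list(node_ips)  # work on a copy, leave the input untouched
    while len(nodes) < target:
        nodes.append(allocate())   # e.g. request one node from SLURM/SGE
    while len(nodes) > target:
        deallocate(nodes.pop())    # e.g. release the most recently added node
    return nodes


class FakePool:
    """In-memory stand-in for a cluster manager (assumption for the demo)."""

    def __init__(self, free_ips: List[str]) -> None:
        self.free = list(free_ips)
        self.released: List[str] = []

    def allocate(self) -> str:
        return self.free.pop(0)

    def deallocate(self, ip: str) -> None:
        self.released.append(ip)
        self.free.append(ip)


pool = FakePool(["10.0.0.2", "10.0.0.3", "10.0.0.4"])
nodes = rescale(["10.0.0.1"], 3, pool.allocate, pool.deallocate)
nodes_after = rescale(nodes, 1, pool.allocate, pool.deallocate)
```

The open question is whether the autoscaler could call something like `allocate`/`deallocate` itself instead of reading a fixed list of IPs.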
Once this is merged, proper scale-up and scale-down of on-prem clusters should be possible: