I’ve have had a look at the kubernetes operator in ray/python/ray/ray_operator and the autoscaller for k8s. I was hoping for something that would scale up the workers for a job (which it seems to do) and then scale down once the job was finished (which it doesn’t seem to do).
I was thinking of trying to (learn and then) write a k8s metacontroller but just wanted to check in here to see if anyone else has done this already?
Hi!
The operator uses the Ray autoscaler internally. Worker pods are terminated after a configurable idle period. Check out the Ray autoscaler docs for more details. https://docs.ray.io/en/master/cluster/autoscaling.html