K8s metacontroller

I’ve had a look at the Kubernetes operator in ray/python/ray/ray_operator and the autoscaler for k8s. I was hoping for something that would scale up the workers for a job (which it seems to do) and then scale down once the job was finished (which it doesn’t seem to do).

I was thinking of trying to (learn and then) write a k8s metacontroller, but I wanted to check here first to see whether anyone else has already done this.

Hey @Alexander_Whillas thanks for dropping by!

Maybe @Dmitri who is the codeowner for the k8s operator would have more context?

Hi!
The operator uses the Ray autoscaler internally. Worker pods are terminated after a configurable idle period. Check out the Ray autoscaler docs for more details.
https://docs.ray.io/en/master/cluster/autoscaling.html
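
To make the idle-period behaviour concrete, here is a rough sketch of the idea in Python. It is not the Ray autoscaler's actual code. The `Worker` class, the `idle_workers` function, and the `min_workers` floor are hypothetical names made up for illustration. The only assumption is that a worker becomes eligible for termination once it has gone longer than the configured idle timeout without running anything.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical stand-in for a worker pod; not a Ray class.
    name: str
    last_busy_s: float  # time (in seconds) when the worker last ran a task


def idle_workers(workers, now_s, idle_timeout_minutes, min_workers=0):
    """Return the names of workers idle for longer than the timeout.

    At least `min_workers` workers are always kept, so the result may be
    shorter than the full list of idle workers.
    """
    timeout_s = idle_timeout_minutes * 60
    # Workers whose idle time exceeds the timeout, longest-idle first.
    candidates = sorted(
        (w for w in workers if now_s - w.last_busy_s > timeout_s),
        key=lambda w: w.last_busy_s,
    )
    # Never remove so many workers that fewer than min_workers remain.
    max_removable = max(len(workers) - min_workers, 0)
    return [w.name for w in candidates[:max_removable]]


pool = [Worker("a", 0), Worker("b", 500), Worker("c", 900)]
print(idle_workers(pool, now_s=1000, idle_timeout_minutes=5))
```

With a 5 minute timeout, workers "a" and "b" have been idle for more than 300 seconds at `now_s=1000`, while "c" has not, so only the first two are returned. So scale-down should happen on its own once a job's workers go quiet for long enough; it just isn't immediate when the job finishes.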

Thanks for getting back to me Dmitri,

Perhaps my initial findings were too hasty. I’m playing around with it now on k3s locally. Will report back.

thanks

alex