K8s metacontroller

I’ve had a look at the Kubernetes operator in ray/python/ray/ray_operator and the autoscaler for k8s. I was hoping for something that would scale up the workers for a job (which it seems to do) and then scale down once the job is finished (which it doesn’t seem to do).

I was thinking of trying to (learn and then) write a k8s metacontroller, but first wanted to check in here: has anyone already done this?

Hey @Alexander_Whillas thanks for dropping by!

Maybe @Dmitri who is the codeowner for the k8s operator would have more context?

The operator uses the Ray autoscaler internally. Worker pods are terminated after a configurable idle period. Check out the Ray autoscaler docs for more details.
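To illustrate the idle-timeout behaviour described above, here is a toy model of how an idle-based scale-down decision can work. It is a minimal sketch under stated assumptions, not the Ray autoscaler's actual code: `Worker` and `workers_to_terminate` are hypothetical names, and the parameters `idle_timeout_minutes` and `min_workers` only loosely echo the options of the same names in Ray's cluster config.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical worker record, not a Ray type.
    name: str
    last_busy: float  # timestamp (seconds) when the worker last ran a task


def workers_to_terminate(workers, now, idle_timeout_minutes=5.0, min_workers=0):
    """Return names of workers idle longer than the timeout.

    Toy model: the longest-idle workers are picked first, and at least
    `min_workers` workers are always kept alive.
    """
    cutoff_seconds = idle_timeout_minutes * 60
    idle = sorted(
        (w for w in workers if now - w.last_busy > cutoff_seconds),
        key=lambda w: w.last_busy,  # oldest activity first
    )
    removable = max(0, len(workers) - min_workers)
    return [w.name for w in idle[:removable]]


workers = [Worker("w1", 0.0), Worker("w2", 500.0), Worker("w3", 100.0)]
print(workers_to_terminate(workers, now=600.0))  # ['w1', 'w3']
print(workers_to_terminate(workers, now=600.0, min_workers=2))  # ['w1']
```

Under this model, once a job finishes and its workers stop receiving tasks, they drop out after the timeout, which is the "scale down after the job" behaviour the original question asked about.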

Thanks for getting back to me, Dmitri.

Perhaps my initial findings were too hasty. I’m playing around with it locally on k3s now. Will report back.