KUBERAY: vertical scaling of memory in worker nodes

jhowpd · March 23, 2023, 2:44pm

@Kai-Hsun_Chen, is there a way to do the following? Say we set memory requests and limits on the worker nodes, and a job fails because it exceeds the memory limit. Could the worker then be resized automatically to a larger memory allocation, run the task, and scale back down to the requested resources afterwards, all without killing the entire Python process?
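One application-level workaround, independent of whether KubeRay supports this natively, is to retry a failed task with a larger memory request and let every new task start again from the smallest request. The sketch below shows only that retry policy, in plain Python so it runs anywhere. `MemoryBudgetExceeded`, `run_with_memory_escalation`, `make_tile_task`, and the MB budgets are hypothetical names and values, and nothing here resizes a running pod in place.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class MemoryBudgetExceeded(Exception):
    """Hypothetical signal that a task needed more memory than it was given."""


def run_with_memory_escalation(
    task: Callable[[int], T],
    memory_steps_mb: Sequence[int],
) -> tuple[T, int]:
    """Run `task` at each memory budget in ascending order until one succeeds.

    Returns the result and the budget that worked. Every call starts again
    from the smallest step, which plays the role of "scaling back down".
    """
    if not memory_steps_mb:
        raise ValueError("memory_steps_mb must not be empty")
    last_error: Exception | None = None
    for budget in memory_steps_mb:
        try:
            return task(budget), budget
        except MemoryBudgetExceeded as err:
            last_error = err  # this budget was too small; escalate to the next one
    raise RuntimeError(
        f"task exceeded the largest budget of {memory_steps_mb[-1]} MB"
    ) from last_error


def make_tile_task(required_mb: int) -> Callable[[int], str]:
    """Hypothetical stand-in for processing one satellite tile of a given size."""

    def task(budget_mb: int) -> str:
        if required_mb > budget_mb:
            raise MemoryBudgetExceeded(f"needs {required_mb} MB, got {budget_mb} MB")
        return f"processed with {budget_mb} MB"

    return task


# Hypothetical tiers: small, medium, and large memory requests.
STEPS_MB = [2_000, 8_000, 32_000]

if __name__ == "__main__":
    print(run_with_memory_escalation(make_tile_task(500), STEPS_MB))
    # → ('processed with 2000 MB', 2000)
    print(run_with_memory_escalation(make_tile_task(10_000), STEPS_MB))
    # → ('processed with 32000 MB', 32000)
```

In an actual Ray job, `task` would wrap a remote call, for example passing the budget through `.options(memory=...)`, and would catch whatever exception your Ray version raises when its memory monitor kills a task. Ray's `memory` resource is given in bytes and is used for scheduling only, not enforced as a limit. Because only the failed task is retried, the driver process keeps running, but the failed attempt's work is redone rather than resumed.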

I have been processing a massive amount of satellite data worldwide. Because I represent the data as a graph, it is highly sparse at a global scale, with spatially variable satellite artifacts. As a result, some jobs require very little memory while others require a substantial amount. Vertical scaling like this would save me a substantial amount in cloud computing costs. I suspect this is a common use case in sparse big-data problems and could be helpful for other Ray users as well.

Thanks again for always being so helpful.
