@Kai-Hsun_Chen is there a way to, say, set a resource request in the worker node definition so that, if a job fails because it exceeds the memory limit, the worker is automatically resized to a larger memory allocation, runs the task, and then scales back down to the originally requested resources, without killing the entire Python process?
I have been processing massive amounts of satellite data worldwide. Since I represent the data as a graph, it is highly sparse globally (spatial satellite artifacts vary from region to region), so some jobs require very little memory while others require a substantial amount. Such vertical scaling would let me save a substantial amount of cloud computing cost. I suspect this is a common pattern in sparse Big Data problems, so it could be helpful for other Ray users as well.
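For context, the workaround I have in mind is roughly an escalating-retry loop like the one below. This is a minimal self-contained sketch, not an existing Ray feature: `run_with_memory_escalation` and `toy_task` are hypothetical names, and in real Ray code the `submit` callable would presumably resubmit the task with `.options(memory=budget)` and catch the task's OOM failure rather than a plain `MemoryError`.

```python
# Sketch of a retry-with-escalation wrapper (a workaround idea, not a Ray
# API). In actual Ray code, `submit(budget)` would stand in for something
# like `my_task.options(memory=budget).remote(...)` plus `ray.get`, and the
# caught exception would be the task's out-of-memory failure; here a plain
# callable and MemoryError keep the sketch runnable without a cluster.

def run_with_memory_escalation(submit, start_bytes, max_bytes, factor=2):
    """Call submit(budget) with an increasing memory budget until it
    succeeds or escalating further would exceed max_bytes."""
    budget = start_bytes
    while True:
        try:
            # submit(budget) stands in for scheduling the task with a
            # memory=budget resource request.
            return submit(budget), budget
        except MemoryError:
            if budget * factor > max_bytes:
                raise  # give up rather than over-provision
            budget *= factor  # escalate and retry the same task


# Toy task standing in for a graph-processing job: "fails" unless it is
# granted at least 4 GiB.
GiB = 1024 ** 3

def toy_task(budget):
    if budget < 4 * GiB:
        raise MemoryError
    return "done"

result, final_budget = run_with_memory_escalation(toy_task, 1 * GiB, 8 * GiB)
# Escalates 1 GiB -> 2 GiB -> 4 GiB, then succeeds.
```

The drawback of this pattern is exactly what I'd like to avoid: the retry resubmits the whole task rather than resizing the worker in place.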
Thanks again for always being so helpful!