Hello Everybody!
I have a problem with writing to disk. My job does the following: it calls ray.put once on a pandas DataFrame and then spawns 300 to 600 tasks, each of which performs a different calculation on that DataFrame and returns a one-row result, so the combined output is roughly a 300-to-600-row matrix.

What happens is that while the job is running, the more tasks get spawned, the more disk space is used, and this eventually causes eviction on the cluster as it runs out of memory. The strange part is that there is already 100 GB of disk space available, which should be plenty for these operations, so something must be failing to delete files that are no longer needed. I've read that this kind of problem can come from calling ray.put multiple times, but I only call it once per run.
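Roughly, the pattern looks like this (a minimal sketch, not my actual code; the function name, column names, and task count are placeholders):

```python
import pandas as pd
import ray

ray.init()

@ray.remote
def compute_row(df, task_id):
    # Stand-in for the per-task calculation: each task reads the shared
    # DataFrame and returns a single-row result.
    return pd.DataFrame({"task_id": [task_id], "value": [df["x"].sum()]})

# In the real job this frame is large; ray.put is called exactly once per run.
df = pd.DataFrame({"x": range(1_000_000)})
df_ref = ray.put(df)

# 300-600 tasks all receive the same object ref, so the frame itself
# should only live in the object store once.
futures = [compute_row.remote(df_ref, i) for i in range(400)]
results = pd.concat(ray.get(futures), ignore_index=True)  # ~one row per task
print(results.shape)
```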
Has anybody run into anything similar?