Hi, I am trying to find more information on the xxx.partd files that Ray creates in the /tmp folder. They are created on each run and accumulate over time, taking up storage unnecessarily. Unfortunately, I can't find anything about them in the docs or on GitHub. Are these files spilled objects? Shared memory? Can I somehow manage them or clean them up automatically? I'd appreciate any hints.
I've never heard of Ray creating this kind of file, and I'm not aware of any way it could.
Do you happen to know when these files are created and what they are for?
The partd files are used by Dask when shuffling. Intermediate results that do not fit into memory are saved to disk as partd files.
They should be cleaned up automatically once they are no longer needed, and the Dask scheduler does that, but Ray seems to leave the files cluttering the disk. It happens more often in a Ray Cluster environment than with local Ray. After raising an issue on the Dask GitHub ([Bug] [Dask-on-Ray] Partd files are not cleaned automatically · Issue #8787 · dask/dask), I have a lead that this behaviour is caused by the wrong dask.config being set with Ray Cluster. Let me check and get back here.
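Until the root cause is confirmed, a stopgap cleanup could look roughly like the sketch below. It is only an illustration, not something Ray or Dask provides: the helper name remove_stale_partd, the one-day age threshold, and the assumption that the leftovers sit directly under /tmp with names ending in .partd are all mine. It defaults to a dry run so you can inspect what would be deleted first.

```python
import shutil
import time
from pathlib import Path


def remove_stale_partd(tmp_dir="/tmp", max_age_seconds=24 * 3600, dry_run=True):
    """Find (and optionally delete) *.partd entries older than max_age_seconds.

    Hypothetical helper: assumes leftovers live directly under tmp_dir.
    Returns the list of paths that were (or, in a dry run, would be) removed.
    """
    cutoff = time.time() - max_age_seconds
    removed = []
    for path in Path(tmp_dir).glob("*.partd"):
        try:
            if path.stat().st_mtime >= cutoff:
                continue  # modified recently; a running job may still use it
            if not dry_run:
                if path.is_dir():
                    shutil.rmtree(path)
                else:
                    path.unlink()
            removed.append(path)
        except OSError:
            # Entry vanished or is not ours to delete; skip it.
            continue
    return removed


if __name__ == "__main__":
    for stale in remove_stale_partd(dry_run=True):
        print("would remove:", stale)
```

The age check uses the entry's own modification time, which for a directory changes only when files are added or removed inside it, so pick a threshold comfortably longer than your longest job. As far as I know, Dask also reads a temporary-directory config value to decide where scratch data goes, so pointing that at a dedicated folder may make cleanup easier, but please verify that against the Dask configuration docs.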
Sounds good to me. Let me know how things go!