How does one copy folders from workers to the head node?

Hello fellow raylings,

I started using Ray in the past month, mainly to make scaling my ML research on GCP a lot more efficient than it was before.

It’s been a breeze mostly, but I am struggling to find information on how to copy/sync directories from the workers to the head node.

In SLURM (which I used previously), the head node could be configured to be accessible at a particular path on all worker nodes. In Ray, as far as I know right now, this is not possible.

I am also using Ray Tune to make hyperparameter tuning easier. However, other than quite literally returning a whole folder cast as an object to be saved, I have been unable to sync/copy data from the workers to the head node.
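For concreteness, the "return the folder as an object" workaround can be sketched roughly as below. This is only a minimal sketch under assumptions, not a recommended approach: `pack_dir` and `unpack_dir` are hypothetical helper names, and in a Ray setup `pack_dir` would presumably be called at the end of a remote task, with the returned bytes fetched on the head node via `ray.get`. It only makes sense for folders small enough to fit in memory.

```python
import io
import tarfile
from pathlib import Path


def pack_dir(src_dir: str) -> bytes:
    """Pack a directory into an in-memory gzip-compressed tar archive.

    Intended to run on the worker (hypothetical helper, not a Ray API).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # arcname="." stores member paths relative to src_dir
        tar.add(src_dir, arcname=".")
    return buf.getvalue()


def unpack_dir(payload: bytes, dest_dir: str) -> None:
    """Unpack an archive produced by pack_dir.

    Intended to run on the head node (hypothetical helper, not a Ray API).
    """
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
        tar.extractall(dest_dir)
```

The obvious downside is that the whole folder travels through the object store as one blob, which is why I am looking for a proper sync mechanism.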

I suspect that file_mounts and cluster_synced_files are where I might find a solution, but as far as I understand, both are meant to sync either from local to head+workers or from head to workers, NOT from workers to head node/local.

Any and all help is highly appreciated.

Documentation entries of file_mounts/cluster_synced_files

P.S. I spent hours on the docs and could not find an answer, and I also asked on Slack without getting a reply. Forgive me if the answer is in either of these and I missed it. I am a Ray noob.

Hey @AntreasAntoniou, great question!

Ray Tune actually has a syncing mechanism that uses rsync to copy all files in each trial directory on the worker nodes back to the head node. It should be enabled by default.

If that’s not working, the closest analogue to the SLURM setup is to enable a networked file system (https://cloud.google.com/filestore) and write your files to it.
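As a rough sketch of the networked file system route: assuming the same share is mounted at the same path on the head node and on every worker, each trial can simply write under that mount and the head node sees the files without any copying. The `/mnt/filestore` path, the `SHARED_ROOT` environment variable, and both helper names below are made up for illustration and are not part of Ray.

```python
import os
from pathlib import Path

# Hypothetical mount point: assumes the same network share is mounted
# at this path on the head node and on all workers.
SHARED_ROOT = os.environ.get("SHARED_ROOT", "/mnt/filestore")


def trial_output_dir(experiment: str, trial_id: str, root: str = SHARED_ROOT) -> Path:
    """Return (and create if needed) a per-trial directory on the shared mount."""
    out = Path(root) / experiment / trial_id
    out.mkdir(parents=True, exist_ok=True)
    return out


def save_result(out_dir: Path, name: str, text: str) -> Path:
    """Write a small text artifact into the trial directory and return its path."""
    path = out_dir / name
    path.write_text(text)
    return path
```

Inside a trial you would call something like `trial_output_dir("my_experiment", trial_id)` and write checkpoints or logs there instead of to the worker's local disk.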

Hope that helps!