Ray Tune Sync with S3 on 2.2.0

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.

Hi,

I’m currently upgrading my projects from Ray 1.13.x to 2.2.0, and I noticed that syncing to S3 works differently.

  • 1.13.x: Everything under the experiment’s local_dir is synced to S3.
  • 2.2.0: From what I observe, only the checkpoint folders generated on worker nodes are synced to S3, and files on the head node are ignored. The terminal output during the experiment also suggests that syncing only happens on the worker nodes.

Is this expected behavior? How can I force syncing on the head node?

Thanks

Hi @IshiriRuritori, the files on the head node should still be synced by default.

Are there any logs on the driver suggesting that experiment checkpointing did not work?

Can you check whether your local experiment directory contains an experiment_state...json file?
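
If it helps, here is a minimal sketch for that check using only the Python standard library. The helper name and the example path are hypothetical; point it at your own experiment directory (local_dir plus the experiment name). It only matches files by the experiment_state prefix and .json suffix and does not inspect their contents.

```python
from pathlib import Path


def find_experiment_state_files(experiment_dir):
    """Return experiment_state*.json files directly under experiment_dir, sorted by name."""
    root = Path(experiment_dir).expanduser()
    # glob() yields nothing if the directory is missing or has no matching files.
    return sorted(root.glob("experiment_state*.json"))


if __name__ == "__main__":
    # Hypothetical path: replace with your own <local_dir>/<experiment name>.
    for path in find_experiment_state_files("~/ray_results/my_experiment"):
        print(path)
```

An empty result on the head node would suggest that experiment checkpointing itself is not running, rather than the upload step.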

Thanks for your reply.

Yes, I can find those files on the head node, but somehow they are not uploaded to S3. Following the Tune FAQ, I tried the awscli version of the syncer, but the same thing still happens in my experiments. After switching to the awscli syncer, I can see log output showing uploads happening on the worker node.
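
To pin down exactly which head-node files are missing, something like the following sketch could be used. It is not part of Ray; the function name and the prefix argument are my own, and it assumes you have already collected the list of object keys in the bucket by some other means (for example from the output of aws s3 ls --recursive).

```python
from pathlib import Path


def missing_from_remote(local_dir, remote_keys, prefix=""):
    """Return local files (as POSIX-style relative paths) that have no matching remote key.

    remote_keys: iterable of object keys already listed from the bucket (assumed input).
    prefix: hypothetical key prefix under which the experiment directory is uploaded.
    """
    root = Path(local_dir)
    remote = set(remote_keys)
    missing = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if prefix + rel not in remote:
                missing.append(rel)
    return missing
```

Running this against the experiment directory on the head node should list the experiment-level files if only the trial checkpoints from the workers made it to S3.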

Any other suggestions?