Auto-generated local copy in Home directory (Ray 2.7.0)

When running Ray 2.7.0, in addition to all the trial results saved in the directory defined by “air.RunConfig(storage_path=...)”, Ray also saved an identical copy in my home directory, in a sub-directory named “ray_results”. This behavior only occurs with Ray 2.7.0. Older versions do not have this issue.

@wxie2013 Thanks for bringing this up. This change in behavior is a result of an implementation detail from the re-factoring we did for Ray Tune and Ray Train in 2.7 – the main reason was to unify the behavior across local storage and cloud storage.

We now always write to an intermediate directory (which defaults to ~/ray_results). The intermediate ~/ray_results directory makes more sense when the storage path is set to cloud storage. It acts as the location on the local filesystem that contains training artifacts before they get uploaded.

If you’re just setting the storage_path to a different local path, then you can change the location of the intermediate directory with the RAY_AIR_LOCAL_CACHE_DIR environment variable. For example, you could set it to a /tmp/… directory for the OS to clean it up automatically after some time. This intermediate folder doesn’t contain large files. For example, there won’t be a copy of checkpoints in there.
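As a rough sketch of the environment-variable approach described above: the snippet below sets RAY_AIR_LOCAL_CACHE_DIR from inside the driver script. It assumes the variable has to be in place before the Tune/Train run starts, so it is set at the very top. The temporary directory and its "ray_intermediate_" prefix are made up for illustration and are not a recommended path.

```python
import os
import tempfile

# Assumption: the variable is read when the run starts, so set it
# before any Ray Tune/Train code executes.
# "ray_intermediate_" is a hypothetical prefix chosen for this example.
cache_dir = tempfile.mkdtemp(prefix="ray_intermediate_")
os.environ["RAY_AIR_LOCAL_CACHE_DIR"] = cache_dir

# The rest of the script would configure the run as usual, for example
# with air.RunConfig(storage_path=...), which is not shown here.
print(f"Intermediate directory: {cache_dir}")
```

Exporting the variable in the shell before launching the script should have the same effect.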

Take a look here: Configuring Persistent Storage — Ray 2.7.0

Hi @justinvyu, thanks for the follow-up. Somehow, in my case, the intermediate local directory and the persistent storage contain exactly the same files. I have some large output files generated by each trial, and they are in both directories. It’s too expensive to keep two identical copies when the number of trials is large. Here are some more questions:

  • If I just need local storage, what would happen if the intermediate local directory and the persistent storage are set to the same path?

  • Can the intermediate local directory be removed right after the files have finished uploading or copying to another local persistent storage directory?

@wxie2013

  1. If the directories are the same, then there’ll only be a single copy of your experiment directory, without any extra copying happening. For your usage, you may just want to set ONLY the intermediate directory with the environment variable instead of the storage_path.
  2. Ray Tune/Train currently doesn’t delete the intermediate directory at any point.
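
Since the intermediate directory is not deleted automatically, one option is to remove it yourself once the run has finished. The helper below is only a sketch and not part of Ray: remove_intermediate_dir is a hypothetical name, and the function refuses to delete anything when the two paths are the same directory or when the storage path lives inside the intermediate directory. The demo uses throwaway temporary directories in place of real experiment folders.

```python
import shutil
import tempfile
from pathlib import Path


def remove_intermediate_dir(intermediate_dir, storage_path):
    """Delete intermediate_dir unless doing so would touch storage_path.

    Hypothetical helper, not a Ray API. Returns True if the directory
    was deleted, False otherwise.
    """
    intermediate = Path(intermediate_dir).expanduser().resolve()
    storage = Path(storage_path).expanduser().resolve()
    # Same directory: there is only a single copy, so keep it.
    if intermediate == storage:
        return False
    # Storage nested inside the intermediate directory: deleting would
    # also delete the persistent results.
    if intermediate in storage.parents:
        return False
    if not intermediate.is_dir():
        return False
    shutil.rmtree(intermediate)
    return True


# Demo with temporary directories standing in for the real paths.
base = Path(tempfile.mkdtemp())
intermediate = base / "intermediate"
storage = base / "storage"
intermediate.mkdir()
storage.mkdir()
(intermediate / "result.json").write_text("{}")

removed = remove_intermediate_dir(intermediate, storage)
same = remove_intermediate_dir(storage, storage)
```

Only call something like this after the run has completely finished, since Ray may still be writing to the intermediate directory while trials are running.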