Upload files to a Ray cluster without changing `working_dir`

How severely does this issue affect your experience of using Ray?

  • High

I currently have a Ray cluster, hosted on a GPU server on GCS, set up to run jobs submitted through the Jobs SDK. The job runner file lives on the Ray cluster, and I can call it easily through the entrypoint parameter.

The problem I’m facing, however, is when I want to upload a file for the job to access at runtime. I can upload files through `working_dir`, but that replaces the working directory on the cluster node with the local directory I’m submitting the job from, so the entrypoint command can no longer be resolved. I still want to run the Python code on the GPU cluster and preserve the entrypoint.
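For concreteness, here’s roughly what my submission looks like (the cluster address and file names are placeholders, not my real setup):

```python
# Rough sketch of my setup; the address and paths are placeholders.
# job.py already lives on the cluster and is normally reachable from
# the entrypoint's default working directory there.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://<head-node-ip>:8265")

# Without a runtime_env this works: "python job.py" resolves against
# the cluster node's own directory.
client.submit_job(entrypoint="python job.py")

# Adding working_dir uploads my local files, but the uploaded directory
# also becomes the entrypoint's working directory, so "python job.py"
# no longer resolves.
client.submit_job(
    entrypoint="python job.py",
    runtime_env={"working_dir": "./local_inputs"},
)
```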

Is there any way to simply upload files to the Ray cluster programmatically, without changing the working directory of the entrypoint?

So far I’ve tried calling the entrypoint with a relative path (e.g. `python ../../job.py`), but it seems that `working_dir` completely overrides the local directory of the Ray node.

I also saw the `ray rsync-up` command, but I’m looking for a programmatic solution that I can insert into my code.
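One idea I’m considering, as an untested sketch: connect with Ray Client and run a tiny task that writes the file’s bytes onto the cluster’s filesystem, leaving the job’s entrypoint untouched. The destination path here is made up, and on a multi-node cluster the task would need to be pinned to a specific node:

```python
# Untested sketch: push a file's bytes to the cluster via a Ray task.
# /home/ray/uploads is a made-up destination; the task may be scheduled
# on any node unless it's explicitly pinned.
import os
import ray

ray.init(address="ray://<head-node-ip>:10001")  # Ray Client connection

@ray.remote
def write_file(dest_path: str, contents: bytes) -> str:
    os.makedirs(os.path.dirname(dest_path), exist_ok=True)
    with open(dest_path, "wb") as f:
        f.write(contents)
    return dest_path

with open("config_file.yaml", "rb") as f:
    data = f.read()

print(ray.get(write_file.remote("/home/ray/uploads/config_file.yaml", data)))
```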

Any help would be appreciated!

I’m also looking into this. I have the following directory structure:

root:
  config:
    - config_file.yaml
  src:
    - file1.py
    - file2.py

Here file1.py runs a config loader that references config_file.yaml and also imports file2.py as a module. I can get access to the config by changing my working_dir from root/src to root, but that breaks the relative imports. I’m struggling to get a sys.path.append to work (roughly what I’ve been attempting is sketched below), and it feels hackier than just copying the additional files into the root path of the Ray head or worker.
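For reference, this is roughly the sys.path workaround I’ve been trying at the top of src/file1.py, resolving paths from `__file__` rather than from the working directory:

```python
# Resolve everything relative to this file instead of the working_dir,
# so both the config lookup and "import file2" survive working_dir
# being set to root rather than root/src.
import os
import sys

SRC_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(SRC_DIR)
sys.path.insert(0, SRC_DIR)  # lets "import file2" resolve from root/src

import file2  # noqa: E402

CONFIG_PATH = os.path.join(ROOT_DIR, "config", "config_file.yaml")
```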