Ray 2.10.0: Can't find a `node_ip_address.json` file when using _temp_dir in ray.init()

This happens in Ray 2.10.0. The following command runs fine:

 ray.init(address=ip_head, include_dashboard=False) 

but the same call with the _temp_dir argument added leads to the following error:

 ValueError: Can't find a `node_ip_address.json` file from /scratch/session_2024-04-02_16-32-35_009347_3527526. for 60 seconds

I know the default is /tmp/ray on the running node, but it is not convenient for me to access that location to look into the details of the running jobs. I’m using Slurm for job submission.
Any help is appreciated.

@wxie2013 _temp_dir is not a public API so you shouldn’t use it.

To move the tmp dir somewhere other than /tmp/ray, just use ray start --head --temp-dir=/scratch when you start the head node.

Thanks @yic. I’m not quite following. _temp_dir is an argument of ray.init(). I’ve actually been using it with ray.train, and it does redirect the output from /tmp to the specified directory. The issue only occurs when using ray.tune.

Hi @wxie2013, everything that starts with _ is a private API. We use it for testing or other purposes. The fact that you’ve been using it doesn’t make it a public API, and you are not supposed to rely on it. In the worst case, it can be removed in the future.

To change the tmp dir, just provide --temp-dir= when you start the head node. This is the right way to set up the tmp dir.

Right now, all ray nodes are supposed to have the same tmp dir. If this doesn’t work for you, you can create a ticket and let me know. I’ll make sure it’s being tracked.
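
A minimal sketch of the flow described above, assuming a single machine with the ray CLI on PATH: start the head with --temp-dir, then attach to it from Python rather than starting a second cluster. The helper name head_start_command and the /scratch path are only illustrative.

```python
import subprocess


def head_start_command(temp_dir):
    """Build the `ray start` command line for a head node with a custom temp dir."""
    return ["ray", "start", "--head", f"--temp-dir={temp_dir}"]


if __name__ == "__main__":
    import ray  # imported here so the helper above works without Ray installed

    # Start the head node with the temp dir redirected to /scratch.
    subprocess.run(head_start_command("/scratch"), check=True)

    # Attach to the already running cluster instead of starting a new one.
    ray.init(address="auto")
    print(ray.cluster_resources())
    ray.shutdown()
```

Passing address="auto" matters here: it makes the script connect to the cluster that ray start created, so the job runs in that cluster's session.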

Thanks again @yic. I don’t use the “ray start” command directly. I use ray.init() in a Python script to start the Ray process. For example, a Python script named “example.py”:

 import ray

Then I run it via

 python example.py

Is there a way to specify the directory as an init argument?

This is not how we recommend starting Ray worker nodes.

You can use this to start a Ray head node for single-node usage.

If you really want to do this, make sure _temp_dir is the same in every ray.init() call (head and workers).
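
One way to keep _temp_dir identical across scripts is to build the ray.init() arguments in a single place. This is only a sketch: the environment variable name MY_RAY_TEMP_DIR and the helper shared_init_kwargs are made up for illustration, and _temp_dir remains a private argument that may change or disappear.

```python
import os

# Hypothetical environment variable name, used only in this sketch.
TEMP_DIR_ENV = "MY_RAY_TEMP_DIR"


def shared_init_kwargs(address=None, default="/scratch"):
    """Return the keyword arguments that every script passes to ray.init()."""
    kwargs = {
        "include_dashboard": False,
        # Private argument: every ray.init() call must agree on this value.
        "_temp_dir": os.environ.get(TEMP_DIR_ENV, default),
    }
    if address is not None:
        kwargs["address"] = address
    return kwargs


if __name__ == "__main__":
    import ray  # imported here so the helper above works without Ray installed

    ray.init(**shared_init_kwargs())
    ray.shutdown()
```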

Thanks @yic for the prompt responses.

Hi @yic, I tried ray start --temp-dir=/scratch. The issue is that the artifacts directory is missing, i.e. /scratch/ray/session_latest/artifacts does not exist. When using ray.init(_temp_dir='/scratch'), the artifacts directory does exist. Is there another argument that keeps the artifacts in the redirected output when using ray start --temp-dir?

Hi @wxie2013, you can, and should, set the tmp dir only on the head node. The Ray worker nodes will get the same dir as the head node.

Hi @yic. Yes, I did use only the head node. The issue is that the artifacts directory is missing, and I need its content for some detailed checks. With ray.init(_temp_dir), the artifacts directory is there.

Here are some more observations:

Using the command “ray start --head --temp-dir=/scratch” creates a session under “/scratch”, but without the “artifacts” directory. When I run my model, a different session is created at the default “/tmp/ray”, and that one does contain the “artifacts” directory. In other words, two different sessions are created: one before running the model, the other after running it.
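
To make the comparison above reproducible, here is a small stdlib-only sketch that lists the session directories under each temp root and reports whether each one has an artifacts subdirectory. It assumes sessions appear as session_* directories directly under the temp root; the helper name list_sessions is illustrative.

```python
from pathlib import Path


def list_sessions(temp_root):
    """Return (session_name, has_artifacts) for each session_* directory under temp_root."""
    root = Path(temp_root)
    if not root.is_dir():
        return []
    sessions = []
    for entry in sorted(root.glob("session_*")):
        # session_latest is normally a symlink to the newest session; skip it
        # so the same session is not reported twice.
        if entry.is_symlink() or not entry.is_dir():
            continue
        sessions.append((entry.name, (entry / "artifacts").is_dir()))
    return sessions


if __name__ == "__main__":
    for temp_root in ("/scratch", "/tmp/ray"):
        for name, has_artifacts in list_sessions(temp_root):
            status = "yes" if has_artifacts else "no"
            print(f"{temp_root}/{name}: artifacts={status}")
```

Running it before and after the model starts shows which temp root gains a new session, and which session holds the artifacts.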