Node_ip_address.json not found

Hi, sorry if this issue was already solved in a previous post. I did search but could not find anything related with a successful outcome…

Here's the deal: when I run my script, it cannot find the .json file, even though the path in the error message matches and, when I check on my server, the file is indeed present. Does anyone have an idea of something to try? I will give as much information as possible. Thanks!

Ray version : 2.8.1 on all systems

L1 = 2 servers (CentOS 9) and 1 computer (Windows 10)
L2 = everything is on the same network, nothing fancy

Command to start the Ray cluster on the first server (head node):

ray start --head --port=6379

On the second server I start with:

ray start --address="HEAD_IP":6379

Here is the resulting status:

[mike@node-1w7jra83c7kv8r2xl6dpyqyeg Ray]$ ray status
======== Autoscaler status: 2023-12-06 12:37:33.841051 ========
Node status
 1 node_5f85de84d5c82854568a1a706ebf9d0b49b7125b6981fd1355ba8d62
 1 node_f534ea8f259a8dd31fde064768f46c116d96af463c9a09938f2f6a50
 (no pending nodes)
Recent failures:
 (no failures)

 0.0/88.0 CPU
 0B/186.03GiB memory
 0B/83.72GiB object_store_memory

 (no resource demands)

The pool is correctly created with all the allocated CPUs detected. I then go to my Windows machine and try to connect to the Ray cluster with this simple script:

import ray


I get this error:

2023-12-06 14:36:21,842 INFO -- Can't find a `node_ip_address.json` file from /tmp/ray\session_2023-12-06_12-05-14_047673_11438. Have you started Ray instsance using `ray start` or `ray.init`?

All the folders in the path have been manually chmod'ed to 0777. This is the list of files inside the Ray session directory:

[mike@node-1w7jra83c7kv6mh9ip6kg0lxv session_2023-12-06_12-05-14_047673_11438]$ ls
logs                  node_ip_address.json.lock  ports_by_node.json.lock  sockets
node_ip_address.json  ports_by_node.json         runtime_resources        usage_stats.json

This is the result of Test-NetConnection on the Windows machine:

PS C:\Users\Op3th> Test-NetConnection -ComputerName HEAD_IP -Port 6379

ComputerName     : HEAD_IP
RemoteAddress    : HEAD_IP
RemotePort       : 6379
InterfaceAlias   : Ethernet
SourceAddress    : WINDOWS_IP
TcpTestSucceeded : True

I'm quite at a loss here…

Could it be that single \ in the path that kills the whole thing? Is there a way to switch it to / to test? (I did look a bit into the system files to try to find the path variable used for reading node_ip_address.json so I could hardcode it, but I could not find it…)

Thanks for any response!

Up. If anyone has any idea, or knows where I could hardcode the path just to rule out that hypothesis, that would be appreciated.
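One way to test the backslash hypothesis without touching Ray's code is sketched below. It assumes the path in the message was produced by an ordinary Windows-style path join; that is a guess, not something confirmed from Ray's source. The directory and session names are simply copied from the error message above.

```python
import ntpath      # Windows path rules, usable on any OS
import posixpath   # Linux path rules, usable on any OS

# Values copied from the error message in this post.
TEMP_DIR = "/tmp/ray"
SESSION = "session_2023-12-06_12-05-14_047673_11438"

# Joining with Windows rules inserts a backslash after "/tmp/ray",
# which reproduces the mixed-separator path seen in the error.
windows_path = ntpath.join(TEMP_DIR, SESSION)

# Joining with Linux rules gives the path as it exists on the server.
linux_path = posixpath.join(TEMP_DIR, SESSION)


def same_on_windows(a: str, b: str) -> bool:
    """Return True if Windows would treat both spellings as the same path."""
    return ntpath.normpath(a) == ntpath.normpath(b)


print(windows_path)
print(linux_path)
print(same_on_windows(windows_path, linux_path))
```

If the two spellings compare equal under Windows rules, the lone \ is probably cosmetic. It would instead suggest the path is being resolved on the Windows machine's own filesystem, where no Ray session directory exists, rather than on the server.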

Hi @Op3th

Since the Windows machine is not part of the Ray cluster, you cannot do that. You can try Ray job submission to submit jobs to the remote cluster: Ray Jobs Overview (Ray 2.8.1 docs)
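As a rough illustration of job submission from the Windows machine, here is a minimal sketch using the Ray Jobs Python SDK. It assumes the head node's dashboard listens on the default port 8265 and is reachable from the Windows machine (for example, the head was started with `--dashboard-host=0.0.0.0`). The script name `my_script.py` and the helper names are hypothetical.

```python
def dashboard_url(head_ip: str, port: int = 8265) -> str:
    """Build the HTTP address of the Ray dashboard (8265 is the default port)."""
    return f"http://{head_ip}:{port}"


def submit_script(head_ip: str,
                  entrypoint: str = "python my_script.py",  # hypothetical script
                  working_dir: str = ".") -> str:
    """Submit a job to a remote Ray cluster and return its submission ID."""
    # Imported here so the helper above can be used without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url(head_ip))
    # working_dir is uploaded to the cluster so the entrypoint can find the script.
    return client.submit_job(
        entrypoint=entrypoint,
        runtime_env={"working_dir": working_dir},
    )


# Example usage (replace HEAD_IP with the head node's address):
# job_id = submit_script("HEAD_IP")
# print(job_id)
```

With this approach the script runs on the cluster itself, so the Windows machine never needs access to the session directory under /tmp/ray.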
