Hi, sorry if this issue was already solved in a previous post. I did search but could not find anything related with a successful outcome…
Here's the deal: when I run my script it cannot find the .json file, but the path inside the error message matches, and when I check on my server the file is indeed present… Does anyone have an idea of something to try out? I will give as much information as possible. Thanks!
Ray version : 2.8.1 on all systems
L1 = 2 servers (CentOS 9) and 1 computer (Windows 10)
L2 = everything is on the same network, nothing fancy
Command to start the Ray cluster on the server side:
ray start --head --port=6379
On the second server I start with:
ray start --address="HEAD_IP":6379
Here is the resulting status:
[mike@node-1w7jra83c7kv8r2xl6dpyqyeg Ray]$ ray status
======== Autoscaler status: 2023-12-06 12:37:33.841051 ========
Node status
---------------------------------------------------------------
Active:
1 node_5f85de84d5c82854568a1a706ebf9d0b49b7125b6981fd1355ba8d62
1 node_f534ea8f259a8dd31fde064768f46c116d96af463c9a09938f2f6a50
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
---------------------------------------------------------------
Usage:
0.0/88.0 CPU
0B/186.03GiB memory
0B/83.72GiB object_store_memory
Demands:
(no resource demands)
The pool is correctly created with all the allocated CPUs detected. I then go to my Windows machine and try to connect to the Ray cluster with this simple script:
import ray
ray.init(address='HEAD_IP:6379')
I get this error:
2023-12-06 14:36:21,842 INFO node.py:1000 -- Can't find a `node_ip_address.json` file from /tmp/ray\session_2023-12-06_12-05-14_047673_11438. Have you started Ray instsance using `ray start` or `ray.init`?
All the folders in the path have been manually chmod'ed to 0777. This is the list of files inside the Ray session directory:
[mike@node-1w7jra83c7kv6mh9ip6kg0lxv session_2023-12-06_12-05-14_047673_11438]$ ls
logs node_ip_address.json.lock ports_by_node.json.lock sockets
node_ip_address.json ports_by_node.json runtime_resources usage_stats.json
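To rule out a permission or content problem on the server side, a check like the following could be run there. This is only a minimal sketch: the helper name check_session_file is made up for illustration, and it only tests that the file exists, is readable by the current user, and parses as JSON. It does not know anything about what Ray expects inside the file.

```python
import json
import os
from pathlib import Path


def check_session_file(session_dir, name="node_ip_address.json"):
    """Report whether `name` exists in `session_dir`, is readable, and parses as JSON.

    Hypothetical helper for debugging; not part of Ray.
    """
    path = Path(session_dir) / name
    report = {
        "path": str(path),
        "exists": path.is_file(),
        "readable": False,
        "valid_json": False,
    }
    if report["exists"]:
        # os.access checks the permissions of the user running this script.
        report["readable"] = os.access(path, os.R_OK)
        try:
            json.loads(path.read_text())
            report["valid_json"] = True
        except (OSError, ValueError):
            # Unreadable file or malformed JSON: leave valid_json as False.
            pass
    return report


# Example call on the server (path taken from the error message):
# print(check_session_file("/tmp/ray/session_2023-12-06_12-05-14_047673_11438"))
```

If all three flags come back True on the server, the file itself is probably not the problem.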
This is the result of Test-NetConnection on the Windows machine:
PS C:\Users\Op3th> Test-NetConnection -ComputerName HEAD_IP -Port 6379
ComputerName : HEAD_IP
RemoteAddress : HEAD_IP
RemotePort : 6379
InterfaceAlias : Ethernet
SourceAddress : WINDOWS_IP
TcpTestSucceeded : True
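The same reachability test can be done from Python, which makes it easy to check more than one port. A minimal sketch using only the standard library; tcp_port_open is a hypothetical helper name, and HEAD_IP is the same placeholder as above. As far as I understand, Ray uses other ports besides 6379, so a success on 6379 alone only shows that this one port is reachable.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call from the Windows machine (replace HEAD_IP with the real address):
# print(tcp_port_open("HEAD_IP", 6379))
```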
I'm quite at a loss here…
Could it be that the single \ in the path breaks the whole thing? Is there a way to switch it to / to test? (I did look a bit into the source files to find the path variable used to read node_ip_address.json so I could hardcode it, but I could not find it…)
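For what it's worth, a mixed path like the one in the error can be reproduced with the standard library alone. The sketch below assumes (I have not confirmed this in the Ray source) that the path is assembled on the Windows side with os.path.join, which is ntpath.join on Windows. ntpath and posixpath are used explicitly so the result is the same on any OS.

```python
import ntpath      # the implementation behind os.path on Windows
import posixpath   # the implementation behind os.path on Linux

# Values copied from the error message above.
TEMP_DIR = "/tmp/ray"
SESSION = "session_2023-12-06_12-05-14_047673_11438"

# Joining with Windows rules inserts a backslash between the two parts.
windows_style = ntpath.join(TEMP_DIR, SESSION)

# Joining with POSIX rules inserts a forward slash.
posix_style = posixpath.join(TEMP_DIR, SESSION)

print(windows_style)
print(posix_style)
```

If that assumption holds, the backslash would only mean that the path was built, and probably also looked up, on the Windows machine's own filesystem rather than on the server, where the file actually lives.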
Thanks for any response!