Share access to Ray with other users

bzs · January 15, 2021, 12:37am

I want to start a Ray head node on an on-prem Linux machine, connect multiple nodes (also on-prem machines) to it, and give my coworkers access to Ray. We all have user accounts on the machine running the head node, and they all belong to the same group. What I ended up doing to let others access Ray is changing the permissions of /tmp/ray/session_2021-01-13_19-38-03_898565_24443/sockets/plasma_store and /tmp/ray/session_2021-01-13_19-38-03_898565_24443/sockets/raylet to be group writable. Is this the “proper” way of doing this? Would Ray Cluster Launcher help me in this use case?
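For reference, the permission change described above can be sketched in Python roughly like this. It is only an illustration of the workaround, not a recommended or official Ray mechanism. The helper names `make_group_writable` and `share_session_sockets` are made up for this post, and the session directory has to be filled in by hand because it changes on every `ray start`.

```python
import os
import stat
from pathlib import Path


def make_group_writable(path):
    """Add the group-write bit to `path`, leaving all other bits unchanged.

    Returns the resulting permission bits (e.g. 0o620).
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IWGRP)
    return stat.S_IMODE(os.stat(path).st_mode)


def share_session_sockets(session_dir):
    """Make the two socket files named in the post group writable.

    `session_dir` is a path like /tmp/ray/session_<timestamp>_<pid>.
    Files that do not exist are skipped. Returns {name: new_mode}.
    """
    sockets_dir = Path(session_dir) / "sockets"
    result = {}
    for name in ("plasma_store", "raylet"):
        target = sockets_dir / name
        if target.exists():
            result[name] = make_group_writable(target)
    return result


# Example (hypothetical call, run as the user who started the head node):
# share_session_sockets("/tmp/ray/session_2021-01-13_19-38-03_898565_24443")
```

This only touches the two files mentioned above. Whether that is enough for every Ray version is an open question, which is why I am asking.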

sangcho · January 16, 2021, 3:20am

Why can’t each of you just run drivers with ray.init(address='auto')?

bzs · January 17, 2021, 11:35pm

I thought that would work too, but alas. I started a ray process on the non-head nodes by

ray start --address <ip_of_head_node>

If I start a python REPL on the non-head node and run ray.init(address='auto'), the process just hangs & eventually gives me a backtrace. I had 6379 open on the head node for Redis (head node started with --port 6379). What other ports need to be open on machines in the cluster in order for this to work?

sangcho · January 18, 2021, 7:09am

So are you saying that if you try running a driver on a non-head node, it crashes? Can you explain your environment a bit more? This is not supposed to happen normally.

sangcho · January 18, 2021, 7:10am

Also, here is the information about the port numbers: Configuring Ray — Ray v1.1.0

bzs · January 21, 2021, 10:03am

@sangcho thanks for the link, I followed it and made sure ports are open on the head node (IP 10.70.21.30) by

sudo ufw allow 6379:20000/tcp
sudo ufw allow 6379:20000/udp

and started the head node with

ray start --head --dashboard-host 0.0.0.0 --include-dashboard true --dashboard-port 8265 --gcs-server-port 6380 --node-manager-port 6381 --object-manager-port 6382

and on the regular node I did

ray start --address 10.70.21.30:6379

which seemed to succeed. But if I subsequently run ray status, it hangs at

➜ ray status
2021-01-21 02:07:06,676 INFO scripts.py:1355 -- Connecting to Ray instance at 10.70.21.30:6379.
2021-01-21 02:07:06,683 INFO worker.py:650 -- Connecting to existing Ray cluster at address: 10.70.21.30:6379

and the same thing happens when I call ray.init(address='auto') in a script.

In the meantime, I made sure that the node can reach the head on these ports:

➜ nmap -p 6379-6382 rl-lambda-1
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-21 02:12 PST
Nmap scan report for rl-lambda-1 (10.70.21.30)
Host is up (0.073s latency).

PORT     STATE SERVICE
6379/tcp open  redis
6380/tcp open  unknown
6381/tcp open  unknown
6382/tcp open  metatude-mds
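The same reachability check can also be sketched in plain Python, in case nmap is not available on a node. This is a minimal sketch using only the standard library. `check_ports` is a name I made up, and a successful TCP connect only shows that something accepts connections on that port. It does not prove that every port Ray needs is reachable.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Try a TCP connect to each port on `host`.

    Returns {port: True/False}, where True means the connection
    was accepted within `timeout` seconds.
    """
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and unreachable hosts.
            results[port] = False
    return results


# Example, using the head node IP and ports from this post:
# print(check_ports("10.70.21.30", range(6379, 6383)))
```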

Am I missing something?

Both nodes are running ray 1.0.1.post1 and Python 3.8.6; the head OS is Ubuntu 18, the other is Manjaro.
