Share access to Ray with other users

I want to start a Ray head node on an on-prem Linux machine, connect multiple nodes (also on-prem machines) to it, and give my coworkers access to Ray. We all have user accounts on the machine running the head node, and we belong to the same group. What I ended up doing to let others access Ray was making /tmp/ray/session_2021-01-13_19-38-03_898565_24443/sockets/plasma_store and /tmp/ray/session_2021-01-13_19-38-03_898565_24443/sockets/raylet group writable. Is this the “proper” way of doing this? Would the Ray Cluster Launcher help me in this use case?
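For reference, the permission change described above can be sketched in Python. This is a minimal sketch, assuming the session directory path is passed in explicitly; the helper name `make_sockets_group_writable` is hypothetical. It only adds group read/write bits to the two socket files named above and leaves all other bits unchanged. It illustrates the workaround, not an endorsed configuration.

```python
import os
import stat
from pathlib import Path

# The two socket files named in the question, assumed to live in <session_dir>/sockets.
SOCKET_NAMES = ("plasma_store", "raylet")


def make_sockets_group_writable(session_dir: str) -> list:
    """Add group read/write permission to Ray's socket files (hypothetical helper).

    `session_dir` is e.g. /tmp/ray/session_2021-01-13_19-38-03_898565_24443
    Returns the paths whose mode was changed.
    """
    changed = []
    sockets_dir = Path(session_dir) / "sockets"
    for name in SOCKET_NAMES:
        path = sockets_dir / name
        if not path.exists():
            continue  # skip sockets that are not present
        mode = stat.S_IMODE(path.stat().st_mode)
        # Keep the existing bits and add group read + write.
        os.chmod(path, mode | stat.S_IRGRP | stat.S_IWGRP)
        changed.append(str(path))
    return changed
```

It has to be run by the user who started Ray (or root), since only the owner can change a file's mode, and it must be repeated for every new session directory.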

Why can’t each of you just run drivers with ray.init(address='auto')?
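A minimal sketch of such a driver, assuming Ray is installed on the machine where it runs and a node has already been started there with `ray start`. The function name `run_driver` is hypothetical; `ray.init`, `ray.cluster_resources` and `ray.shutdown` are standard Ray calls. Ray is imported inside the function so the file can be loaded on a machine without Ray.

```python
def run_driver(address: str = "auto") -> dict:
    """Connect to an existing Ray cluster and return its total resources."""
    import ray  # imported lazily; requires Ray on this machine

    # "auto" asks Ray to find a Ray instance already running on this machine.
    ray.init(address=address)
    try:
        # Resources summed across all nodes, e.g. {"CPU": ..., "memory": ...}
        return ray.cluster_resources()
    finally:
        # Disconnect this driver; the cluster itself keeps running.
        ray.shutdown()
```

Each coworker would run this under their own account on any machine in the cluster.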

I thought that would work too, but alas. I started a Ray process on the non-head nodes by running

ray start --address <ip_of_head_node>

If I start a Python REPL on a non-head node and run ray.init(address='auto'), the process just hangs and eventually gives me a backtrace. I had port 6379 open on the head node for Redis (the head node was started with --port 6379). What other ports need to be open on the machines in the cluster for this to work?

So are you saying that if you try running a driver on a non-head node, it crashes? Can you explain your environment a bit more? This is not supposed to happen normally.

Also, here is the documentation on port numbers: Configuring Ray — Ray v1.1.0
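One way to follow that page is to pin every port with an explicit `ray start` flag and keep the values in one place, so they can be checked against the firewall rule. This is a minimal sketch: the port values are the ones used in the head-node command further down in this thread (with the default --port 6379 made explicit), the 6379:20000 range is the one opened with ufw there, and the helper names are hypothetical. It only builds command strings and checks the range; it does not run `ray start` or configure the firewall, and any other ports Ray opens on its own are not covered.

```python
import shlex

# Ports pinned on the head node (values from the head-node command in this thread).
HEAD_PORTS = {
    "--port": 6379,  # Redis
    "--gcs-server-port": 6380,
    "--node-manager-port": 6381,
    "--object-manager-port": 6382,
    "--dashboard-port": 8265,
}

# Range opened with `ufw allow 6379:20000/tcp` (range() excludes the end, hence 20001).
FIREWALL_RANGE = range(6379, 20001)


def head_command(ports: dict = HEAD_PORTS) -> str:
    """Build the `ray start --head` command line with every port pinned."""
    args = ["ray", "start", "--head",
            "--dashboard-host", "0.0.0.0", "--include-dashboard", "true"]
    for flag, port in ports.items():
        args += [flag, str(port)]
    return shlex.join(args)  # shlex.join needs Python 3.8+


def worker_command(head_ip: str, redis_port: int = HEAD_PORTS["--port"]) -> str:
    """Build the `ray start` command for a non-head node."""
    return shlex.join(["ray", "start", "--address", f"{head_ip}:{redis_port}"])


def ports_outside_firewall(ports: dict = HEAD_PORTS) -> list:
    """Return the flags whose port falls outside the opened range."""
    return [flag for flag, port in ports.items() if port not in FIREWALL_RANGE]
```

For example, `worker_command("10.70.21.30")` yields `ray start --address 10.70.21.30:6379`, and an empty `ports_outside_firewall()` means every pinned port is inside the ufw range.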

@sangcho thanks for the link. I followed it and made sure the ports are open on the head node (IP 10.70.21.30) by running

sudo ufw allow 6379:20000/tcp
sudo ufw allow 6379:20000/udp

and started the head node with

ray start --head --dashboard-host 0.0.0.0 --include-dashboard true --dashboard-port 8265 --gcs-server-port 6380 --node-manager-port 6381 --object-manager-port 6382

and on the regular node I ran

ray start --address 10.70.21.30:6379

which seemed to succeed. But if I then run ray status, it hangs at

➜ ray status
2021-01-21 02:07:06,676 INFO scripts.py:1355 -- Connecting to Ray instance at 10.70.21.30:6379.
2021-01-21 02:07:06,683 INFO worker.py:650 -- Connecting to existing Ray cluster at address: 10.70.21.30:6379

The same thing happens when I call ray.init(address='auto') in a script.
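To tell a hang apart from an error when retrying these two steps, each one can be run as a child process with a deadline. This is a minimal sketch using only the standard library; the `probe` helper and the two command lists are hypothetical. Output is discarded for simplicity (so the backtrace is not shown), and a "hung" result only says the command did not finish in time, not why.

```python
import subprocess
import sys


def probe(cmd: list, timeout_s: float = 30.0) -> str:
    """Run cmd and classify the outcome as 'ok', 'failed', 'hung' or 'missing'.

    subprocess.run kills the child if it exceeds timeout_s.
    """
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout_s)
    except FileNotFoundError:
        return "missing"  # executable not found on PATH
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if proc.returncode == 0 else "failed"


# Hypothetical probes mirroring the two hanging steps above.
RAY_STATUS = ["ray", "status"]
DRIVER = [sys.executable, "-c", "import ray; ray.init(address='auto')"]
```

Usage: `probe(RAY_STATUS)` and `probe(DRIVER)` on the non-head node.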

In the meantime, I made sure that the node can reach the head node on these ports:

➜ nmap -p 6379-6382 rl-lambda-1
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-21 02:12 PST
Nmap scan report for rl-lambda-1 (10.70.21.30)
Host is up (0.073s latency).

PORT     STATE SERVICE
6379/tcp open  redis
6380/tcp open  unknown
6381/tcp open  unknown
6382/tcp open  metatude-mds
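The same TCP check can be scripted with the standard library, so it can be repeated from any machine in the cluster without nmap. This is a minimal sketch; the helper names are hypothetical. Like the nmap output above, it only tests whether a TCP connection succeeds. It says nothing about UDP or about whether the service listening on the port is Ray.

```python
import socket


def tcp_port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def scan(host: str, ports) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Usage: `scan("rl-lambda-1", range(6379, 6383))` covers the same ports as the nmap command above.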

Am I missing something?

Both nodes are running Ray 1.0.1.post1 and Python 3.8.6; the head node runs Ubuntu 18 and the other node runs Manjaro.
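Ray expects the same Ray and Python versions on every node, so the check above can be scripted and run on each machine. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper names `node_versions` and `mismatches` are hypothetical, and comparing the two reports is left to the user (for example by pasting them side by side).

```python
import platform
from importlib import metadata


def node_versions() -> dict:
    """Collect the versions that should match on every node of the cluster."""
    try:
        ray_version = metadata.version("ray")
    except metadata.PackageNotFoundError:
        ray_version = None  # Ray is not installed in this environment
    return {"python": platform.python_version(), "ray": ray_version}


def mismatches(a: dict, b: dict) -> list:
    """Return the keys whose values differ between two nodes' reports."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

An empty `mismatches(head_report, node_report)` means the two machines agree on both versions.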