I’m able to run ray up cluster.yaml and the cluster head node starts (the image is downloaded and the head container is initialized). I can connect to the dashboard and see one worker ready to handle tasks (172.17.1.99); however, there is no sign of the 99 or 93 nodes… I’d expect them to be visible (at least one, since min_workers == max_workers == initial_workers in the config). Am I right?
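For reference, this is roughly the sanity check I’d expect my config to pass. It is a minimal sketch that assumes a local-provider config shaped like Ray’s 1.x local example (`min_workers`, `max_workers`, `provider.type == "local"`, `provider.head_ip`, `provider.worker_ips`). `check_local_config` is a hypothetical helper, not a Ray API, and the IPs are placeholders. In practice the dict would come from loading cluster.yaml with a YAML parser.

```python
from typing import Any


def check_local_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a local-provider cluster config.

    Hypothetical helper: the keys mirror Ray's 1.x local-provider example,
    but this is not part of Ray and does not validate every field.
    """
    problems: list[str] = []
    provider = config.get("provider", {})
    if provider.get("type") != "local":
        problems.append("provider.type is not 'local'")
    if not provider.get("head_ip"):
        problems.append("provider.head_ip is missing")
    worker_ips = list(provider.get("worker_ips") or [])
    if not worker_ips:
        problems.append("provider.worker_ips is empty, so no workers can be started")
    min_workers = config.get("min_workers", 0)
    max_workers = config.get("max_workers", 0)
    if min_workers > max_workers:
        problems.append(f"min_workers ({min_workers}) > max_workers ({max_workers})")
    if min_workers > len(worker_ips):
        # With a local provider there is no pool to draw extra machines from.
        problems.append(
            f"min_workers ({min_workers}) exceeds the {len(worker_ips)} listed worker_ips"
        )
    return problems


if __name__ == "__main__":
    # Placeholder addresses (TEST-NET-1), not the IPs from this thread.
    example = {
        "min_workers": 2,
        "max_workers": 2,
        "provider": {
            "type": "local",
            "head_ip": "192.0.2.10",
            "worker_ips": ["192.0.2.11", "192.0.2.12"],
        },
    }
    print(check_local_config(example) or "no obvious problems")
```

If a check like this comes back clean and the workers still never appear, the config shape is probably not the problem.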
Can anyone provide some hints? During ray up I only see log messages for the head node IP, none for the worker IPs. SSH login with keys is configured properly, and there are no errors while running ray up cluster.yaml.
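Since the worker IPs never show up in the logs, one cheap thing to rule out is plain network reachability. Below is a minimal, stdlib-only sketch that checks whether TCP port 22 answers on each host. The function name and the placeholder addresses are hypothetical. I’d run it on the head node rather than on the laptop, because as far as I know the autoscaler that starts the workers runs on the head and connects to them from there. Note that it only proves the port is open; it does not test key-based login.

```python
import socket


def reachable(hosts: list[str], port: int = 22, timeout: float = 3.0) -> dict[str, bool]:
    """Map each host to True if a TCP connection to `port` succeeds within `timeout`.

    Only checks that the port accepts connections; SSH authentication is not tested.
    """
    results: dict[str, bool] = {}
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            # Covers refused connections, timeouts and DNS resolution failures.
            results[host] = False
    return results
```

Usage would be something like `reachable(["192.0.2.11", "192.0.2.12"])` with the real worker_ips from cluster.yaml substituted for those placeholders.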
B) The dashboard view (posted separately, since new users can upload only one media item per post).
As far as I understand, Ray does not have to be installed on the worker nodes: only Docker is required, and all computation is handled by the image specified in the YAML config. Is that right? In any case, I see no attempt to connect to the worker_ips, neither while starting the cluster nor while running any computations; only the head node is used.
I’m also trying to get started with running Ray/RLlib on a local cluster (see other thread) and am currently stuck at the same point:
I run ray up cluster.yaml on my laptop and it completes without errors but the dashboard only shows the head node.
When training an RL agent, it’s also only performed on the head node (according to the Ray dashboard and htop running on all cluster nodes).
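To cross-check the dashboard, the node list can also be read from Python on the head node. A small sketch, assuming `ray.nodes()` returns one dict per node with `"Alive"` and `"NodeManagerAddress"` keys (which is how I remember it in Ray 1.x). `alive_node_addresses` is a hypothetical helper that just filters that list.

```python
from typing import Any


def alive_node_addresses(nodes: list[dict[str, Any]]) -> list[str]:
    """Return the sorted addresses of the nodes marked alive.

    `nodes` is expected to look like the output of `ray.nodes()`, where each
    entry carries (among other keys) "Alive" and "NodeManagerAddress".
    """
    return sorted(n["NodeManagerAddress"] for n in nodes if n.get("Alive"))
```

On the head node this would be called as `alive_node_addresses(ray.nodes())` after `ray.init(address="auto")`. If only the head’s address comes back, the workers never joined the cluster, which would match what the dashboard and htop show.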
I did not use the docker option, though. Instead, I manually installed Ray 1.2.0 and my custom environment on all machines in the cluster.
@sebzur Any news on this? Did you resolve the issue somehow?
Hi @sebzur, sorry, same for me: I have been busy with lots of other things over the last months and haven’t gotten around to this yet. Also, the issue isn’t so relevant for me anymore, so I probably won’t have time to debug it any time soon.
Still, it would be cool to hear if you have any updates or solutions.