I have installed Python 3.7.9 and Ray inside a Docker container. I am trying to connect to a different machine through Docker and run a Python program with Ray on it.
Command used to run Docker:
sudo docker run -dit -p 3000:3000 -p 8265:8265 -p 6374:6374 -p 6379:6379 -p 6380:6380 -p 8000:8000 -p 8888:8888 -p 50050:50050 -p 50051:50051 -p 4000:4000 -p 4001:4001 -p 4002:4002 -p 4003:4003 -p 4040:4040 -p 10000-10200:10000-10200 -p 10201-10300:10201-10300 -p 37280:37280 -p 36458:36458 -p 38251:38251 -p 41091:41091 -p 44217:44217 -p 55711:55711 -p 58331:58331 -p 63084:63084 -p 63246:63246 -p 57454:57454 -p 63313:63313 -p 60504:60504 --shm-size=204.89gb --name nikunjJUL20 pyray tail -f /dev/null
I have mapped every port I thought was necessary for running Ray into the container.
I am making one machine the head node, with 0 worker nodes.
Command used to start Ray:
> ray start --head --node-ip-address <ip of the machine> --port 10275 --dashboard-host 0.0.0.0 --dashboard-port 8265 --object-manager-port 4000 --node-manager-port 4001 --min-worker-port 10002 --max-worker-port 10042 --ray-client-server-port 10250 --gcs-server-port 4003 --num-cpus 0 --redis-shard-ports 10201
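Before digging into Ray itself, it can help to confirm mechanically that every port pinned by the `ray start` flags is covered by a `-p` mapping in the `docker run` command. The following is a minimal standard-library sketch under a few assumptions: both commands are pasted in as strings (`--node-ip-address` is left out because it carries no port), only the container side of each `-p` mapping is considered, and every mapping is treated as TCP. Ports that Ray picks on its own at runtime do not appear in either command, so this check cannot see them.

```python
import shlex

# The docker run command from this post, split across lines for readability.
DOCKER_CMD = (
    "sudo docker run -dit -p 3000:3000 -p 8265:8265 -p 6374:6374 -p 6379:6379 "
    "-p 6380:6380 -p 8000:8000 -p 8888:8888 -p 50050:50050 -p 50051:50051 "
    "-p 4000:4000 -p 4001:4001 -p 4002:4002 -p 4003:4003 -p 4040:4040 "
    "-p 10000-10200:10000-10200 -p 10201-10300:10201-10300 -p 37280:37280 "
    "-p 36458:36458 -p 38251:38251 -p 41091:41091 -p 44217:44217 -p 55711:55711 "
    "-p 58331:58331 -p 63084:63084 -p 63246:63246 -p 57454:57454 -p 63313:63313 "
    "-p 60504:60504 --shm-size=204.89gb --name nikunjJUL20 pyray tail -f /dev/null"
)

# The ray start command from this post, minus --node-ip-address (no port in it).
RAY_CMD = (
    "ray start --head --port 10275 --dashboard-host 0.0.0.0 --dashboard-port 8265 "
    "--object-manager-port 4000 --node-manager-port 4001 --min-worker-port 10002 "
    "--max-worker-port 10042 --ray-client-server-port 10250 --gcs-server-port 4003 "
    "--num-cpus 0 --redis-shard-ports 10201"
)

SINGLE_PORT_FLAGS = (
    "--port", "--dashboard-port", "--object-manager-port",
    "--node-manager-port", "--ray-client-server-port", "--gcs-server-port",
)


def expand(spec):
    """Turn '4000' or '10000-10200' into a set of port numbers."""
    if "-" in spec:
        lo, hi = spec.split("-")
        return set(range(int(lo), int(hi) + 1))
    return {int(spec)}


def published_container_ports(docker_cmd):
    """Collect the container side of every -p/--publish mapping (TCP assumed)."""
    tokens = shlex.split(docker_cmd)
    ports = set()
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-p", "--publish"):
            # Keep the part after the last ':' and drop an optional '/tcp' suffix.
            container_side = value.split(":")[-1].split("/")[0]
            ports |= expand(container_side)
    return ports


def pinned_ray_ports(ray_cmd):
    """Map each port-related ray start flag to the set of ports it pins."""
    tokens = shlex.split(ray_cmd)
    args = dict(zip(tokens, tokens[1:]))  # flag -> the token that follows it
    ports = {}
    for flag in SINGLE_PORT_FLAGS:
        if flag in args:
            ports[flag] = {int(args[flag])}
    if "--redis-shard-ports" in args:
        ports["--redis-shard-ports"] = {
            int(p) for p in args["--redis-shard-ports"].split(",")
        }
    if "--min-worker-port" in args and "--max-worker-port" in args:
        ports["worker ports"] = set(
            range(int(args["--min-worker-port"]), int(args["--max-worker-port"]) + 1)
        )
    return ports


def unpublished(docker_cmd, ray_cmd):
    """Return {flag: [ports not published by docker]} for every uncovered flag."""
    published = published_container_ports(docker_cmd)
    return {
        flag: sorted(ports - published)
        for flag, ports in pinned_ray_ports(ray_cmd).items()
        if ports - published
    }


if __name__ == "__main__":
    missing = unpublished(DOCKER_CMD, RAY_CMD)
    print(missing or "every pinned Ray port is published")
```

With the two commands exactly as posted, each pinned port falls inside a published mapping (for example, 10275 lies in 10201-10300), so the script reports that every pinned Ray port is published.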
Ray starts as follows:
> Local node IP: <ip of the machine>
> 2021-07-21 12:12:52,653 INFO services.py:1274 -- View the Ray dashboard at http://172.17.0.2:8265
>
> --------------------
> Ray runtime started.
> --------------------
>
> Next steps
> To connect to this Ray runtime from another node, run
> ray start --address='<ip of the machine>:10275' --redis-password='5241590000000000'
>
> Alternatively, use the following Python code:
> import ray
> ray.init(address='auto', _redis_password='5241590000000000')
>
> If connection fails, check your firewall settings and network configuration.
>
> To terminate the Ray runtime, run
> ray stop
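Since the startup message points at firewall and network configuration when connecting fails, a plain TCP connect test run from the other machine shows which of the head node's pinned ports are actually reachable. This is a minimal standard-library sketch: `HEAD_IP` is a placeholder to replace with the head machine's address, and the port list is copied from the `ray start` flags in this post. A port reported as open only means something accepted a TCP connection; it does not show that Ray behind it is healthy.

```python
import socket

# Placeholder: replace with the address of the machine running the head node.
HEAD_IP = "127.0.0.1"

# Ports pinned by the ray start flags in this post.
PORTS = {
    "--port": 10275,
    "--dashboard-port": 8265,
    "--object-manager-port": 4000,
    "--node-manager-port": 4001,
    "--ray-client-server-port": 10250,
    "--gcs-server-port": 4003,
    "--redis-shard-ports": 10201,
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for flag, port in PORTS.items():
        state = "open" if port_is_open(HEAD_IP, port) else "closed"
        print(f"{flag:26} {port:6}  {state}")
```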
However, the dashboard does not open because it cannot get a node properly, and ray.init() also fails. When I start Ray on the other machine and try to attach it to this cluster, that fails as well.
When I run the following command, I get this error:
> ray debug
2021-07-21 12:22:53,160 INFO scripts.py:206 -- Connecting to Ray instance at 172.23.10.111:10275.
2021-07-21 12:22:53,161 INFO worker.py:736 -- Connecting to existing Ray cluster at address: 172.23.10.111:10275
Active breakpoints:
Enter breakpoint index or press enter to refresh: 2021-07-21 12:22:54,889 WARNING worker.py:1123 -- The agent on node 648518b3d714 failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/ray/new_dashboard/agent.py", line 326, in <module>
    loop.run_until_complete(agent.run())
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.7/site-packages/ray/new_dashboard/agent.py", line 161, in run
    await site.start()
  File "/usr/local/lib/python3.7/site-packages/aiohttp/web_runner.py", line 128, in start
    reuse_port=self._reuse_port,
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 1389, in create_server
    % (sa, err.strerror.lower())) from None
OSError: [Errno 99] error while attempting to bind on address ('IP of the machine', 0): cannot assign requested address
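The final `OSError` comes from asyncio's `create_server`, which re-raises a failed `bind()` with the "error while attempting to bind on address" message. In the address tuple, port `0` asks the OS for any free port, and errno 99 (`EADDRNOTAVAIL` on Linux) is what `bind()` reports when the requested address is not local to the machine (or container) doing the binding. This minimal standard-library sketch reproduces both cases outside Ray; `192.0.2.1` is a documentation-only (TEST-NET-1) address that is assumed not to be configured on the machine running it.

```python
import errno
import socket


def try_bind(host, port=0):
    """Try to bind a TCP socket to (host, port).

    Returns ("bound", assigned_port) on success or ("failed", errno) on error.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        # With port 0 the OS picks a free port; read back which one it chose.
        return ("bound", sock.getsockname()[1])
    except OSError as exc:
        return ("failed", exc.errno)
    finally:
        sock.close()


if __name__ == "__main__":
    # Binding to all local interfaces with port 0 normally succeeds.
    print(try_bind("0.0.0.0"))
    # Assumption: 192.0.2.1 is not assigned to any local interface here.
    status, code = try_bind("192.0.2.1")
    print(status, errno.errorcode.get(code, code))
```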
As far as I can see, it is trying to use a random port for some node (port 0 in the error), and I don't think I have made that port available. I cannot map all of the host's ports into the container. Please help me solve this issue.