I keep getting this error on large clusters, even with include-dashboard set to false and all ports set statically.
If I recall correctly, I wasn’t getting this on version 1.2.0.
Cluster yaml config:
head_start_ray_commands:
    - ray stop
    - >-
      ulimit -n 65536;
      ray start
      --head
      --port=6379
      --redis-password="${REDIS_PASSWORD}"
      --autoscaling-config="${HOME}/ray_bootstrap_config.yaml"
      --gcs-server-port=6006
      --node-manager-port=6007
      --object-manager-port=6008
      --redis-shard-ports=6400,6401,6402,6403,6404,6405,6406,6407,6408,6409
      --min-worker-port=35000
      --max-worker-port=40000
      --include-dashboard=false
worker_start_ray_commands:
    - ray stop
    - >-
      ulimit -n 65536;
      ray start
      --address="${RAY_HEAD_IP}:6379"
      --redis-password="${REDIS_PASSWORD}"
      --node-manager-port=6007
      --object-manager-port=6008
      --min-worker-port=35000
      --max-worker-port=40000
Link to this case: "[Dashboard] New dashboard port errors in a large cluster", Issue #11638 in ray-project/ray on GitHub.
Where is port 51660 coming from?
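One way to narrow this down is to check the port against every static port pinned in the config above. This is only a rough sketch: the port values are copied from the `ray start` flags, the helper name `is_statically_configured` is made up for illustration, and treating `--max-worker-port` as inclusive is an assumption.

```python
# Static ports copied from the `ray start` flags in the cluster config above.
STATIC_PORTS = {6379, 6006, 6007, 6008} | set(range(6400, 6410))

# --min-worker-port .. --max-worker-port (upper bound assumed inclusive).
WORKER_PORT_RANGE = range(35000, 40001)


def is_statically_configured(port: int) -> bool:
    """Return True if `port` is one of the ports pinned in the config."""
    return port in STATIC_PORTS or port in WORKER_PORT_RANGE


if __name__ == "__main__":
    # The port from the RuntimeError in the log.
    print(is_statically_configured(51660))
```

Port 51660 is not among the pinned ports, so it is not one that the flags above control.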
(raylet, ip=10.30.20.72) E0511 15:49:29.016917590 2299789 server_chttp2.cc:40] {"created":"@1620748169.016836834","description":"No address added out of total 1 resolved","file":"src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":306,"referenced_errors":[{"created":"@1620748169.016828783","description":"Failed to add any wildcard listeners","file":"src/core/lib/iomgr/tcp_server_posix.cc","file_line":340,"referenced_errors":[{"created":"@1620748169.016815497","description":"Unable to configure socket","fd":18,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":214,"referenced_errors":[{"created":"@1620748169.016811409","description":"Address already in use","errno":98,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":188,"os_error":"Address already in use","syscall":"bind"}]},{"created":"@1620748169.016827859","description":"Unable to configure socket","fd":18,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":214,"referenced_errors":[{"created":"@1620748169.016825168","description":"Address already in use","errno":98,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":188,"os_error":"Address already in use","syscall":"bind"}]}]}]}
(raylet, ip=10.30.20.72) Traceback (most recent call last):
(raylet, ip=10.30.20.72) File "/usr/local/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 338, in <module>
(raylet, ip=10.30.20.72) raise e
(raylet, ip=10.30.20.72) File "/usr/local/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 313, in <module>
(raylet, ip=10.30.20.72) agent = DashboardAgent(
(raylet, ip=10.30.20.72) File "/usr/local/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 74, in __init__
(raylet, ip=10.30.20.72) self.grpc_port = self.server.add_insecure_port(
(raylet, ip=10.30.20.72) File "/usr/local/lib/python3.8/site-packages/grpc/aio/_server.py", line 83, in add_insecure_port
(raylet, ip=10.30.20.72) return _common.validate_port_binding_result(
(raylet, ip=10.30.20.72) File "/usr/local/lib/python3.8/site-packages/grpc/_common.py", line 166, in validate_port_binding_result
(raylet, ip=10.30.20.72) raise RuntimeError(_ERROR_MESSAGE_PORT_BINDING_FAILED % address)
(raylet, ip=10.30.20.72) RuntimeError: Failed to bind to address [::]:51660; set GRPC_VERBOSITY=debug environment variable to see detailed error message.
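For reference, the underlying `errno 98` ("Address already in use") from the `bind` syscall can be reproduced outside of Ray. The sketch below is not Ray code: `can_bind` is a hypothetical helper that tries to bind a plain TCP socket, and it could be run on an affected node to see whether a given port is already taken at that moment.

```python
import errno
import socket


def can_bind(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind a TCP socket to (host, port); return False on EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # Any other error (e.g. permission denied) is unexpected here.
        return True


if __name__ == "__main__":
    # Hold a port open, then show that a second bind on the same port fails.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as holder:
        holder.bind(("127.0.0.1", 0))  # Port 0 lets the OS pick a free port.
        holder.listen()
        taken = holder.getsockname()[1]
        print(taken, can_bind(taken))
```

The second bind fails because another socket already holds the port, which is the same condition the gRPC server reports in the log above.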