Getting dashboard agent errors even with dashboard disabled

I keep getting this error on large clusters even if include-dashboard is false and ports are static.

If I recall correctly, I wasn’t getting this error on version 1.2.0.

Cluster yaml config:

    head_start_ray_commands:
      - ray stop
      - >-
        ulimit -n 65536;
        ray start
        --head
        --port=6379
        --redis-password="${REDIS_PASSWORD}"
        --autoscaling-config="${HOME}/ray_bootstrap_config.yaml"
        --gcs-server-port=6006
        --node-manager-port=6007
        --object-manager-port=6008
        --redis-shard-ports=6400,6401,6402,6403,6404,6405,6406,6407,6408,6409
        --min-worker-port=35000
        --max-worker-port=40000
        --include-dashboard=false

    worker_start_ray_commands:
      - ray stop
      - >-
        ulimit -n 65536;
        ray start
        --address="${RAY_HEAD_IP}:6379"
        --redis-password="${REDIS_PASSWORD}"
        --node-manager-port=6007
        --object-manager-port=6008
        --min-worker-port=35000
        --max-worker-port=40000

Link to this case: "[Dashboard] New dashboard port errors in a large cluster." (Issue #11638, ray-project/ray on GitHub)

Where is port 51660 coming from?

(raylet, ip=10.30.20.72) E0511 15:49:29.016917590 2299789 server_chttp2.cc:40]        {"created":"@1620748169.016836834","description":"No address added out of total 1 resolved","file":"src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":306,"referenced_errors":[{"created":"@1620748169.016828783","description":"Failed to add any wildcard listeners","file":"src/core/lib/iomgr/tcp_server_posix.cc","file_line":340,"referenced_errors":[{"created":"@1620748169.016815497","description":"Unable to configure socket","fd":18,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":214,"referenced_errors":[{"created":"@1620748169.016811409","description":"Address already in use","errno":98,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":188,"os_error":"Address already in use","syscall":"bind"}]},{"created":"@1620748169.016827859","description":"Unable to configure socket","fd":18,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":214,"referenced_errors":[{"created":"@1620748169.016825168","description":"Address already in use","errno":98,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":188,"os_error":"Address already in use","syscall":"bind"}]}]}]}
(raylet, ip=10.30.20.72) Traceback (most recent call last):
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 338, in <module>
(raylet, ip=10.30.20.72)     raise e
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 313, in <module>
(raylet, ip=10.30.20.72)     agent = DashboardAgent(
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 74, in __init__
(raylet, ip=10.30.20.72)     self.grpc_port = self.server.add_insecure_port(
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/grpc/aio/_server.py", line 83, in add_insecure_port
(raylet, ip=10.30.20.72)     return _common.validate_port_binding_result(
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/grpc/_common.py", line 166, in validate_port_binding_result
(raylet, ip=10.30.20.72)     raise RuntimeError(_ERROR_MESSAGE_PORT_BINDING_FAILED % address)
(raylet, ip=10.30.20.72) RuntimeError: Failed to bind to address [::]:51660; set GRPC_VERBOSITY=debug environment variable to see detailed error message.

Here’s the same error again, but with a different port:

TLDR: "os_error":"Address already in use"

(raylet, ip=10.30.20.72) E0514 20:05:43.081159399 2831251 server_chttp2.cc:40]        {"created":"@1621022743.081087369","description":"No address added out of total 1 resolved","file":"src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":306,"referenced_errors":[{"created":"@1621022743.081075995","description":"Failed to add any wildcard listeners","file":"src/core/lib/iomgr/tcp_server_posix.cc","file_line":340,"referenced_errors":[{"created":"@1621022743.081060934","description":"Unable to configure socket","fd":18,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":214,"referenced_errors":[{"created":"@1621022743.081056479","description":"Address already in use","errno":98,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":188,"os_error":"Address already in use","syscall":"bind"}]},{"created":"@1621022743.081074929","description":"Unable to configure socket","fd":18,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":214,"referenced_errors":[{"created":"@1621022743.081071797","description":"Address already in use","errno":98,"file":"src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":188,"os_error":"Address already in use","syscall":"bind"}]}]}]}
(raylet, ip=10.30.20.72) Traceback (most recent call last):
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 338, in <module>
(raylet, ip=10.30.20.72)     raise e
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 313, in <module>
(raylet, ip=10.30.20.72)     agent = DashboardAgent(
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 74, in __init__
(raylet, ip=10.30.20.72)     self.grpc_port = self.server.add_insecure_port(
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/grpc/aio/_server.py", line 83, in add_insecure_port
(raylet, ip=10.30.20.72)     return _common.validate_port_binding_result(
(raylet, ip=10.30.20.72)   File "/usr/local/lib/python3.8/site-packages/grpc/_common.py", line 166, in validate_port_binding_result
(raylet, ip=10.30.20.72)     raise RuntimeError(_ERROR_MESSAGE_PORT_BINDING_FAILED % address)
(raylet, ip=10.30.20.72) RuntimeError: Failed to bind to address [::]:51822; set GRPC_VERBOSITY=debug environment variable to see detailed error message.
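
For anyone reading these logs: the long gRPC JSON line nests the real cause several levels deep under "referenced_errors". A minimal sketch of pulling out the innermost entries is below. The helper names (leaf_os_errors, root_causes) are made up for illustration and are not part of Ray or gRPC, and the sample line is a trimmed copy of the log above, not the full output.

```python
import json


def leaf_os_errors(err):
    """Walk nested "referenced_errors" and collect the innermost OS errors."""
    children = err.get("referenced_errors") or []
    if not children:
        if "os_error" in err:
            return [(err["os_error"], err.get("syscall"), err.get("errno"))]
        return []
    found = []
    for child in children:
        found.extend(leaf_os_errors(child))
    return found


def root_causes(log_line):
    """Parse the JSON that starts at the first '{' of a gRPC error log line."""
    return leaf_os_errors(json.loads(log_line[log_line.index("{"):]))


# Trimmed sample shaped like the raylet log line (fields shortened for brevity).
sample = (
    '(raylet, ip=10.30.20.72) E0514 20:05:43.081159399 2831251 server_chttp2.cc:40]        '
    '{"description":"No address added out of total 1 resolved","referenced_errors":['
    '{"description":"Failed to add any wildcard listeners","referenced_errors":['
    '{"description":"Unable to configure socket","fd":18,"referenced_errors":['
    '{"description":"Address already in use","errno":98,'
    '"os_error":"Address already in use","syscall":"bind"}]},'
    '{"description":"Unable to configure socket","fd":18,"referenced_errors":['
    '{"description":"Address already in use","errno":98,'
    '"os_error":"Address already in use","syscall":"bind"}]}]}]}'
)

print(root_causes(sample))
```

Both leaf entries report the same failed bind call, which matches the TLDR above.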

When I run netstat, port 51822 does not show up as open:

# netstat -lntu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 127.0.0.1:40479         0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:20257         0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN     
tcp6       0      0 :::12058                :::*                    LISTEN     
tcp6       0      0 :::12026                :::*                    LISTEN     
tcp6       0      0 :::12059                :::*                    LISTEN     
tcp6       0      0 :::12027                :::*                    LISTEN     
tcp6       0      0 :::12060                :::*                    LISTEN     
tcp6       0      0 :::12028                :::*                    LISTEN     
tcp6       0      0 :::12061                :::*                    LISTEN     
tcp6       0      0 :::12029                :::*                    LISTEN     
tcp6       0      0 :::12062                :::*                    LISTEN     
tcp6       0      0 :::12030                :::*                    LISTEN     
tcp6       0      0 :::12031                :::*                    LISTEN     
tcp6       0      0 :::12032                :::*                    LISTEN     
tcp6       0      0 :::12000                :::*                    LISTEN     
tcp6       0      0 :::12033                :::*                    LISTEN     
tcp6       0      0 :::12001                :::*                    LISTEN     
tcp6       0      0 :::12034                :::*                    LISTEN     
tcp6       0      0 :::12002                :::*                    LISTEN     
tcp6       0      0 :::12035                :::*                    LISTEN     
tcp6       0      0 :::12003                :::*                    LISTEN     
tcp6       0      0 :::12036                :::*                    LISTEN     
tcp6       0      0 :::12004                :::*                    LISTEN     
tcp6       0      0 :::12037                :::*                    LISTEN     
tcp6       0      0 :::12005                :::*                    LISTEN     
tcp6       0      0 :::12038                :::*                    LISTEN     
tcp6       0      0 :::12006                :::*                    LISTEN     
tcp6       0      0 :::12039                :::*                    LISTEN     
tcp6       0      0 :::12007                :::*                    LISTEN     
tcp6       0      0 :::12040                :::*                    LISTEN     
tcp6       0      0 :::12008                :::*                    LISTEN     
tcp6       0      0 :::12041                :::*                    LISTEN     
tcp6       0      0 :::12009                :::*                    LISTEN     
tcp6       0      0 :::12042                :::*                    LISTEN     
tcp6       0      0 :::12010                :::*                    LISTEN     
tcp6       0      0 :::12043                :::*                    LISTEN     
tcp6       0      0 :::12011                :::*                    LISTEN     
tcp6       0      0 :::12044                :::*                    LISTEN     
tcp6       0      0 :::12012                :::*                    LISTEN     
tcp6       0      0 :::8076                 :::*                    LISTEN     
tcp6       0      0 :::12045                :::*                    LISTEN     
tcp6       0      0 :::12013                :::*                    LISTEN     
tcp6       0      0 :::12046                :::*                    LISTEN     
tcp6       0      0 :::12014                :::*                    LISTEN     
tcp6       0      0 :::12047                :::*                    LISTEN     
tcp6       0      0 :::12015                :::*                    LISTEN     
tcp6       0      0 :::12048                :::*                    LISTEN     
tcp6       0      0 :::12016                :::*                    LISTEN     
tcp6       0      0 :::12049                :::*                    LISTEN     
tcp6       0      0 :::12017                :::*                    LISTEN     
tcp6       0      0 :::38257                :::*                    LISTEN     
tcp6       0      0 :::12050                :::*                    LISTEN     
tcp6       0      0 :::12018                :::*                    LISTEN     
tcp6       0      0 :::12051                :::*                    LISTEN     
tcp6       0      0 :::12019                :::*                    LISTEN     
tcp6       0      0 :::12052                :::*                    LISTEN     
tcp6       0      0 :::12020                :::*                    LISTEN     
tcp6       0      0 :::12053                :::*                    LISTEN     
tcp6       0      0 :::12021                :::*                    LISTEN     
tcp6       0      0 :::12054                :::*                    LISTEN     
tcp6       0      0 :::12022                :::*                    LISTEN     
tcp6       0      0 :::12055                :::*                    LISTEN     
tcp6       0      0 :::12023                :::*                    LISTEN     
tcp6       0      0 :::12056                :::*                    LISTEN     
tcp6       0      0 :::12024                :::*                    LISTEN     
tcp6       0      0 :::12057                :::*                    LISTEN     
tcp6       0      0 :::12025                :::*                    LISTEN     
udp        0      0 127.0.0.53:53           0.0.0.0:*                          
udp        0      0 10.30.20.72:68          0.0.0.0:*                          
udp        0      0 127.0.0.1:323           0.0.0.0:*                          
udp6       0      0 ::1:323                 :::*    
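
One caveat about the check above: netstat -l lists only listening sockets, so a port held by an established or outgoing connection would not appear in that output. Both failing ports (51660 and 51822) also fall inside the default Linux ephemeral range (32768 to 60999, if the sysctl has not been changed), so a collision with a short-lived connection is worth ruling out; this is only a guess, not a confirmed cause. A minimal sketch for testing whether a port can be bound at a given moment follows. The function name can_bind is hypothetical, and it uses IPv4 for simplicity, whereas the agent in the traceback binds to [::].

```python
import socket


def can_bind(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically errno 98 (EADDRINUSE) on Linux when the port is taken.
            return False
        return True


if __name__ == "__main__":
    # Ports taken from the two tracebacks in this thread.
    for port in (51660, 51822):
        print(port, "bindable" if can_bind(port) else "in use")
```

The result is only a snapshot: a port that is free when this runs may have been taken at the moment the agent started, so running it in a loop around cluster startup is more informative than a single check.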

Is it happening consistently? Also, can you create an issue on our GitHub? It could be a bug.

Yes, it happens every time. Here is the issue: "Getting dashboard agent errors even with dashboard disabled" (Issue #15854, ray-project/ray on GitHub)