How can I specify the port number used for health checks?

I have two Windows servers (192.168.1.11 and 192.168.1.12) and am trying to run a Ray Docker container (image tag = 2.35.0-py312-gpu) on each server.

Steps

  1. I run the following two commands to start the Ray processes, and confirm that the dashboard at 192.168.1.11:8265 shows the worker node (192.168.1.12).
# Run this in 192.168.1.11
$ ray start --head --dashboard-host=0.0.0.0
# Run this in 192.168.1.12
$ ray start --address=192.168.1.11:6379 --node-ip-address=192.168.1.12
  2. However, about 30 seconds after I complete Step 1, the status of the worker node becomes DEAD.

  3. gcs_server.out contains the lines below. It seems that the head node cannot reach 192.168.1.12:39091.

[2024-09-13 04:23:52,090 W 2925 2925] (gcs_server) gcs_health_check_manager.cc:108: Health check failed for node f7d09b9af5a7100e0376fad74db65ce7189372757f494c4525d6f147, remaining checks 4, status 4, response status 0, status message Deadline Exceeded, status details
[2024-09-13 04:23:57,115 W 2925 2925] (gcs_server) gcs_health_check_manager.cc:108: Health check failed for node f7d09b9af5a7100e0376fad74db65ce7189372757f494c4525d6f147, remaining checks 3, status 14, response status 0, status message failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.1.12:39091: Failed to connect to remote host: FD Shutdown, status details
[2024-09-13 04:24:00,115 W 2925 2925] (gcs_server) gcs_health_check_manager.cc:108: Health check failed for node f7d09b9af5a7100e0376fad74db65ce7189372757f494c4525d6f147, remaining checks 2, status 14, response status 0, status message failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.1.12:39091: Failed to connect to remote host: FD Shutdown, status details
[2024-09-13 04:24:03,116 W 2925 2925] (gcs_server) gcs_health_check_manager.cc:108: Health check failed for node f7d09b9af5a7100e0376fad74db65ce7189372757f494c4525d6f147, remaining checks 1, status 14, response status 0, status message failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.1.12:39091: Failed to connect to remote host: FD Shutdown, status details
[2024-09-13 04:24:06,116 W 2925 2925] (gcs_server) gcs_health_check_manager.cc:108: Health check failed for node f7d09b9af5a7100e0376fad74db65ce7189372757f494c4525d6f147, remaining checks 0, status 14, response status 0, status message failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.1.12:39091: Failed to connect to remote host: FD Shutdown, status details
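
One way to confirm that this is a plain connectivity problem (firewall or unpublished Docker port) rather than something Ray-specific is to attempt a raw TCP connection from the head node to the address in the log. The following is only a minimal sketch; the helper name `can_connect` is hypothetical and not part of Ray.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability and
    says nothing about whether a Ray process is healthy behind the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running `can_connect("192.168.1.12", 39091)` on the head node while the worker is still alive should return `False` if the port is blocked. The port value must be taken from the current log, since it differs on every run.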

Problem

The problem is that the port number (39091 in 192.168.1.12:39091) changes on every run, and I cannot find any way to specify this port on the "Configuring Ray" page (Ray 2.38.0). I need to know in advance which port will be used so that I can set up Windows Defender Firewall and Docker's -p option.

Is there a good way to solve this problem?

IIUC, you need to fix the raylet port on 192.168.1.12? I think you can pass --node-manager-port to ray start for that.

https://docs.ray.io/en/latest/ray-core/configure.html#ports-configurations
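
A rough sketch of how this might look in practice: the same port number has to appear in the `ray start` command, the firewall rule, and Docker's `-p` option, so it can help to derive the first and last of these from one value. The port 6380 below is an arbitrary assumption (any free port should do), and the helper names are hypothetical. Other Ray ports listed on the page above may need the same treatment.

```python
import shlex

# Assumption: 6380 is an arbitrary free port picked for illustration only.
NODE_MANAGER_PORT = 6380


def worker_start_command(head_address: str, node_ip: str, port: int) -> list[str]:
    """Build the `ray start` argument list for a worker with a fixed raylet port."""
    return [
        "ray", "start",
        f"--address={head_address}",
        f"--node-ip-address={node_ip}",
        f"--node-manager-port={port}",
    ]


def docker_publish_flag(port: int) -> list[str]:
    """Build the Docker option that publishes the same port on the host."""
    return ["-p", f"{port}:{port}"]


# Addresses are the ones from the question.
cmd = worker_start_command("192.168.1.11:6379", "192.168.1.12", NODE_MANAGER_PORT)
print(shlex.join(cmd))
print(shlex.join(docker_publish_flag(NODE_MANAGER_PORT)))
```

The first printed line is the command to run inside the worker container on 192.168.1.12, and the second is the option to add to `docker run` for that container. The same port would then also be allowed in Windows Defender Firewall.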