Running Ray’s Helm chart on K8s: how does the ray debug command work? I’ve looked at the code, but I can’t pinpoint which port it’s trying to connect to when a worker hits a breakpoint. Do I need to open another port on the worker container, one that isn’t configured in the official Helm chart, so that the Ray head node can connect to the workers?
ray@example-cluster-ray-head-type-mccvg:~$ ray debug
2021-06-25 05:39:51,804 INFO scripts.py:206 -- Connecting to Ray instance at 10.53.8.2:6379.
2021-06-25 05:39:51,805 INFO worker.py:727 -- Connecting to existing Ray cluster at address: 10.53.8.2:6379
Active breakpoints:
0: ray::RayServeWrappedReplica.handle_request() | /app/app/main.py:18
NoneType: None
Enter breakpoint index or press enter to refresh: 0
Traceback (most recent call last):
File "/home/ray/anaconda3/bin/ray", line 8, in <module>
sys.exit(main())
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 1808, in main
return cli()
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 231, in debug
ray.util.rpdb.connect_pdb_client(host, int(port))
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/util/rpdb.py", line 242, in connect_pdb_client
s.connect((host, port))
ConnectionRefusedError: [Errno 111] Connection refused
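The last frames of the traceback show that ray debug ultimately makes a plain TCP connection, s.connect((host, port)) inside ray.util.rpdb.connect_pdb_client, to the host and port recorded for the selected breakpoint. Errno 111 means the target address actively rejected the connection, usually because nothing is listening on that interface and port. The sketch below can be run from the head pod to check whether a given worker address is reachable. It is only a diagnostic aid. The function name probe_tcp is hypothetical and not part of Ray. The host and port you pass in are assumptions you must supply yourself, because the traceback does not print them.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Hypothetical helper: attempt a plain TCP connect, like s.connect((host, port)).

    Returns:
      "open"        if the connection succeeds,
      "refused"     if the peer actively rejects it (errno 111, as in the traceback),
      "unreachable" for timeouts and other socket errors (e.g. packets dropped).
    """
    try:
        # create_connection resolves the host and opens a TCP socket with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Must be caught before OSError, since it is a subclass of OSError.
        return "refused"
    except OSError:
        # Covers timeouts and routing or DNS failures.
        return "unreachable"
```

For example, probe_tcp("<worker-pod-ip>", <port>) from the head pod returning "refused" would suggest that nothing on the worker is listening on that interface and port, while "unreachable" would point more toward network policy or routing.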
I can also confirm this bug on a K8s deployment using the nightly py38 GPU container (pulled 3/28/22). Which ports need to be opened, and how can we open them manually?
(base) ray@ray-cluster-1-ray-head-type-2m9pl:~$ ray debug
2022-03-30 05:42:52,739 INFO scripts.py:200 -- Connecting to Ray instance at 10.42.199.202:6379.
2022-03-30 05:42:52,739 INFO worker.py:946 -- Connecting to existing Ray cluster at address: 10.42.199.202:6379
Active breakpoints:
index | timestamp | Ray task | filename:lineno
0 | 2022-03-30 12:30:27 | ray::BaseWorkerMixin._BaseWorkerMixin__execute() | /tmp/ray/session_2022-03-27_06-31-01_132838_57778/runtime_resources/working_dir_files/_ray_pkg_e861ad26ebf05b45/utils.py:333
Enter breakpoint index or press enter to refresh: 0
Traceback (most recent call last):
File "/home/ray/anaconda3/bin/ray", line 8, in <module>
sys.exit(main())
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/scripts/scripts.py", line 2269, in main
return cli()
File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/scripts/scripts.py", line 259, in debug
ray.util.rpdb.connect_pdb_client(host, int(port))
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/rpdb.py", line 326, in connect_pdb_client
s.connect((host, port))
ConnectionRefusedError: [Errno 111] Connection refused
(base) ray@ray-cluster-1-ray-head-type-2m9pl:~$