Using ray debug with the official helm chart

Hi!

I'm running Ray's Helm chart on K8s. How does the ray debug command work? I've looked at the code, but I can't pinpoint the port it's trying to connect to when a worker hits a breakpoint. Do I need to open another port on the worker containers, one that is not configured in the official Helm chart, so that the Ray head node can connect to the workers?

ray@example-cluster-ray-head-type-mccvg:~$ ray debug
2021-06-25 05:39:51,804 INFO scripts.py:206 -- Connecting to Ray instance at 10.53.8.2:6379.
2021-06-25 05:39:51,805 INFO worker.py:727 -- Connecting to existing Ray cluster at address: 10.53.8.2:6379
Active breakpoints:
0: ray::RayServeWrappedReplica.handle_request() | /app/app/main.py:18
NoneType: None
Enter breakpoint index or press enter to refresh: 0
Traceback (most recent call last):
  File "/home/ray/anaconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 1808, in main
    return cli()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 231, in debug
    ray.util.rpdb.connect_pdb_client(host, int(port))
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/util/rpdb.py", line 242, in connect_pdb_client
    s.connect((host, port))
ConnectionRefusedError: [Errno 111] Connection refused
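
In case it helps anyone narrow this down: the traceback ends in a plain `s.connect((host, port))`, so a "Connection refused" means nothing accepted a TCP connection at that address from where `ray debug` was run. Below is a minimal sketch for probing a host/port pair from the head pod. The helper name `can_connect` and the example values are hypothetical and not part of Ray; which host and port the debugger actually uses is exactly what I have not been able to determine.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration; it is not part of Ray.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts.
        return False


# Example usage (placeholder values, substitute the worker pod IP and
# the port the debugger client is trying to reach):
#   print(can_connect("<worker-pod-ip>", 12345))
```

If this returns False for the worker address from the head pod but True from inside the worker pod itself, that would suggest the listener is either bound to a local-only interface or blocked by the pod's network configuration.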

Thank you for pointing this out.

This looks like a bug: the relevant ports should be exposed by default.
@pcmoritz any additional insight?

Tracking here:

I can also confirm the bug for a K8s deployment using the nightly py38 GPU container (pulled 3/28/22). Which ports need to be opened, and how can we open them manually?

(base) ray@ray-cluster-1-ray-head-type-2m9pl:~$ ray debug
2022-03-30 05:42:52,739	INFO scripts.py:200 -- Connecting to Ray instance at 10.42.199.202:6379.
2022-03-30 05:42:52,739	INFO worker.py:946 -- Connecting to existing Ray cluster at address: 10.42.199.202:6379
Active breakpoints:
index | timestamp           | Ray task                                         | filename:lineno                                                                                                             
0     | 2022-03-30 12:30:27 | ray::BaseWorkerMixin._BaseWorkerMixin__execute() | /tmp/ray/session_2022-03-27_06-31-01_132838_57778/runtime_resources/working_dir_files/_ray_pkg_e861ad26ebf05b45/utils.py:333
Enter breakpoint index or press enter to refresh: 0
Traceback (most recent call last):
  File "/home/ray/anaconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/scripts/scripts.py", line 2269, in main
    return cli()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/scripts/scripts.py", line 259, in debug
    ray.util.rpdb.connect_pdb_client(host, int(port))
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/rpdb.py", line 326, in connect_pdb_client
    s.connect((host, port))
ConnectionRefusedError: [Errno 111] Connection refused
(base) ray@ray-cluster-1-ray-head-type-2m9pl:~$ 

I will follow up on the GitHub issue.