`ray debug --address=<ip:port>` results in UnicodeDecodeError on k8s cluster

  • Medium: It makes my task significantly harder to complete, but I can work around it.

Ray 2.2 cluster on k8s

  • ray debug --address=<ip:port> results in UnicodeDecodeError on k8s cluster
  • works locally

Steps to reproduce:

  1. copy the simple_task.py code below into a file of the same name
  2. terminal 1: python simple_task.py
  3. terminal 2: ray debug --address=<the ip:port specified in terminal 1's stdout>
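Step 3 depends on copying an ip:port out of terminal 1's stdout, and with two tasks there are two separate sessions to choose from. As a convenience, a hypothetical helper (not part of Ray) can pull every address out of log lines in the `RemotePdb session open at HOST:PORT` format that Ray prints. It only parses text and does not connect to anything:

```python
import re

# Matches Ray's log line, e.g.
# "(f pid=4036, ip=10.192.6.5) RemotePdb session open at 10.192.6.5:38031, use 'ray debug' to connect..."
_REMOTE_PDB_RE = re.compile(r"RemotePdb session open at ([\w.\-]+):(\d+)")


def remote_pdb_addresses(log_text: str) -> list[tuple[str, int]]:
    """Return (host, port) for every RemotePdb session announced in log_text.

    Hypothetical helper for illustration only; it parses text and nothing else.
    """
    return [(host, int(port)) for host, port in _REMOTE_PDB_RE.findall(log_text)]


log = """(f pid=4037, ip=10.192.6.5) RemotePdb session open at 10.192.6.5:44835, use 'ray debug' to connect...
(f pid=4036, ip=10.192.6.5) RemotePdb session open at 10.192.6.5:38031, use 'ray debug' to connect..."""

print(remote_pdb_addresses(log))
# [('10.192.6.5', 44835), ('10.192.6.5', 38031)]
```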

Using the example from the Ray docs:

simple_task.py

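import ray

# Connect through Ray Client to the head node's Service on port 10001
ray.init("ray://ray-examples-head-svc:10001")

@ray.remote
def f(x):
    breakpoint()  # pauses the task and opens a RemotePdb session for `ray debug`
    return x * x

# Launch two tasks; each one stops at its own breakpoint
futures = [f.remote(i) for i in range(2)]
print(ray.get(futures))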

terminal 1

(freenome-ray-examples-py3.10) ➜  ray-examples git:(rpx-3397-add-sens-spec) ✗ python ray_examples/simple_task.py
(f pid=4037, ip=10.192.6.5) RemotePdb session open at 10.192.6.5:44835, use 'ray debug' to connect...
(f pid=4036, ip=10.192.6.5) RemotePdb session open at 10.192.6.5:38031, use 'ray debug' to connect...

terminal 2

(freenome-ray-examples-py3.10) ➜  ray-examples git:(rpx-3397-add-sens-spec) ✗ ray debug --address=10.192.6.5:38031
2023-02-02 14:58:05,783 INFO scripts.py:206 -- Connecting to Ray instance at 10.192.6.5:38031.
2023-02-02 14:58:05,783 INFO worker.py:1352 -- Connecting to existing Ray cluster at address: 10.192.6.5:38031...

terminal 1 (after running the command in terminal 2)

(freenome-ray-examples-py3.10) ➜  ray-examples git:(rpx-3397-add-sens-spec) ✗ python ray_examples/simple_task.py
(f pid=4037, ip=10.192.6.5) RemotePdb session open at 10.192.6.5:44835, use 'ray debug' to connect...
(f pid=4036, ip=10.192.6.5) RemotePdb session open at 10.192.6.5:38031, use 'ray debug' to connect...
Traceback (most recent call last):
  File "/home/zcarrico/ray-examples/ray_examples/simple_task.py", line 13, in <module>
    print(ray.get(futures))
  File "/home/zcarrico/.virtualenvs/freenome-ray-examples-QntpqrFU-py3.10/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return getattr(ray, func.__name__)(*args, **kwargs)
  File "/home/zcarrico/.virtualenvs/freenome-ray-examples-QntpqrFU-py3.10/lib/python3.10/site-packages/ray/util/client/api.py", line 42, in get
    return self.worker.get(vals, timeout=timeout)
  File "/home/zcarrico/.virtualenvs/freenome-ray-examples-QntpqrFU-py3.10/lib/python3.10/site-packages/ray/util/client/worker.py", line 434, in get
    res = self._get(to_get, op_timeout)
  File "/home/zcarrico/.virtualenvs/freenome-ray-examples-QntpqrFU-py3.10/lib/python3.10/site-packages/ray/util/client/worker.py", line 462, in _get
    raise err
types.RayTaskError(UnicodeDecodeError): ray::f() (pid=4036, ip=10.192.6.5)
  File "/home/zcarrico/ray-examples/ray_examples/simple_task.py", line 9, in f
  File "/home/zcarrico/ray-examples/ray_examples/simple_task.py", line 9, in f
  File "/fn/lib/python3.10/bdb.py", line 90, in trace_dispatch
    return self.dispatch_line(frame)
  File "/fn/lib/python3.10/bdb.py", line 114, in dispatch_line
    self.user_line(frame)
  File "/fn/lib/python3.10/pdb.py", line 262, in user_line
    self.interaction(frame, None)
  File "/fn/lib/python3.10/pdb.py", line 357, in interaction
    self._cmdloop()
  File "/fn/lib/python3.10/pdb.py", line 322, in _cmdloop
    self.cmdloop()
  File "/fn/lib/python3.10/cmd.py", line 132, in cmdloop
    line = self.stdin.readline()
  File "/fn/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 49: invalid start byte
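The innermost frames show pdb's command loop calling `self.stdin.readline()` on the socket-backed stream, which passes the incoming bytes through Python's incremental UTF-8 decoder (`codecs.py`, `decode` calling `_buffer_decode`). As a minimal sketch, assuming only that some byte that is not valid UTF-8 (here 0xff) arrived on the RemotePdb socket, the standard library alone reproduces the same error. `read_line_like_pdb` is a hypothetical name, not a Ray or pdb function:

```python
import codecs


def read_line_like_pdb(raw: bytes) -> str:
    """Decode bytes the way a UTF-8 text stream does, incrementally.

    Uses the same BufferedIncrementalDecoder.decode -> _buffer_decode path
    that appears at the bottom of the traceback.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    return decoder.decode(raw)


# A normal pdb command decodes without problems
print(repr(read_line_like_pdb(b"p x\n")))

try:
    # 49 ASCII bytes followed by 0xff, which is never a valid UTF-8 start byte
    read_line_like_pdb(b"a" * 49 + b"\xff")
except UnicodeDecodeError as err:
    print(err)
    # 'utf-8' codec can't decode byte 0xff in position 49: invalid start byte
```

This only shows where the exception is raised; it does not identify what sent the 0xff byte to the debugger socket.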

Does anyone have questions, ideas about what's going on here, or something to try?

@sangcho @ClarenceNg, any thoughts?

@sangcho @ClarenceNg, following up on the comment above: do you have any ideas for things to try? Ray's debugging tools were one of its most attractive features for us, so hopefully there's a way to get this working on k8s 🙂

Are you using a cluster? I wonder if it's because you aren't using the cluster setup described on the "Ray Debugger" docs page (Ray 3.0.0.dev0).

If that's not the case, please create an issue!

Thank you @sangcho. Yes, I'm using a cluster, and I have

        rayStartParams:
            ray-debugger-external: "true"

in the CRD.
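In a KubeRay RayCluster, the head group and each worker group carry their own `rayStartParams`, so the setting can end up on some groups but not others. A minimal sketch, assuming the flag should be present on every group and that the cluster manifest's `spec` has been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`), that lists any group missing it. The helper name is hypothetical:

```python
def groups_missing_debugger_flag(raycluster_spec: dict) -> list[str]:
    """Return names of groups whose rayStartParams lack ray-debugger-external: "true".

    Hypothetical helper; expects the `spec` section of a KubeRay RayCluster
    (headGroupSpec / workerGroupSpecs).
    """
    groups = [("head", raycluster_spec.get("headGroupSpec", {}))]
    for i, worker in enumerate(raycluster_spec.get("workerGroupSpecs", [])):
        groups.append((worker.get("groupName", f"worker-group-{i}"), worker))
    return [
        name
        for name, group in groups
        # rayStartParams values are strings, so compare against "true", not True
        if group.get("rayStartParams", {}).get("ray-debugger-external") != "true"
    ]


spec = {
    "headGroupSpec": {"rayStartParams": {"ray-debugger-external": "true"}},
    "workerGroupSpecs": [{"groupName": "small-group", "rayStartParams": {}}],
}
print(groups_missing_debugger_flag(spec))  # ['small-group']
```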

What address are you using? Is it ray://ray-examples-head-svc:10001? Also, are terminals 1 and 2 on the same machine?

Yes to both questions
