Error when trying to schedule some processes on a Ray cluster

I have a Ray cluster and I’m trying to connect to it from an external pod on K8s. I connect with ray.util.connect, and I get the following traceback, which ends with an error about base64 encoding:

Traceback (most recent call last):
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/util/client/worker.py", line 300, in _call_schedule_for_task
    ticket = self.server.Schedule(task, metadata=self.metadata)
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "Connection reset by peer"
	debug_error_string = "{"created":"@1623683910.467183534","description":"Error received from peer ipv4:10.0.3.87:10001","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Connection reset by peer","grpc_status":14}"
>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "server_second.py", line 150, in <module>
    main()
  File "server_second.py", line 111, in main
    encoders={EncoderType.word_to_vec: enc}
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/higgins/server/runners/parallel_inference_runner.py", line 51, in __init__
    m = RmMClass.remote(**params)
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/actor.py", line 413, in remote
    return self._remote(args=args, kwargs=kwargs)
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/actor.py", line 587, in _remote
    override_environment_variables))
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 90, in client_mode_convert_actor
    return client_actor._remote(in_args, in_kwargs, **kwargs)
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/util/client/common.py", line 183, in _remote
    return self.options(**option_args).remote(*args, **kwargs)
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/util/client/common.py", line 296, in remote
    ref_ids = ray.call_remote(self, *args, **kwargs)
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/util/client/api.py", line 96, in call_remote
    return self.worker.call_remote(instance, *args, **kwargs)
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/util/client/worker.py", line 293, in call_remote
    return self._call_schedule_for_task(task)
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/util/client/worker.py", line 302, in _call_schedule_for_task
    raise decode_exception(e.details())
  File "/opt/conda/envs/nlp376/lib/python3.7/site-packages/ray/util/client/worker.py", line 482, in decode_exception
    data = base64.standard_b64decode(data)
  File "/opt/conda/envs/nlp376/lib/python3.7/base64.py", line 105, in standard_b64decode
    return b64decode(s)
  File "/opt/conda/envs/nlp376/lib/python3.7/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Invalid base64-encoded string: number of data characters (21) cannot be 1 more than a multiple of 4

Any idea?

How odd – @ijrsvt any idea about this? Or @Dmitri, if it’s a K8s-related issue?
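
A note on the base64 part of the traceback: it looks like a secondary error rather than the root cause. The client's decode_exception passes the gRPC details string to base64.standard_b64decode, and here that string is the plain text "Connection reset by peer" rather than an encoded exception. That text contains exactly 21 characters from the base64 alphabet (spaces are ignored), which matches the number in the binascii.Error message. The underlying failure is therefore the StatusCode.UNAVAILABLE connection reset from 10.0.3.87:10001. A minimal stdlib-only sketch that reproduces the decode failure without Ray (the helper names count_b64_chars and try_decode are made up for illustration):

```python
import base64
import binascii
import string

# The details string reported by gRPC in the traceback above.
DETAILS = "Connection reset by peer"

# Characters of the standard base64 alphabet (padding "=" excluded).
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def count_b64_chars(text):
    """Count the characters that the non-strict decoder treats as data."""
    return sum(1 for ch in text if ch in B64_ALPHABET)


def try_decode(text):
    """Return the decoded bytes, or the binascii.Error if decoding fails."""
    try:
        return base64.standard_b64decode(text)
    except binascii.Error as exc:
        return exc


print(count_b64_chars(DETAILS))  # 21
print(repr(try_decode(DETAILS)))  # a binascii.Error, as in the traceback
```

So the thing to investigate is why the server at port 10001 reset the connection, not the base64 message itself.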

Hi,
I think I “kinda” found the problem, but maybe you can confirm. I built my own Ray image, which I use as the base for the head and worker nodes, on Python 3.7.6, but when I checked your official images I saw that the 3.7 image is based on 3.7.7. Could there be a bug with gRPC on Python 3.7.6? In the end I downgraded everything to Python 3.6, because that’s what the data scientists are using, but I’m still wondering whether this is an issue with the Python version.
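
For anyone comparing versions in a similar setup, one way to rule a mismatch in or out is to print the interpreter version on both the client pod and the cluster nodes and compare them. The sketch below is only a hedged illustration: same_minor is a hypothetical helper, and it assumes that what matters is the major.minor part of the version (my understanding of the Ray Client requirement, not something confirmed in this thread). Under that assumption, 3.7.6 and 3.7.7 would count as matching.

```python
import platform


def same_minor(version_a, version_b):
    """Compare two version strings such as '3.7.6' on major.minor only."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


# Run this on the client pod and on a head/worker node, then compare.
print(platform.python_version())

# Versions mentioned in this thread:
print(same_minor("3.7.6", "3.7.7"))  # True
print(same_minor("3.7.6", "3.6.0"))  # False
```

If the two sides agree on major.minor, the patch-level difference alone may not explain the connection reset, and the server-side logs would be the next place to look.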