Error: grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with StatusCode.RESOURCE_EXHAUSTED>

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity.
  • Low: It annoys or frustrates me for a moment.
  • Medium: It significantly complicates my task, but I can work around it.
  • High: It blocks me from completing my task.

I would like to know whether it is possible to adjust the gRPC resource limit.
The error I get is:

Traceback (most recent call last):
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/tuner.py", line 234, in fit
    return self._local_tuner.fit()
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 283, in fit
    analysis = self._fit_internal(trainable, param_space)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 380, in _fit_internal
    analysis = run(
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/tune.py", line 520, in run
    experiments[i] = Experiment(
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/experiment/experiment.py", line 166, in __init__
    raise TuneError(
ray.tune.error.TuneError: The Trainable/training function is too large for grpc resource limit. Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use tune.with_parameters() to put large objects in the Ray object store.
Original exception: Traceback (most recent call last):
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/experiment/experiment.py", line 163, in __init__
    self._run_identifier = Experiment.register_if_needed(run)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/experiment/experiment.py", line 356, in register_if_needed
    register_trainable(name, run_object)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/registry.py", line 101, in register_trainable
    _global_registry.register(TRAINABLE_CLASS, name, trainable)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/registry.py", line 189, in register
    self.flush_values()
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/registry.py", line 211, in flush_values
    _internal_kv_put(
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/experimental/internal_kv.py", line 94, in _internal_kv_put
    return global_gcs_client.internal_kv_put(key, value, overwrite, namespace) == 0
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/_private/gcs_utils.py", line 178, in wrapper
    return f(self, *args, **kwargs)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/_private/gcs_utils.py", line 297, in internal_kv_put
    reply = self._kv_stub.InternalKVPut(req, timeout=timeout)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/grpc/_channel.py", line 946, in call
    return _end_unary_response_blocking(state, call, False, None)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "Received message larger than max (725885853 vs. 262144000)"
	debug_error_string = "{"created":"@1681056490.782774738","description":"Error received from peer ipv4:172.17.0.5:51483","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Received message larger than max (725885853 vs. 262144000)","grpc_status":8}"
>

I modified `_MAX_MESSAGE_LENGTH` in gcs_utils.py, but it didn't help.
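Raising the client-side limit likely doesn't help because the error says the *receiving* peer (the GCS server) rejected the message, so the practical fix is to shrink the serialized trainable. A stdlib-only sketch of why the trainable gets so large: anything the callable carries with it is pickled and shipped over gRPC. `train_fn`, the 10 MB blob, and the placeholder handle below are all illustrative stand-ins, not Ray code.

```python
import pickle
from functools import partial

GRPC_LIMIT = 262144000  # 250 MiB, the max from the error message

def train_fn(config, data):
    # Illustrative trainable: uses a config value and some data.
    return len(data) * config["lr"]

large_blob = bytes(10_000_000)  # stand-in for a big dataset / model weights

# Anti-pattern: baking the large object into the callable itself, which is
# what implicitly capturing it in scope amounts to once Tune pickles it.
heavy = partial(train_fn, data=large_blob)
heavy_size = len(pickle.dumps(heavy))  # roughly the size of the blob

# tune.with_parameters()-style fix: the callable carries only a tiny
# handle; workers fetch the real data from the Ray object store.
light = partial(train_fn, data="objectref-placeholder")
light_size = len(pickle.dumps(light))  # a few hundred bytes

print(heavy_size, light_size, heavy_size < GRPC_LIMIT)
```

The 725885853-byte message in your traceback suggests the trainable is dragging roughly 700 MB of captured state along with it.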

This can usually be fixed by changing how the training function is implemented. Could you share how you implement it now?

Yes, after adjusting the code, I no longer see this error.