Error: grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with StatusCode.RESOURCE_EXHAUSTED>

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity.
  • Low: It annoys or frustrates me for a moment.
  • Medium: It significantly complicates my task, but I can work around it.
  • High: It blocks me from completing my task.

I would like to know whether it is possible to adjust the gRPC resource limit.
The error I get is:

Traceback (most recent call last):
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/tuner.py", line 234, in fit
    return self._local_tuner.fit()
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 283, in fit
    analysis = self._fit_internal(trainable, param_space)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 380, in _fit_internal
    analysis = run(
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/tune.py", line 520, in run
    experiments[i] = Experiment(
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/experiment/experiment.py", line 166, in __init__
    raise TuneError(
ray.tune.error.TuneError: The Trainable/training function is too large for grpc resource limit. Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use tune.with_parameters() to put large objects in the Ray object store.
Original exception: Traceback (most recent call last):
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/experiment/experiment.py", line 163, in __init__
    self._run_identifier = Experiment.register_if_needed(run)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/experiment/experiment.py", line 356, in register_if_needed
    register_trainable(name, run_object)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/registry.py", line 101, in register_trainable
    _global_registry.register(TRAINABLE_CLASS, name, trainable)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/registry.py", line 189, in register
    self.flush_values()
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/tune/registry.py", line 211, in flush_values
    _internal_kv_put(
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/experimental/internal_kv.py", line 94, in _internal_kv_put
    return global_gcs_client.internal_kv_put(key, value, overwrite, namespace) == 0
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/_private/gcs_utils.py", line 178, in wrapper
    return f(self, *args, **kwargs)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/ray/_private/gcs_utils.py", line 297, in internal_kv_put
    reply = self._kv_stub.InternalKVPut(req, timeout=timeout)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/grpc/_channel.py", line 946, in call
    return _end_unary_response_blocking(state, call, False, None)
  File "/root/miniconda3/envs/pytorch1/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "Received message larger than max (725885853 vs. 262144000)"
	debug_error_string = "{"created":"@1681056490.782774738","description":"Error received from peer ipv4:172.17.0.5:51483","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Received message larger than max (725885853 vs. 262144000)","grpc_status":8}"
>

I modified `_MAX_MESSAGE_LENGTH` in gcs_utils.py, but it didn't help.
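Raising the client-side limit likely doesn't help because the error says the *receiving* peer (the GCS server) rejected the message, so the practical fix is to shrink the serialized trainable. A stdlib-only sketch of why the trainable gets so large: anything the callable carries with it is pickled and shipped over gRPC. `train_fn`, the 10 MB blob, and the placeholder handle below are all illustrative stand-ins, not Ray code.

```python
import pickle
from functools import partial

GRPC_LIMIT = 262144000  # 250 MiB, the max from the error message

def train_fn(config, data):
    # Illustrative trainable: uses a config value and some data.
    return len(data) * config["lr"]

large_blob = bytes(10_000_000)  # stand-in for a big dataset / model weights

# Anti-pattern: baking the large object into the callable itself, which is
# what implicitly capturing it in scope amounts to once Tune pickles it.
heavy = partial(train_fn, data=large_blob)
heavy_size = len(pickle.dumps(heavy))  # roughly the size of the blob

# tune.with_parameters()-style fix: the callable carries only a tiny
# handle; workers fetch the real data from the Ray object store.
light = partial(train_fn, data="objectref-placeholder")
light_size = len(pickle.dumps(light))  # a few hundred bytes

print(heavy_size, light_size, heavy_size < GRPC_LIMIT)
```

The 725885853-byte message in your traceback suggests the trainable is dragging roughly 700 MB of captured state along with it.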

This can usually be fixed by changing how the training function is implemented. Could you share how you implement it now?

Yes, after adjusting the code, I no longer see this error.