InactiveRpcError: Deadline Exceeded

The autoscaler failed with the following error:
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/monitor.py", line 423, in run
    self._run()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/monitor.py", line 311, in _run
    self.update_load_metrics()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/monitor.py", line 232, in update_load_metrics
    request, timeout=60)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.DEADLINE_EXCEEDED
        details = "Deadline Exceeded"
        debug_error_string = "{"created":"@1641674485.444519145","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
>

I am working on a project where we call trainer.run and pass a dataframe as one of the config parameters. The error above shows up after a random period of training: everything seems to work, training runs for 2-3 epochs, and then the error is raised. Note that we use a Modin dataframe for the data, but the error also appears if we replace Modin with a plain pandas dataframe. What could cause this type of error, and how might it be solved?

Hi @Radi_Cho, does the error go away if you don’t pass a dataframe as one of the config parameters, with everything else held equal?
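
If it helps with that comparison, one rough check is to see how large each config entry is once serialized, which shows whether the dataframe dominates what gets passed along. This is only a minimal sketch, assuming the config is a plain dict. The names `config_value_sizes`, `largest_entries`, and `example_config` are made up for illustration, the nested list is just a stand-in for the dataframe, and `pickle` is only a proxy for whatever serialization Ray actually applies.

```python
import pickle


def config_value_sizes(config):
    """Return {key: pickled size in bytes} for each entry of a config dict."""
    sizes = {}
    for key, value in config.items():
        # Assumption: pickle size is a rough proxy, not Ray's exact wire size.
        sizes[key] = len(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))
    return sizes


def largest_entries(config, top=3):
    """Return the `top` (key, size) pairs, largest first."""
    sizes = config_value_sizes(config)
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical config: "data" stands in for the dataframe from the question.
example_config = {
    "lr": 1e-3,
    "epochs": 5,
    "data": [[float(i)] * 10 for i in range(1000)],
}

for key, size in largest_entries(example_config):
    print(f"{key}: {size} bytes")
```

If the dataframe entry turns out to be far larger than everything else, that at least makes the with/without comparison above worth running. It does not by itself show that the dataframe causes the deadline error.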