The function I pass to tune.run does some preparation before it launches the actual training.
This preparation sometimes takes long enough for a timeout to occur, with the following output:
(pid=303) 2021-06-29 08:34:50,334 ERROR trial_runner.py:748 -- Trial start_training_with_trainer_62d4f6f8: Error processing event.
(pid=303) Traceback (most recent call last):
(pid=303) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 718, in _process_trial
(pid=303) results = self.trial_executor.fetch_result(trial)
(pid=303) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 688, in fetch_result
(pid=303) result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
(pid=303) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 62, in wrapper
(pid=303) return func(*args, **kwargs)
(pid=303) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1496, in get
(pid=303) raise value
(pid=303) ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
Right now I don't think I can avoid this preparation, which sometimes takes more than 60 seconds.
Is there an easy way to avoid the timeout that I am missing?
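
For reference, here is a minimal sketch of the structure of my function, with timing added around the preparation so its duration shows up in the trial output. The name start_training_with_trainer is taken from the trial name in the log above. prepare, train and the prepare_seconds config key are hypothetical stand-ins, not my real code, and the sketch deliberately leaves out the tune.run call so it runs on its own.

```python
import time


def prepare(config):
    """Hypothetical stand-in for the slow preparation step."""
    # Simulate work; the real preparation sometimes takes over 60 seconds.
    time.sleep(config.get("prepare_seconds", 0.0))
    return {"ready": True}


def train(config, state):
    """Hypothetical stand-in for the actual training."""
    return {"loss": 0.0}


def start_training_with_trainer(config):
    # Time the preparation separately from training so that a slow
    # preparation can be matched against the failing trials.
    started = time.monotonic()
    state = prepare(config)
    elapsed = time.monotonic() - started
    print(f"preparation took {elapsed:.2f}s")

    result = train(config, state)
    # Returned only for inspection in this sketch.
    result["prepare_seconds"] = elapsed
    return result


if __name__ == "__main__":
    start_training_with_trainer({"prepare_seconds": 0.1})
```

Comparing the printed duration with the time of the error should at least show whether the failures line up with the slow preparations.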