Worker times out while preparing for training

eg4l · June 29, 2021, 3:49pm

The function I pass to tune.run does some preparation before it launches the actual training.
This preparation sometimes takes long enough for a timeout to occur, with the following output:

(pid=303) 2021-06-29 08:34:50,334	ERROR trial_runner.py:748 -- Trial start_training_with_trainer_62d4f6f8: Error processing event.
(pid=303) Traceback (most recent call last):
(pid=303)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 718, in _process_trial
(pid=303)     results = self.trial_executor.fetch_result(trial)
(pid=303)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 688, in fetch_result
(pid=303)     result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
(pid=303)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 62, in wrapper
(pid=303)     return func(*args, **kwargs)
(pid=303)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1496, in get
(pid=303)     raise value
(pid=303) ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.

Right now I don’t think I can avoid this preparation, which sometimes takes more than 60 seconds.
Is there an easy way to avoid the timeout that I am missing?
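
For reference, here is a minimal sketch of the setup described above. It assumes the Ray 1.x function API (`tune.report`). The trainable name is taken from the trial name in the log. `prepare`, `timed_prepare`, and the `prep_seconds` and `steps` config keys are hypothetical stand-ins, not my real code. It only shows the shape of the function (slow preparation first, then training) and measures how long the preparation takes.

```python
import time


def prepare(config):
    """Hypothetical stand-in for the slow setup step (the real one can exceed 60 s)."""
    time.sleep(config.get("prep_seconds", 0.0))
    return {"ready": True}


def timed_prepare(config):
    """Run the preparation and return (state, elapsed_seconds)."""
    start = time.monotonic()
    state = prepare(config)
    return state, time.monotonic() - start


def start_training_with_trainer(config):
    """Trainable passed to tune.run: preparation first, then the training loop."""
    from ray import tune  # imported here so the helpers above run without Ray

    state, elapsed = timed_prepare(config)
    print(f"preparation took {elapsed:.1f}s")
    for step in range(config.get("steps", 3)):
        # Placeholder metrics; the real training step would go here.
        tune.report(step=step, prep_seconds=elapsed)


# Launched roughly like this (assumed values, chosen to exceed 60 s):
# from ray import tune
# tune.run(start_training_with_trainer, config={"prep_seconds": 90, "steps": 3})
```

Logging the elapsed preparation time per trial at least shows whether the failing trials are the ones whose setup runs long.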
