ERROR tune_controller.py:1502 -- Trial task failed for trial

How severe does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity
  • Low: It annoys or frustrates me for a moment.
  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.
  • High: It blocks me to complete my task.

Medium severity.

Hi

I am trying to train a NeuralForecast model with its AutoNHITS class, which uses Ray for hyperparameter tuning. AutoNHITS appears to have proceeded with the training, but I received a slew of Ray-related error messages before the training started. I have already sent the issue to Nixtla, but I thought I might as well seek your advice too.

Many thanks,
Stefan

setup:
Ray 2.7
NeuralForecast 1.6.3
python 3.9


Below please find the error message.

(_train_tune pid=31708) Global seed set to 1
2023-10-03 14:11:02,636 ERROR tune_controller.py:1502 -- Trial task failed for trial _train_tune_c6464_00000
Traceback (most recent call last):
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\_private\auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\_private\worker.py", line 2547, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(FileNotFoundError): ray::ImplicitFunc.train() (pid=31708, ip=127.0.0.1, actor_id=d0cbc11ed1e6de30165943c201000000, repr=_train_tune)
  File "python\ray\_raylet.pyx", line 1616, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1556, in ray._raylet.execute_task.function_executor
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\tune\trainable\trainable.py", line 400, in train
    raise skipped from exception_cause(skipped)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\air\_internal\util.py", line 91, in run
    self._ret = self._target(*self._args, **self._kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\tune\trainable\function_trainable.py", line 383, in <lambda>
    training_func=lambda: self._trainable_func(self.config),
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\tune\trainable\function_trainable.py", line 822, in _trainable_func
    output = fn()
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\tune\trainable\util.py", line 321, in inner
    return trainable(config, **fn_kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\neuralforecast\common\_base_auto.py", line 145, in _train_tune
    _ = self._fit_model(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\neuralforecast\common\_base_auto.py", line 202, in _fit_model
    model.fit(dataset, val_size=val_size, test_size=test_size)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\neuralforecast\common\_base_windows.py", line 725, in fit
    trainer.fit(self, datamodule=datamodule)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 941, in _run
    call._call_setup_hook(self) # allow user to setup lightning_module in accelerator environment
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\call.py", line 79, in _call_setup_hook
    if hasattr(logger, "experiment"):
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\lightning_fabric\loggers\logger.py", line 118, in experiment
    return fn(self)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\lightning_fabric\loggers\tensorboard.py", line 192, in experiment
    self._experiment = SummaryWriter(log_dir=self.log_dir, **self._kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\writer.py", line 300, in __init__
    self._get_file_writer()
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\writer.py", line 348, in _get_file_writer
    self.file_writer = FileWriter(logdir=self.logdir,
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\writer.py", line 104, in __init__
    self.event_writer = EventFileWriter(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\event_file_writer.py", line 106, in __init__
    self._ev_writer = EventsWriter(os.path.join(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\event_file_writer.py", line 43, in __init__
    self._py_recordio_writer = RecordWriter(self._file_name)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\record_writer.py", line 182, in __init__
    self._writer = open_file(path)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\record_writer.py", line 61, in open_file
    return open(path, 'wb')
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\somas\ray_results\_train_tune_2023-10-03_14-10-57\_train_tune_c6464_00000_0_batch_size=32,input_size=120,learning_rate=0.0010,max_steps=1000,mlp_units=512_512_512_512_512_512,n_blo_2023-10-03_14-10-57\lightning_logs\version_0\events.out.tfevents.1696367462.MSI'
2023-10-03 14:11:02,656 ERROR tune.py:1139 -- Trials did not complete: [_train_tune_c6464_00000]
2023-10-03 14:11:02,656 INFO tune.py:1143 -- Total run time: 5.51 seconds (5.48 seconds for the tuning loop).
Global seed set to 1

And here are the contents of ~/ray_results/_train_tune_2023-10-03_14-10-57/_train_tune_c6464_00000_0_batch_size=32,input_size=120,learning_rate=0.0010,max_steps=1000,mlp_units=512_512_512_512_512_512,n_blo_2023-10-03_14-10-57/error.txt:

Failure # 1 (occurred at 2023-10-03_14-11-02)
ray::ImplicitFunc.train() (pid=31708, ip=127.0.0.1, actor_id=d0cbc11ed1e6de30165943c201000000, repr=_train_tune)
  File "python\ray\_raylet.pyx", line 1616, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1556, in ray._raylet.execute_task.function_executor
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\tune\trainable\trainable.py", line 400, in train
    raise skipped from exception_cause(skipped)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\air\_internal\util.py", line 91, in run
    self._ret = self._target(*self._args, **self._kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\tune\trainable\function_trainable.py", line 383, in <lambda>
    training_func=lambda: self._trainable_func(self.config),
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\tune\trainable\function_trainable.py", line 822, in _trainable_func
    output = fn()
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\ray\tune\trainable\util.py", line 321, in inner
    return trainable(config, **fn_kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\neuralforecast\common\_base_auto.py", line 145, in _train_tune
    _ = self._fit_model(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\neuralforecast\common\_base_auto.py", line 202, in _fit_model
    model.fit(dataset, val_size=val_size, test_size=test_size)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\neuralforecast\common\_base_windows.py", line 725, in fit
    trainer.fit(self, datamodule=datamodule)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 941, in _run
    call._call_setup_hook(self) # allow user to setup lightning_module in accelerator environment
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\call.py", line 79, in _call_setup_hook
    if hasattr(logger, "experiment"):
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\lightning_fabric\loggers\logger.py", line 118, in experiment
    return fn(self)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\lightning_fabric\loggers\tensorboard.py", line 192, in experiment
    self._experiment = SummaryWriter(log_dir=self.log_dir, **self._kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\writer.py", line 300, in __init__
    self._get_file_writer()
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\writer.py", line 348, in _get_file_writer
    self.file_writer = FileWriter(logdir=self.logdir,
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\writer.py", line 104, in __init__
    self.event_writer = EventFileWriter(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\event_file_writer.py", line 106, in __init__
    self._ev_writer = EventsWriter(os.path.join(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\event_file_writer.py", line 43, in __init__
    self._py_recordio_writer = RecordWriter(self._file_name)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\record_writer.py", line 182, in __init__
    self._writer = open_file(path)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\record_writer.py", line 61, in open_file
    return open(path, 'wb')
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\somas\ray_results\_train_tune_2023-10-03_14-10-57\_train_tune_c6464_00000_0_batch_size=32,input_size=120,learning_rate=0.0010,max_steps=1000,mlp_units=512_512_512_512_512_512,n_blo_2023-10-03_14-10-57\lightning_logs\version_0\events.out.tfevents.1696367462.MSI'

The stack trace suggests that this is happening inside the trial, where the logic is controlled by neuralforecast/pytorch_lightning/tensorboardX:

  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\neuralforecast\common\_base_auto.py", line 145, in _train_tune
    _ = self._fit_model(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\neuralforecast\common\_base_auto.py", line 202, in _fit_model
    model.fit(dataset, val_size=val_size, test_size=test_size)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\neuralforecast\common\_base_windows.py", line 725, in fit
    trainer.fit(self, datamodule=datamodule)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 941, in _run
    call._call_setup_hook(self) # allow user to setup lightning_module in accelerator environment
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\pytorch_lightning\trainer\call.py", line 79, in _call_setup_hook
    if hasattr(logger, "experiment"):
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\lightning_fabric\loggers\logger.py", line 118, in experiment
    return fn(self)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\lightning_fabric\loggers\tensorboard.py", line 192, in experiment
    self._experiment = SummaryWriter(log_dir=self.log_dir, **self._kwargs)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\writer.py", line 300, in __init__
    self._get_file_writer()
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\writer.py", line 348, in _get_file_writer
    self.file_writer = FileWriter(logdir=self.logdir,
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\writer.py", line 104, in __init__
    self.event_writer = EventFileWriter(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\event_file_writer.py", line 106, in __init__
    self._ev_writer = EventsWriter(os.path.join(
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\event_file_writer.py", line 43, in __init__
    self._py_recordio_writer = RecordWriter(self._file_name)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\record_writer.py", line 182, in __init__
    self._writer = open_file(path)
  File "C:\Users\somas\anaconda3\envs\neuralforecast\lib\site-packages\tensorboardX\record_writer.py", line 61, in open_file
    return open(path, 'wb')
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\somas\ray_results\_train_tune_2023-10-03_14-10-57\_train_tune_c6464_00000_0_batch_size=32,input_size=120,learning_rate=0.0010,max_steps=1000,mlp_units=512_512_512_512_512_512,n_blo_2023-10-03_14-10-57\lightning_logs\version_0\events.out.tfevents.1696367462.MSI'
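
One possible explanation, offered as an assumption rather than something confirmed in this thread: the event file path in the FileNotFoundError appears to be roughly 270 characters long, which is above the classic 260-character Windows path limit (MAX_PATH). On Windows, opening a file beyond that limit can fail with a "No such file or directory" error when long-path support is not enabled. The sketch below uses only the standard library to check a path against that limit and to shorten a trial directory name. The helper names `exceeds_max_path` and `short_trial_dirname` are hypothetical and are not part of Ray, NeuralForecast, or tensorboardX.

```python
import hashlib

# Classic Win32 path limit; it includes the terminating NUL character.
WINDOWS_MAX_PATH = 260


def exceeds_max_path(path: str, limit: int = WINDOWS_MAX_PATH) -> bool:
    """Return True if `path` is too long for the classic Win32 limit."""
    # One slot is reserved for the terminating NUL, so at most
    # limit - 1 characters are usable.
    return len(path) > limit - 1


def short_trial_dirname(trial_name: str, max_len: int = 40) -> str:
    """Hypothetical helper: shorten a long trial directory name.

    Names that already fit are returned unchanged. Longer names are
    truncated and given an 8-character hash suffix so that different
    long names remain distinguishable.
    """
    if len(trial_name) <= max_len:
        return trial_name
    digest = hashlib.sha1(trial_name.encode("utf-8")).hexdigest()[:8]
    # Keep room for "_" plus the 8-character digest.
    return f"{trial_name[: max_len - 9]}_{digest}"


if __name__ == "__main__":
    # Illustrative path only, not the one from the log above.
    example = "C:\\Users\\someone\\ray_results\\" + "x" * 250
    print(len(example), exceeds_max_path(example))
```

Ray Tune has a `trial_dirname_creator` option that accepts a function for naming trial directories, so a shortening function along these lines could in principle be supplied there. Whether NeuralForecast's Auto models allow passing that option through is not something this thread establishes.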

Hi Matthew

Thanks for your reply. Glad that log made sense to you! :slight_smile:

The Neuralforecast team has received a few issues and they are aware of the situation. I guess we’ll just have to wait and see what happens next.

Best,
S