Hi, I’m a new Ray user trying to use Ray Train and Ray Tune to wrap my U-Net training. I’ve replicated the Ray Train tutorials, and I was able to write a full trainable function and call trainer.fit() to run a few epochs successfully.
However, when I define the hyperparameter search space and call tuner.fit() on the trainable with Ray Tune, I get an error. Initially the trainable function did not take config as an argument, and I got: “ValueError: Unknown argument found in the Trainable function. The function args must include a ‘config’ positional parameter. Any other args must be ‘checkpoint_dir’. Found: []”.
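To make the first error concrete, here is a minimal sketch of the kind of signature check the message describes. It is not Ray's actual implementation: `check_trainable_signature` and both `run_training_*` functions are hypothetical names, and the rule is inferred only from the error text.

```python
import inspect


def check_trainable_signature(fn):
    """Hypothetical stand-in for the check described by the error message.

    Assumption: the function must accept 'config', and any other
    parameter must be named 'checkpoint_dir'.
    """
    params = list(inspect.signature(fn).parameters)
    extras = [p for p in params if p not in ("config", "checkpoint_dir")]
    if "config" not in params or extras:
        raise ValueError(
            "Unknown argument found in the Trainable function. "
            "The function args must include a 'config' positional parameter. "
            f"Any other args must be 'checkpoint_dir'. Found: {extras}"
        )
    return True


def run_training_no_args():
    """Trainable without a config parameter: rejected by the check."""


def run_training_with_config(config):
    """Trainable that accepts config: accepted by the check."""


check_trainable_signature(run_training_with_config)  # passes
try:
    check_trainable_signature(run_training_no_args)
except ValueError as err:
    print(err)
```

Under these assumptions, a function with no parameters fails the check with an empty "Found" list, which is consistent with the `Found: []` in the traceback.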
So, I added config as an argument of the trainable function, but then hit another error: “ray.train.error.SessionMisuseError: prepare/accelerate utility functions should be called inside a training function executed by `Trainer.run`”.
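The second message suggests that `prepare_model` expects to run inside a Ray Train session, which a function handed straight to the Tuner does not have. One pattern that may address this is to give the Tuner a `TorchTrainer` rather than the bare function, with the search space nested under `train_loop_config`. The sketch below assumes Ray 2.x import paths (`ray.air.config.ScalingConfig`; newer releases expose it as `ray.train.ScalingConfig`). The `lr` key and the `Linear` stand-in model are hypothetical.

```python
def train_loop_per_worker(config):
    # Imports are deferred so the sketch can be loaded without Ray or PyTorch.
    import torch
    import ray.train.torch

    lr = config["lr"]  # hypothetical hyperparameter name
    model = torch.nn.Linear(4, 1)  # stand-in for the U-Net
    # Runs inside a Ray Train worker, so a training session should exist here.
    model = ray.train.torch.prepare_model(model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    # ... training epochs and metric reporting would go here ...


def build_tuner():
    from ray import tune
    from ray.air.config import ScalingConfig  # assumption: Ray 2.x path
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=1, use_gpu=False),
    )
    # Hyperparameters for the training loop go under "train_loop_config".
    return tune.Tuner(
        trainer,
        param_space={"train_loop_config": {"lr": tune.loguniform(1e-4, 1e-1)}},
    )
```

Calling `build_tuner().fit()` would then launch the trials, with each trial running the training loop through the trainer instead of calling it directly.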
I’ve searched online for both of these errors but couldn’t find a solution that applies to my case. I would appreciate any help or input, because I am stuck. The full error tracebacks for both cases are below, and I can provide the relevant code if needed. Thank you in advance!
Error traceback with no “config” parameter in trainable:
Traceback (most recent call last):
File "C:\Users\Kevin\PycharmProjects\hubmap\models\unet++\tune.py", line 482, in <module>
results = tuner.fit()
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\tuner.py", line 347, in fit
return self._local_tuner.fit()
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\impl\tuner_internal.py", line 588, in fit
analysis = self._fit_internal(trainable, param_space)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\impl\tuner_internal.py", line 703, in _fit_internal
analysis = run(
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\tune.py", line 857, in run
experiments[i] = Experiment(
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\experiment\experiment.py", line 219, in __init__
self._run_identifier = Experiment.register_if_needed(run)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\experiment\experiment.py", line 411, in register_if_needed
register_trainable(name, run_object)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\registry.py", line 105, in register_trainable
trainable = wrap_function(trainable, warn=warn)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\trainable\function_trainable.py", line 601, in wrap_function
raise ValueError(
ValueError: Unknown argument found in the Trainable function. The function args must include a 'config' positional parameter. Any other args must be 'checkpoint_dir'. Found: []
Error traceback with “config” parameter in trainable:
Traceback (most recent call last):
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\_private\auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\_private\worker.py", line 2493, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(SessionMisuseError): ray::ImplicitFunc.train() (pid=35208, ip=127.0.0.1, actor_id=17c521630b43f15718ffe17e01000000, repr=run_training)
File "python\ray\_raylet.pyx", line 1424, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 1364, in ray._raylet.execute_task.function_executor
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\trainable\trainable.py", line 375, in train
raise skipped from exception_cause(skipped)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\trainable\function_trainable.py", line 349, in entrypoint
return self._trainable_func(
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\tune\trainable\function_trainable.py", line 666, in _trainable_func
output = fn()
File "C:\Users\Kevin\PycharmProjects\hubmap\models\unet++\tune.py", line 402, in run_training
model = train.torch.prepare_model(model)
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\train\torch\train_loop_utils.py", line 107, in prepare_model
return get_accelerator(_TorchAccelerator).prepare_model(
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\train\_internal\session.py", line 491, in get_accelerator
_raise_accelerator_session_misuse()
File "C:\Users\Kevin\.conda\envs\hubmap\lib\site-packages\ray\train\_internal\session.py", line 474, in _raise_accelerator_session_misuse
raise SessionMisuseError(
ray.train.error.SessionMisuseError: prepare/accelerate utility functions should be called inside a training function executed by `Trainer.run`
2023-08-11 19:37:42,309 WARNING tune.py:1122 -- Trial Runner checkpointing failed: Sync process failed: GetFileInfo() yielded path 'C:/Users/Kevin/ray_results/tune_trial_1/basic-variant-state-2023-08-11_16-31-58.json', which is outside base dir 'C:\Users\Kevin\ray_results\tune_trial_1'
Trial status: 72 ERROR
Current time: 2023-08-11 19:37:42. Total running time: 1min 3s
Logical resource usage: 24.0/24 CPUs, 0/1 GPUs