Our tune.run flow fails with the following error in Ray 1.9:
DEBUG:ray.tune.registry:Detected class for trainable.
DEBUG:ray.worker:Automatically increasing RLIMIT_NOFILE to max value of 1048576
Entering tune.run
2021-12-04 13:16:46,364 ERROR trial_runner.py:958 -- Trial <trial_name>: Error processing event.
Traceback (most recent call last):
File "/home/ubuntu/Envs/<env>/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 924, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/ubuntu/Envs/<env>/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 787, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/ubuntu/Envs/<env>/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/Envs/<env>/lib/python3.7/site-packages/ray/worker.py", line 1715, in get
raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::<trainable_name>.__init__() (pid=25273, ip=<ip address>)
RuntimeError: The actor with name <trainable_name> failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:
ray::<trainable_name>.__init__() (pid=25273, ip=172.31.27.245)
ModuleNotFoundError: No module named '<folder>.<script in which trainable resides>'
I tried adding a runtime_env to ray.init with the working_dir set, but I couldn’t get that to work because of an error that directed me to pip install ray[default] first. Couldn’t get it to work even after running that.
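
For reference, here is a minimal sketch of what passing a working_dir through runtime_env to ray.init looks like, assuming the folder containing the trainable's script sits under the project root. The helper names (build_runtime_env, init_ray_with_project) and the "." default are placeholders for illustration, not the actual code from this project.

```python
from pathlib import Path


def build_runtime_env(project_root):
    """Build a runtime_env dict that asks Ray to ship project_root to workers.

    project_root is a hypothetical placeholder: it should be the directory
    that contains the package folder holding the trainable's script.
    """
    root = Path(project_root).resolve()
    return {"working_dir": str(root)}


def init_ray_with_project(project_root="."):
    """Start Ray with the working_dir runtime environment applied."""
    # Imported lazily so build_runtime_env can be used without Ray installed.
    # The error quoted above suggests this path needs `pip install ray[default]`.
    import ray

    ray.init(runtime_env=build_runtime_env(project_root))
```

The idea would be to call init_ray_with_project() once, before tune.run, so that workers can import the module that defines the trainable. Whether this is enough to resolve the import error in 1.9 is not confirmed here.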
I switched back to a 1.8 installation and it ran without issues. I didn’t have to specify any environments, etc. I wonder what changed?