How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I’m new to Ray. To try some sample code, I’m following the "Fine-tune a Hugging Face Transformers Model" guide (Ray 2.40.0) step by step. Everything works until I call trainer.fit(), which fails with:
ModuleNotFoundError: No module named 'torch'
(I’ve added the full error at the end of the message.)
I can run import torch and import ray.train.torch without any error. The only thing that fails is trainer.fit().
Additional information:
PyTorch was installed as root for all users, at this path:
/opt/data/python/pytorch/venv/lib/python3.9/site-packages/
PyTorch version: 2.5.1+cu124
Ray version: 2.40.0
Python version: 3.9
I’m running the code in a Jupyter notebook.
Can you tell me what the problem is?
trainer.fit()

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[37], line 1
----> 1 trainer.fit()

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/train/base_trainer.py:580, in BaseTrainer.fit(self)
    577 from ray.tune import ResumeConfig, TuneError
    578 from ray.tune.tuner import Tuner
--> 580 trainable = self.as_trainable()
    581 param_space = self._extract_fields_for_tuner_param_space()
    583 self.run_config.name = (
    584     self.run_config.name or StorageContext.get_experiment_dir_name(trainable)
    585 )

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/train/base_trainer.py:827, in BaseTrainer.as_trainable(self)
    824 trainable_cls = self._generate_trainable_cls()
    826 # Wrap with `tune.with_parameters` to handle very large values in base_config
--> 827 return tune.with_parameters(trainable_cls, **base_config)

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/tune/trainable/util.py:107, in with_parameters(trainable, **kwargs)
    105 prefix = f"{str(trainable)}_"
    106 for k, v in kwargs.items():
--> 107     parameter_registry.put(prefix + k, v)
    109 trainable_name = getattr(trainable, "__name__", "tune_with_parameters")
    110 keys = set(kwargs.keys())

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/tune/registry.py:301, in _ParameterRegistry.put(self, k, v)
    299 self.to_flush[k] = v
    300 if ray.is_initialized():
--> 301     self.flush()

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/tune/registry.py:313, in _ParameterRegistry.flush(self)
    311         self.references[k] = v
    312     else:
--> 313         self.references[k] = ray.put(v)
    314 self.to_flush.clear()

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/_private/auto_init_hook.py:21, in wrap_auto_init.<locals>.auto_init_wrapper(*args, **kwargs)
     18 @wraps(fn)
     19 def auto_init_wrapper(*args, **kwargs):
     20     auto_init_ray()
---> 21     return fn(*args, **kwargs)

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/_private/client_mode_hook.py:102, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
     98 if client_mode_should_convert():
     99     # Legacy code
    100     # we only convert init function if RAY_CLIENT_MODE=1
    101     if func.__name__ != "init" or is_client_mode_enabled_by_default:
--> 102         return getattr(ray, func.__name__)(*args, **kwargs)
    103 return func(*args, **kwargs)

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/util/client/api.py:52, in _ClientAPI.put(self, *args, **kwargs)
     44 def put(self, *args, **kwargs):
     45     """put is the hook stub passed on to replace `ray.put`
     46
     47     Args:
   (...)
     50         kwargs: opaque keyword arguments
     51     """
---> 52     return self.worker.put(*args, **kwargs)

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/util/client/worker.py:495, in Worker.put(self, val, client_ref_id, _owner)
    487     raise TypeError(
    488         "Calling 'put' on an ObjectRef is not allowed "
    489         "(similarly, returning an ObjectRef from a remote "
   (...)
    492         "call 'put' on it (or return it)."
    493     )
    494 data = dumps_from_client(val, self._client_id)
--> 495 return self._put_pickled(data, client_ref_id, _owner)

File /opt/data/python/ray/venv/lib/python3.9/site-packages/ray/util/client/worker.py:509, in Worker._put_pickled(self, data, client_ref_id, owner)
    507 if not resp.valid:
    508     try:
--> 509         raise cloudpickle.loads(resp.error)
    510     except (pickle.UnpicklingError, TypeError):
    511         logger.exception("Failed to deserialize {}".format(resp.error))

ModuleNotFoundError: No module named 'torch'
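
The traceback passes through ray/util/client/worker.py, and the paths in the post place PyTorch and Ray in separate virtual environments (/opt/data/python/pytorch/venv and /opt/data/python/ray/venv). It may therefore help to record which interpreter a process uses and where each package resolves from. Below is a minimal sketch of such a check using only the Python standard library. The name describe_module is a hypothetical helper, not part of Ray or PyTorch, and the sketch only reports what the current process sees.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable in this process and where it resolves from.

    Hypothetical helper for illustration; it does not import the module itself.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for broken or partially initialised modules
        spec = None
    return {
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


# Show the interpreter in use and where torch and ray would be loaded from.
print("interpreter:", sys.executable)
for mod in ("torch", "ray"):
    print(describe_module(mod))
```

Running this in the notebook kernel shows the notebook side only. As an assumption, the process that hosts the Ray cluster may use a different environment than the notebook, so running the same function there and comparing the two outputs could show whether torch is visible on both sides.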