How severely does this issue affect your experience of using Ray?
- Medium: It contributes significant difficulty to completing my task, but I can work around it.
Hello, I'm a new user of Ray Tune. I want to tune the hyperparameters of a pretrained TensorFlow model from Hugging Face, but I'm facing some problems with my code, which just follows the documentation. Can you please tell me what is wrong with it?
Could you provide more details about your problem?
cc: @justinvyu, this should probably go in the AIR category.
I've attached my code. I don't understand the error message I get when I run the code cell.
This is the error message:
```
2023-02-24 15:24:32,174 ERROR ray_trial_executor.py:687 -- Trial summarization_trainable_21b8f_00000: Unexpected error starting runner.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ray/tune/execution/ray_trial_executor.py", line 680, in start_trial
  File "/usr/local/lib/python3.8/dist-packages/ray/tune/execution/ray_trial_executor.py", line 521, in _start_trial
    runner = self._setup_remote_runner(trial)
  File "/usr/local/lib/python3.8/dist-packages/ray/tune/execution/ray_trial_executor.py", line 462, in _setup_remote_runner
  File "/usr/local/lib/python3.8/dist-packages/ray/actor.py", line 639, in remote
    return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
  File "/usr/local/lib/python3.8/dist-packages/ray/util/tracing/tracing_helper.py", line 387, in _invocation_actor_class_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ray/actor.py", line 846, in _remote
  File "/usr/local/lib/python3.8/dist-packages/ray/_private/function_manager.py", line 479, in export_actor_class
  File "/usr/local/lib/python3.8/dist-packages/ray/_private/utils.py", line 814, in check_oversized_function
ValueError: The actor ImplicitFunc is too large (244 MiB > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB). Check that its definition is not implicitly capturing a large array or other object in scope. Tip: use ray.put() to put large objects in the Ray object store.
2023-02-24 15:24:34,192 WARNING util.py:244 -- The start_trial operation took 7.946 s, which may be a performance bottleneck.
```
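The check that fails here serializes the trainable and compares its size against a threshold (95 MiB in this log, while the trainable came out at 244 MiB). The sketch below is a minimal stdlib analogy of why captured data inflates that number. It assumes plain `pickle` and `functools.partial` as stand-ins for Ray's cloudpickle-based serialization. The 1 MiB threshold, the dummy payload, and the helper names `serialized_size` and `check_oversized` are illustrative assumptions, not Ray's values or functions.

```python
import functools
import pickle

# Scaled-down stand-in for FUNCTION_SIZE_ERROR_THRESHOLD (95 MiB in the log);
# 1 MiB keeps the demo fast. Ray itself serializes with cloudpickle, not plain pickle.
THRESHOLD_BYTES = 1 * 1024 * 1024
PAYLOAD_BYTES = 2 * 1024 * 1024  # dummy "dataset" larger than the threshold

payload = bytes(PAYLOAD_BYTES)  # built once in the driver's scope, like a global dataset


def serialized_size(func) -> int:
    """Return how many bytes `func` occupies once pickled for shipping to a worker."""
    return len(pickle.dumps(func))


def check_oversized(func) -> None:
    """Raise ValueError if the pickled callable exceeds the threshold (mirrors the log message)."""
    size = serialized_size(func)
    if size > THRESHOLD_BYTES:
        raise ValueError(f"callable is too large ({size} B > {THRESHOLD_BYTES} B)")


# "Capturing" trainable: the payload itself is bound into the callable, so it gets pickled too.
capturing = functools.partial(len, payload)

# "Self-contained" trainable: only the size is bound; bytes(...) builds the data when called.
self_contained = functools.partial(bytes, PAYLOAD_BYTES)

check_oversized(self_contained)  # passes: only a function reference and an int are pickled
try:
    check_oversized(capturing)
    capturing_rejected = False
except ValueError:
    capturing_rejected = True
```

Both callables produce the same data when they run. Only the one that carries the data in its definition trips the size check.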
cc: @matthewdeng for thoughts
Hey @Nourhan_Abdelaziz, thanks for sharing the repro!
For this issue, some context: Tune serializes your trainable (i.e. `summarization_trainable`) and ships it to worker processes, where it is executed. Tune wraps a function trainable in the `ImplicitFunc` actor named in the error. A common cause of the error you are seeing is that global data gets serialized as part of the trainable's definition, which inflates the actor past the size limit. Taking a glance at your repro, this includes at least
There are two common approaches you can use to solve this:
- [Recommended] Construct these global entities directly in the trainable. For example, you would want to move `tf_train_dataset, tf_eval_dataset, tf_test_dataset = prepare_datasets(model=model)` into `summarization_trainable`. Note: You would also need to move the creation of
- If you specifically want to create these objects a single time and share them across Trials, you can create them once and pass them into the trainable with `tune.with_parameters`. It stores the passed objects in the Ray object store and retrieves them when each trial calls the trainable, so the data is no longer embedded in the trainable's definition. As an example:
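```python
from typing import Dict
from ray import tune
from ray.tune import Tuner

tf_train_dataset, tf_eval_dataset, tf_test_dataset = prepare_datasets(model=model)

def summarization_trainable(config: Dict, tf_train_dataset, tf_eval_dataset, tf_test_dataset) -> None:
    ...  # train with the datasets passed in as arguments, not with globals

# with_parameters puts the datasets in the object store and hands them to each trial.
trainable = tune.with_parameters(summarization_trainable, tf_train_dataset=tf_train_dataset, tf_eval_dataset=tf_eval_dataset, tf_test_dataset=tf_test_dataset)
tuner = Tuner(trainable, ...)
```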
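To see the "create once, pass by reference" idea in isolation, here is a toy single-process analogue of what `tune.with_parameters` does. It is a sketch under loud assumptions. `OBJECT_STORE`, `with_parameters`, `prepare_datasets`, and `summarization_trainable` below are hypothetical stand-ins written for this illustration. They are not Ray's API or the code from the repro.

```python
from typing import Any, Callable, Dict, List

# Hypothetical single-process stand-in for the Ray object store (NOT Ray's API).
OBJECT_STORE: Dict[str, Any] = {}
build_calls = 0  # counts how many times the expensive builder runs


def prepare_datasets() -> List[int]:
    """Hypothetical expensive dataset builder."""
    global build_calls
    build_calls += 1
    return list(range(1_000))


def with_parameters(trainable: Callable[..., Any], **objects: Any) -> Callable[[dict], Any]:
    """Toy analogue of tune.with_parameters: store each object once, bind only its name."""
    for name, value in objects.items():
        OBJECT_STORE[name] = value  # stored a single time on the "driver"
    names = list(objects)  # the wrapper keeps only these names, not the data

    def wrapped(config: dict) -> Any:
        # Each trial looks the shared objects up by name when it runs.
        return trainable(config, **{name: OBJECT_STORE[name] for name in names})

    return wrapped


def summarization_trainable(config: dict, train_data: List[int]) -> int:
    """Hypothetical trainable; returns the id of the dataset object it received."""
    return id(train_data)


trainable = with_parameters(summarization_trainable, train_data=prepare_datasets())
trial_results = [trainable({"lr": lr}) for lr in (1e-5, 3e-5, 5e-5)]
```

In this toy, every trial receives the very same Python object, and the builder runs only once. In Ray, each worker would instead fetch the object from the object store. What carries over is that the data is built and stored once rather than embedded in every serialized copy of the trainable.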