Hey everyone, I'm really new to machine learning and Python, so I don't know much yet.
I set up a Pipeline with sklearn, and grid search with sklearn's GridSearchCV works fine. But when I switch to TuneGridSearchCV without changing anything else, the search aborts at the first error that comes up (in this case a ValueError from PCA's n_components parameter). If I'm not mistaken, these errors are expected for some parameter combinations: GridSearchCV simply ignores the failing combination and moves on to the next one, while TuneGridSearchCV stops the whole script and shows the first error. For context, my setup looks roughly like the sketch below.
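This is only a simplified reconstruction, not my real code: the classifier, the grid values, and the toy data are placeholders; what matters is the Pipeline with a cached PCA step and the GridSearchCV/TuneGridSearchCV swap.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from tune_sklearn import TuneGridSearchCV

# Toy data standing in for my training set (mine ends up with 9 usable features,
# which is why n_components=10 is invalid for some CV splits).
x_train, y_train = make_classification(n_samples=200, n_features=9, random_state=0)

pipe = Pipeline(
    steps=[("pca", PCA()), ("clf", RandomForestClassifier())],
    memory="cache_dir",  # caching is what produces the joblib "Persisting input arguments" warnings
)

param_grid = {
    "pca__n_components": [2, 5, 10],  # 10 > min(n_samples, n_features) -> ValueError for that combination
    "clf__n_estimators": [50, 100],
}

# Works: the invalid combination is marked as failed and the search moves on.
grid_search = GridSearchCV(pipe, param_grid, cv=5)
grid_search.fit(x_train, y_train)

# Same estimator and grid, but this one aborts on the first ValueError.
tune_search = TuneGridSearchCV(pipe, param_grid, cv=5)
tune_search.fit(x_train, y_train)
```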
The only idea I've had so far was setting the error_score parameter to zero (see the snippet after the traceback), but the search still stops at the first error. This is the whole warning/error output I get:
/opt/conda/lib/python3.7/site-packages/ray/tune/tune.py:369: UserWarning: The `loggers` argument is deprecated. Please pass the respective `LoggerCallback` classes to the `callbacks` argument instead. See https://docs.ray.io/en/latest/tune/api_docs/logging.html
"The `loggers` argument is deprecated. Please pass the respective "
(pid=174) /opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py:355: UserWarning: Persisting input arguments took 0.61s to run.
(pid=174) If this happens often in your code, it can cause performance problems
(pid=174) (results will be correct in all cases).
(pid=174) The reason for this is probably some large input arguments for a wrapped
(pid=174) function (e.g. large strings).
(pid=174) THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
(pid=174) example so that they can fix the problem.
(pid=174) **fit_params_steps[name],
(pid=173) /opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py:355: UserWarning: Persisting input arguments took 0.60s to run.
(pid=173) If this happens often in your code, it can cause performance problems
(pid=173) (results will be correct in all cases).
(pid=173) The reason for this is probably some large input arguments for a wrapped
(pid=173) function (e.g. large strings).
(pid=173) THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
(pid=173) example so that they can fix the problem.
(pid=173) **fit_params_steps[name],
(pid=175) /opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py:355: UserWarning: Persisting input arguments took 0.62s to run.
(pid=175) If this happens often in your code, it can cause performance problems
(pid=175) (results will be correct in all cases).
(pid=175) The reason for this is probably some large input arguments for a wrapped
(pid=175) function (e.g. large strings).
(pid=175) THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
(pid=175) example so that they can fix the problem.
(pid=175) **fit_params_steps[name],
(pid=172) /opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py:355: UserWarning: Persisting input arguments took 0.64s to run.
(pid=172) If this happens often in your code, it can cause performance problems
(pid=172) (results will be correct in all cases).
(pid=172) The reason for this is probably some large input arguments for a wrapped
(pid=172) function (e.g. large strings).
(pid=172) THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
(pid=172) example so that they can fix the problem.
(pid=172) **fit_params_steps[name],
(pid=174) /opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py:355: UserWarning: Persisting input arguments took 0.60s to run.
(pid=174) If this happens often in your code, it can cause performance problems
(pid=174) (results will be correct in all cases).
(pid=174) The reason for this is probably some large input arguments for a wrapped
(pid=174) function (e.g. large strings).
(pid=174) THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
(pid=174) example so that they can fix the problem.
(pid=174) **fit_params_steps[name],
(pid=173) /opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py:355: UserWarning: Persisting input arguments took 0.60s to run.
(pid=173) If this happens often in your code, it can cause performance problems
(pid=173) (results will be correct in all cases).
(pid=173) The reason for this is probably some large input arguments for a wrapped
(pid=173) function (e.g. large strings).
(pid=173) THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
(pid=173) example so that they can fix the problem.
(pid=173) **fit_params_steps[name],
(pid=175) /opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py:355: UserWarning: Persisting input arguments took 0.61s to run.
(pid=175) If this happens often in your code, it can cause performance problems
(pid=175) (results will be correct in all cases).
(pid=175) The reason for this is probably some large input arguments for a wrapped
(pid=175) function (e.g. large strings).
(pid=175) THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
(pid=175) example so that they can fix the problem.
(pid=175) **fit_params_steps[name],
(pid=172) /opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py:355: UserWarning: Persisting input arguments took 0.62s to run.
(pid=172) If this happens often in your code, it can cause performance problems
(pid=172) (results will be correct in all cases).
(pid=172) The reason for this is probably some large input arguments for a wrapped
(pid=172) function (e.g. large strings).
(pid=172) THIS IS A JOBLIB ISSUE. If you can, kindly provide the joblib's team with an
(pid=172) example so that they can fix the problem.
(pid=172) **fit_params_steps[name],
---------------------------------------------------------------------------
RayTaskError(ValueError) Traceback (most recent call last)
/tmp/ipykernel_36/325189685.py in <module>
20
21 start = time.time()
---> 22 tune_search.fit(x_train, y_train)
23 end = time.time()
24 print(f'TuneGridSearchCV total fitting time: {round(end - start, 3)} sec')
/opt/conda/lib/python3.7/site-packages/tune_sklearn/tune_basesearch.py in fit(self, X, y, groups, tune_params, **fit_params)
663 ray_kwargs["local_mode"] = True
664 with ray_context(**ray_kwargs):
--> 665 return self._fit(X, y, groups, tune_params, **fit_params)
666
667 def score(self, X, y=None):
/opt/conda/lib/python3.7/site-packages/tune_sklearn/tune_basesearch.py in _fit(self, X, y, groups, tune_params, **fit_params)
568
569 self._fill_config_hyperparam(config)
--> 570 analysis = self._tune_run(config, resources_per_trial, tune_params)
571
572 self.cv_results_ = self._format_results(self.n_splits, analysis)
/opt/conda/lib/python3.7/site-packages/tune_sklearn/tune_gridsearch.py in _tune_run(self, config, resources_per_trial, tune_params)
288 "ignore", message="fail_fast='raise' "
289 "detected.")
--> 290 analysis = tune.run(trainable, **run_args)
291 return analysis
/opt/conda/lib/python3.7/site-packages/ray/tune/tune.py in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, local_dir, search_alg, scheduler, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, verbose, progress_reporter, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, export_formats, max_failures, fail_fast, restore, server_port, resume, queue_trials, reuse_actors, trial_executor, raise_on_failed_trial, callbacks, max_concurrent_trials, loggers, _remote)
599 progress_reporter.set_start_time(tune_start)
600 while not runner.is_finished() and not state[signal.SIGINT]:
--> 601 runner.step()
602 if has_verbosity(Verbosity.V1_EXPERIMENT):
603 _report_progress(runner, progress_reporter)
/opt/conda/lib/python3.7/site-packages/ray/tune/trial_runner.py in step(self)
703 if self.trial_executor.in_staging_grace_period():
704 timeout = 0.1
--> 705 self._process_events(timeout=timeout)
706 else:
707 self._run_and_catch(self.trial_executor.on_no_available_trials)
/opt/conda/lib/python3.7/site-packages/ray/tune/trial_runner.py in _process_events(self, timeout)
861 else:
862 with warn_if_slow("process_trial"):
--> 863 self._process_trial(trial)
864
865 # `self._queued_trial_decisions` now contains a final decision
/opt/conda/lib/python3.7/site-packages/ray/tune/trial_runner.py in _process_trial(self, trial)
888 """
889 try:
--> 890 results = self.trial_executor.fetch_result(trial)
891 with warn_if_slow(
892 "process_trial_results",
/opt/conda/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py in fetch_result(self, trial)
786 self._running.pop(trial_future[0])
787 with warn_if_slow("fetch_result"):
--> 788 result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
789
790 # For local mode
/opt/conda/lib/python3.7/site-packages/ray/_private/client_mode_hook.py in wrapper(*args, **kwargs)
103 if func.__name__ != "init" or is_client_mode_enabled_by_default:
104 return getattr(ray, func.__name__)(*args, **kwargs)
--> 105 return func(*args, **kwargs)
106
107 return wrapper
/opt/conda/lib/python3.7/site-packages/ray/worker.py in get(object_refs, timeout)
1623 worker.core_worker.dump_object_store_memory_usage()
1624 if isinstance(value, RayTaskError):
-> 1625 raise value.as_instanceof_cause()
1626 else:
1627 raise value
RayTaskError(ValueError): ray::_Trainable.train_buffered() (pid=173, ip=172.19.2.2, repr=<tune_sklearn._trainable._Trainable object at 0x7f13ec0fc250>)
File "/opt/conda/lib/python3.7/site-packages/ray/tune/trainable.py", line 224, in train_buffered
result = self.train()
File "/opt/conda/lib/python3.7/site-packages/ray/tune/trainable.py", line 283, in train
result = self.step()
File "/opt/conda/lib/python3.7/site-packages/tune_sklearn/_trainable.py", line 106, in step
return self._train()
File "/opt/conda/lib/python3.7/site-packages/tune_sklearn/_trainable.py", line 247, in _train
error_score="raise")
File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 283, in cross_validate
for train, test in cv.split(X, y, groups)
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 1041, in __call__
if self.dispatch_one_batch(iterator):
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
for func, args, kwargs in self.items]
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
for func, args, kwargs in self.items]
File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/fixes.py", line 211, in __call__
return self.function(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 681, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 390, in fit
Xt = self._fit(X, y, **fit_params_steps)
File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 355, in _fit
**fit_params_steps[name],
File "/opt/conda/lib/python3.7/site-packages/joblib/memory.py", line 591, in __call__
return self._cached_call(args, kwargs)[0]
File "/opt/conda/lib/python3.7/site-packages/joblib/memory.py", line 534, in _cached_call
out, metadata = self.call(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/joblib/memory.py", line 761, in call
output = self.func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 893, in _fit_transform_one
res = transformer.fit_transform(X, y, **fit_params)
File "/opt/conda/lib/python3.7/site-packages/sklearn/decomposition/_pca.py", line 407, in fit_transform
U, S, Vt = self._fit(X)
File "/opt/conda/lib/python3.7/site-packages/sklearn/decomposition/_pca.py", line 457, in _fit
return self._fit_full(X, n_components)
File "/opt/conda/lib/python3.7/site-packages/sklearn/decomposition/_pca.py", line 478, in _fit_full
"svd_solver='full'" % (n_components, min(n_samples, n_features))
ValueError: n_components=10 must be between 0 and min(n_samples, n_features)=9 with svd_solver='full'
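For completeness, the error_score attempt mentioned above was nothing more than this (again just a sketch, with pipe and param_grid as in the snippet further up):

```python
# Same setup as before, only with error_score=0 on the tune_sklearn side.
tune_search = TuneGridSearchCV(pipe, param_grid, cv=5, error_score=0)
tune_search.fit(x_train, y_train)  # still raises the same ValueError immediately
```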
Is there anything I should change when going from GridSearchCV to TuneGridSearchCV?