Hello!
I am trying to train a machine learning model with the SklearnTrainer from the Ray library. Calling fit() on the trainer object raises the error shown below. Everything that runs before this snippet completes without errors.
Any suggestions on how to solve it?
Thanks
Code snippet:
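import ray.air
from ray.train.sklearn import SklearnTrainer
from sklearn.ensemble import RandomForestRegressor

# train_dataset, test_dataset, cv, and scoring are defined earlier (not shown).
trainer = SklearnTrainer(
    estimator=RandomForestRegressor(),
    label_column="label",
    scaling_config=ray.air.config.ScalingConfig(
        trainer_resources={"CPU": 4}
    ),
    datasets={"train": train_dataset, "test": test_dataset},
    cv=cv,
    parallelize_cv=True,
    scoring=scoring,
)
result = trainer.fit()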
Error message:
An error was encountered:
The Ray Train run failed. Please inspect the previous error messages for a cause. After fixing the issue (assuming that the error is not caused by your own application logic, but rather an error such as OOM), you can restart the run from scratch or continue this run.
To continue this run, you can use: trainer = SklearnTrainer.restore("/home/ray_results/SklearnTrainer_2023-07-11_11-13-19").
To start a new run that will retry on training failures, set air.RunConfig(failure_config=air.FailureConfig(max_failures)) in the Trainer's run_config with max_failures > 0, or max_failures = -1 for unlimited retries.
Traceback (most recent call last):
  File "/home/hadoop/venv/lib64/python3.7/site-packages/ray/train/base_trainer.py", line 618, in fit
    ) from result.error
ray.train.base_trainer.TrainingFailedError: The Ray Train run failed. Please inspect the previous error messages for a cause. After fixing the issue (assuming that the error is not caused by your own application logic, but rather an error such as OOM), you can restart the run from scratch or continue this run.
To continue this run, you can use: trainer = SklearnTrainer.restore("/home/ray_results/SklearnTrainer_2023-07-11_11-13-19").
To start a new run that will retry on training failures, set air.RunConfig(failure_config=air.FailureConfig(max_failures)) in the Trainer's run_config with max_failures > 0, or max_failures = -1 for unlimited retries.
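
The message above only wraps the real failure. Because the traceback ends in ") from result.error", the original exception should be attached to the TrainingFailedError through standard Python exception chaining (the __cause__ attribute). Below is a minimal sketch for printing that chain. It assumes nothing beyond standard chaining, uses no Ray API, and the helper names exception_chain and describe_chain are hypothetical. The demo at the bottom uses stand-in exceptions, not real Ray output.

```python
from typing import List


def exception_chain(exc: BaseException) -> List[BaseException]:
    """Return [exc, its cause, the cause's cause, ...], outermost first."""
    chain = []
    seen = set()  # guards against cycles in the chain
    current = exc
    while current is not None and id(current) not in seen:
        chain.append(current)
        seen.add(id(current))
        # Prefer the explicit cause set by `raise ... from ...`.
        nxt = current.__cause__
        # Otherwise fall back to the implicit context, unless it was suppressed.
        if nxt is None and not current.__suppress_context__:
            nxt = current.__context__
        current = nxt
    return chain


def describe_chain(exc: BaseException) -> str:
    """Format the chain as one indented line per exception."""
    lines = []
    for depth, item in enumerate(exception_chain(exc)):
        lines.append("{}{}: {}".format("  " * depth, type(item).__name__, item))
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in exceptions that mimic the wrapping seen in the traceback.
    try:
        try:
            raise MemoryError("stand-in for the underlying worker error")
        except MemoryError as inner:
            raise RuntimeError("The Ray Train run failed.") from inner
    except RuntimeError as outer:
        print(describe_chain(outer))
```

In the setup from this post, the idea would be to wrap trainer.fit() in try/except Exception as e and print describe_chain(e), so the underlying error shows up next to the generic "run failed" message.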