Tuning a YOLOv8 model using Ray: OSError: [WinError 87]

Hi there!

I’m trying to tune a YOLOv8 model using Ray. To do this, I’m using the following code:

model = YOLO(model_version)
model.to('cuda:0')

model.tune(data=model_config_file_path, epochs=trial_epochs, batch=0.9, iterations=n_trials, use_ray=True, gpu_per_trial=1)

where

model_version: str = "yolov8n.pt"
n_trials: int = 50
trial_epochs: int = 50

and model_config_file_path is a YAML file containing the following data:

names:
- abcd
nc: 1
test: test/images
train: train/images
val: valid/images

When I run the script, the following error appears:

(_tune pid=62308)  22        [15, 18, 21]  1    751507  ultralytics.nn.modules.head.Detect           [1, [64, 128, 256]]
(_tune pid=62308) Model summary: 225 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs
(_tune pid=62308)
(_tune pid=62308) Transferred 319/355 items from pretrained weights
(_tune pid=62308) Freezing layer 'model.22.dfl.conv.weight'
(_tune pid=62308) AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
(_tune pid=62308) Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt to 'yolov8n.pt'...
  0%|          | 0.00/6.25M [00:00<?, ?B/s]
 28%|██▊       | 1.75M/6.25M [00:00<00:00, 9.04MB/s]
 44%|████▍     | 2.75M/6.25M [00:00<00:00, 9.63MB/s]
 60%|██████    | 3.75M/6.25M [00:00<00:00, 9.77MB/s]
 76%|███████▌  | 4.75M/6.25M [00:00<00:00, 10.0MB/s]
 92%|█████████▏| 5.75M/6.25M [00:00<00:00, 9.95MB/s]
100%|██████████| 6.25M/6.25M [00:00<00:00, 9.87MB/s]
(_tune pid=62308) AMP: checks passed ✅
(_tune pid=62308) AutoBatch: Computing optimal batch size for imgsz=640 at 90.0% CUDA memory utilization.
(_tune pid=62308) AutoBatch: CUDA:0 (NVIDIA GeForce RTX 4080 Laptop GPU) 11.99G total, 0.09G reserved, 0.08G allocated, 11.82G free
(_tune pid=62308)       Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
(_tune pid=62308)      3011043       8.194         0.214            13         28.51        (1, 3, 640, 640)                    list
(_tune pid=62308)      3011043       16.39         0.308         12.01         17.69        (2, 3, 640, 640)                    list
(_tune pid=62308)      3011043       32.78         0.537         14.03         16.56        (4, 3, 640, 640)                    list
(_tune pid=62308)      3011043       65.55         1.015         15.66         18.55        (8, 3, 640, 640)                    list
(_tune pid=62308)      3011043       131.1         2.003         22.33         26.39       (16, 3, 640, 640)                    list
(_tune pid=62308) AutoBatch: Using batch-size 88 for CUDA:0 10.81G/11.99G (90%) ✅
train: Scanning ******working folder*****\data\processed\train\labels.cache... 119 images, 1983 backgrounds, 0 corrupt: 100%|██████████| 2102/2102 [00:00<?, ?it/s]
2024-09-29 16:08:43,180 ERROR tune_controller.py:1331 -- Trial task failed for trial _tune_4fc7c_00000
Traceback (most recent call last):
  File "******working folder*****\.venv\Lib\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
    result = ray.get(future)
             ^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\worker.py", line 2691, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\worker.py", line 871, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(OSError): ray::ImplicitFunc.train() (pid=62308, ip=127.0.0.1, actor_id=922bada3f118e93ed6f578c401000000, repr=_tune)        
  File "python\ray\_raylet.pyx", line 1859, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1800, in ray._raylet.execute_task.function_executor
  File "******working folder*****\.venv\Lib\site-packages\ray\_private\function_manager.py", line 696, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\tune\trainable\trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "******working folder*****\.venv\Lib\site-packages\ray\air\_internal\util.py", line 104, in run
    self._ret = self._target(*self._args, **self._kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\tune\trainable\function_trainable.py", line 45, in <lambda>
    training_func=lambda: self._trainable_func(self.config),
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ray\tune\trainable\function_trainable.py", line 250, in _trainable_func
    output = fn()
             ^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\utils\tuner.py", line 103, in _tune
    results = model_to_train.train(**config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\model.py", line 803, in train
    self.trainer.train()
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\trainer.py", line 207, in train
    self._do_train(world_size)
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\trainer.py", line 327, in _do_train
    self._setup_train(world_size)
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\trainer.py", line 291, in _setup_train
    self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=LOCAL_RANK, mode="train")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\models\yolo\detect\train.py", line 55, in get_dataloader
    return build_dataloader(dataset, batch_size, workers, shuffle, rank)  # return dataloader
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\data\build.py", line 135, in build_dataloader
    return InfiniteDataLoader(
           ^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\ultralytics\data\build.py", line 39, in __init__
    self.iterator = super().__iter__()
                    ^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 440, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******working folder*****\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 1038, in __init__
    w.start()
  File "******local programs folder*****\Python\Python312\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "******local programs folder*****\Python\Python312\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "******local programs folder*****\Python\Python312\Lib\multiprocessing\context.py", line 337, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "******local programs folder*****\Python\Python312\Lib\multiprocessing\popen_spawn_win32.py", line 75, in __init__
    hp, ht, pid, tid = _winapi.CreateProcess(
                       ^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 87] El parámetro no es correcto ("The parameter is incorrect")

I tried to debug the code but couldn't find anything useful. From the traceback, the failure happens while the PyTorch DataLoader is spawning its worker processes (multiprocessing CreateProcess on Windows) inside the Ray trial actor.
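
In case it helps narrow things down, here is a minimal sketch (hypothetical in-memory dataset, unrelated to the YOLO data) that only checks whether DataLoader worker processes can be spawned outside of Ray on the same machine:

import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    # num_workers > 0 forces Windows to spawn worker processes,
    # which is the step that fails in the traceback above
    dataset = TensorDataset(torch.zeros(8, 3, 64, 64))
    loader = DataLoader(dataset, batch_size=2, num_workers=2)
    for _ in loader:
        pass
    print("DataLoader worker spawn OK")
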

To install Ray I used: pip install -U ultralytics "ray[tune]"
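
For reference, a quick way to confirm which versions the environment actually picks up (standard __version__ attributes):

import ray
import torch
import ultralytics

print("ray:", ray.__version__)
print("torch:", torch.__version__)
print("ultralytics:", ultralytics.__version__)
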

Versions / Dependencies

Python version: 3.12.6
OS version: Windows 11 23H2

Installed packages:
aiosignal 1.3.1
alembic 1.13.3
aniso8601 9.0.1
asttokens 2.4.1
attrs 24.2.0
backcall 0.2.0
beautifulsoup4 4.12.3
black 24.8.0
bleach 6.1.0
blinker 1.8.2
cachetools 5.5.0
certifi 2024.8.30
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
colorama 0.4.6
colorlog 6.8.2
comm 0.2.2
contourpy 1.3.0
cycler 0.12.1
databricks-sdk 0.32.3
debugpy 1.8.5
decorator 5.1.1
defusedxml 0.7.1
Deprecated 1.2.14
docker 7.1.0
docker-pycreds 0.4.0
docopt 0.6.2
executing 2.1.0
fastjsonschema 2.20.0
filelock 3.16.1
filetype 1.2.0
flake8 7.1.1
Flask 3.0.3
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2024.9.0
gitdb 4.0.11
GitPython 3.1.43
google-auth 2.35.0
graphene 3.3
graphql-core 3.2.4
graphql-relay 3.2.0
greenlet 3.1.1
idna 3.7
importlib_metadata 8.4.0
ipykernel 6.29.5
ipython 8.12.3
isort 5.13.2
itsdangerous 2.2.0
jedi 0.19.1
Jinja2 3.1.4
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
jupyter_client 8.6.3
jupyter_core 5.7.2
jupyterlab_pygments 0.3.0
kiwisolver 1.4.7
loguru 0.7.2
Mako 1.3.5
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
matplotlib-inline 0.1.7
mccabe 0.7.0
mdurl 0.1.2
mistune 3.0.2
mlflow 2.16.2
mlflow-skinny 2.16.2
mpmath 1.3.0
msgpack 1.1.0
mypy-extensions 1.0.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
numpy 1.26.4
opencv-python 4.10.0.84
opencv-python-headless 4.10.0.84
opentelemetry-api 1.27.0
opentelemetry-sdk 1.27.0
opentelemetry-semantic-conventions 0.48b0
optuna 4.0.0
packaging 24.1
pandas 2.2.3
pandocfilters 1.5.1
parso 0.8.4
pathspec 0.12.1
pickleshare 0.7.5
pillow 10.4.0
pip 24.2
pipreqs 0.5.0
platformdirs 4.3.6
prompt_toolkit 3.0.47
protobuf 5.28.2
psutil 6.0.0
pure_eval 0.2.3
py-cpuinfo 9.0.0
pyarrow 17.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycodestyle 2.12.1
pyflakes 3.2.0
Pygments 2.18.0
pyparsing 3.1.4
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
pytz 2024.2
pywin32 306
PyYAML 6.0.2
pyzmq 26.2.0
ray 2.37.0
referencing 0.35.1
requests 2.32.3
requests-toolbelt 1.0.0
rich 13.8.1
roboflow 1.1.45
rpds-py 0.20.0
rsa 4.9
scikit-learn 1.5.2
scipy 1.14.1
seaborn 0.13.2
sentry-sdk 2.14.0
setproctitle 1.3.3
setuptools 75.1.0
shellingham 1.5.4
six 1.16.0
smmap 5.0.1
soupsieve 2.6
SQLAlchemy 2.0.35
sqlparse 0.5.1
stack-data 0.6.3
sympy 1.13.3
tensorboardX 2.6.2.2
threadpoolctl 3.5.0
tinycss2 1.3.0
torch 2.4.1+cu124
torchaudio 2.4.1+cu124
torchvision 0.19.1+cu124
tornado 6.4.1
tqdm 4.66.5
traitlets 5.14.3
typer 0.12.5
typing_extensions 4.12.2
tzdata 2024.1
ultralytics 8.2.103
ultralytics-thop 2.0.6
urllib3 2.2.3
waitress 3.0.0
wcwidth 0.2.13
webencodings 0.5.1
Werkzeug 3.0.4
win32-setctime 1.1.0
wrapt 1.16.0
yarg 0.1.9
zipp 3.20.2

Reproduction script

from ultralytics import YOLO

model = YOLO("yolov8n.yaml").load("yolov8n.pt")
results = model.tune(data="coco8.yaml", epochs=100, imgsz=640, use_ray=True, gpu_per_trial=1)
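
Since the traceback ends in multiprocessing's spawn path, it may also be relevant that Windows normally requires the script entry point to be guarded. A variant of the reproduction with that guard (same calls as above) would look like the sketch below; I don't know whether it changes the outcome:

from ultralytics import YOLO

if __name__ == "__main__":
    # Guard required by the 'spawn' start method used by multiprocessing on Windows
    model = YOLO("yolov8n.yaml").load("yolov8n.pt")
    results = model.tune(data="coco8.yaml", epochs=100, imgsz=640, use_ray=True, gpu_per_trial=1)
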

Issue Severity

High: It blocks me from completing my task.

Is this you reporting the same bug on GitHub here? [Ray + YOLOv8] YOLOv8 model.tune · Issue #47859 · ray-project/ray · GitHub

Hi Sam. Thank you for answering!

Yes, it is the same bug. I also reported it on GitHub.