Hi there!
I’m trying to tune a YOLOv8 model using Ray. To do this, I’m using the following code:
from ultralytics import YOLO
model = YOLO(model_version)
model.to('cuda:0')
model.tune(data=model_config_file_path, epochs=trial_epochs, batch=0.9, iterations=n_trials, use_ray=True, gpu_per_trial=1)
where:
model_version: str = "yolov8n.pt"
n_trials: int = 50
trial_epochs: int = 50
model_config_file_path is the path to the dataset YAML file, which contains:
names:
- abcd
nc: 1
test: test/images
train: train/images
val: valid/images
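The dataset file itself looks consistent. As a quick sanity check, here is a minimal sketch that ran without Ray or Ultralytics. `check_dataset_cfg` is a hypothetical helper of my own (not part of Ultralytics), and the dict stands in for what `yaml.safe_load` would return for the file above. It assumes that `train`, `val` and `names` are the keys a detection dataset needs and that `nc` should equal the number of names.

```python
# The dict below mirrors the dataset YAML above (what yaml.safe_load would return).
dataset_cfg = {
    "names": ["abcd"],
    "nc": 1,
    "test": "test/images",
    "train": "train/images",
    "val": "valid/images",
}


def check_dataset_cfg(cfg: dict) -> list[str]:
    """Hypothetical helper: return the problems found in a YOLO-style dataset dict."""
    problems = []
    # Keys assumed to be required for a detection dataset.
    for key in ("train", "val", "names"):
        if key not in cfg:
            problems.append(f"missing required key: {key!r}")
    # If nc is given, it should match the number of class names.
    names = cfg.get("names", [])
    if "nc" in cfg and cfg["nc"] != len(names):
        problems.append(f"nc={cfg['nc']} but {len(names)} names are listed")
    return problems


print(check_dataset_cfg(dataset_cfg))  # -> []
```

An empty list means the class count and the split keys agree, so the dataset definition does not look like the cause.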
When I run the script, the following error appears:
(_tune pid=62308) 22 [15, 18, 21] 1 751507 ultralytics.nn.modules.head.Detect [1, [64, 128, 256]]
(_tune pid=62308) Model summary: 225 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs
(_tune pid=62308)
(_tune pid=62308) Transferred 319/355 items from pretrained weights
(_tune pid=62308) Freezing layer 'model.22.dfl.conv.weight'
(_tune pid=62308) AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
(_tune pid=62308) Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt to 'yolov8n.pt'...
100%|██████████| 6.25M/6.25M [00:00<00:00, 9.87MB/s]
(_tune pid=62308) AMP: checks passed ✅
(_tune pid=62308) AutoBatch: Computing optimal batch size for imgsz=640 at 90.0% CUDA memory utilization.
(_tune pid=62308) AutoBatch: CUDA:0 (NVIDIA GeForce RTX 4080 Laptop GPU) 11.99G total, 0.09G reserved, 0.08G allocated, 11.82G free
(_tune pid=62308) Params GFLOPs GPU_mem (GB) forward (ms) backward (ms) input output
(_tune pid=62308) 3011043 8.194 0.214 13 28.51 (1, 3, 640, 640) list
(_tune pid=62308) 3011043 16.39 0.308 12.01 17.69 (2, 3, 640, 640) list
(_tune pid=62308) 3011043 32.78 0.537 14.03 16.56 (4, 3, 640, 640) list
(_tune pid=62308) 3011043 65.55 1.015 15.66 18.55 (8, 3, 640, 640) list
(_tune pid=62308) 3011043 131.1 2.003 22.33 26.39 (16, 3, 640, 640) list
(_tune pid=62308) AutoBatch: Using batch-size 88 for CUDA:0 10.81G/11.99G (90%) ✅
train: Scanning ******working folder*****\data\processed\train\labels.cache... 119 images, 1983 backgrounds, 0 corrupt: 100%|██████████| 2102/2102 [00:00<?, ?it/s]
2024-09-29 16:08:43,180 ERROR tune_controller.py:1331 -- Trial task failed for trial _tune_4fc7c_00000
Traceback (most recent call last):
File "******working folder*****\.venv\Lib\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
result = ray.get(future)
^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ray\_private\auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ray\_private\worker.py", line 2691, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ray\_private\worker.py", line 871, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(OSError): ray::ImplicitFunc.train() (pid=62308, ip=127.0.0.1, actor_id=922bada3f118e93ed6f578c401000000, repr=_tune)
File "python\ray\_raylet.pyx", line 1859, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 1800, in ray._raylet.execute_task.function_executor
File "******working folder*****\.venv\Lib\site-packages\ray\_private\function_manager.py", line 696, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
return method(self, *_args, **_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ray\tune\trainable\trainable.py", line 331, in train
raise skipped from exception_cause(skipped)
File "******working folder*****\.venv\Lib\site-packages\ray\air\_internal\util.py", line 104, in run
self._ret = self._target(*self._args, **self._kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ray\tune\trainable\function_trainable.py", line 45, in <lambda>
training_func=lambda: self._trainable_func(self.config),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
return method(self, *_args, **_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ray\tune\trainable\function_trainable.py", line 250, in _trainable_func
output = fn()
^^^^
File "******working folder*****\.venv\Lib\site-packages\ultralytics\utils\tuner.py", line 103, in _tune
results = model_to_train.train(**config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\model.py", line 803, in train
self.trainer.train()
File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\trainer.py", line 207, in train
self._do_train(world_size)
File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\trainer.py", line 327, in _do_train
self._setup_train(world_size)
File "******working folder*****\.venv\Lib\site-packages\ultralytics\engine\trainer.py", line 291, in _setup_train
self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=LOCAL_RANK, mode="train")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ultralytics\models\yolo\detect\train.py", line 55, in get_dataloader
return build_dataloader(dataset, batch_size, workers, shuffle, rank) # return dataloader
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ultralytics\data\build.py", line 135, in build_dataloader
return InfiniteDataLoader(
^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\ultralytics\data\build.py", line 39, in __init__
self.iterator = super().__iter__()
^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 440, in __iter__
return self._get_iterator()
^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******working folder*****\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 1038, in __init__
w.start()
File "******local programs folder*****\Python\Python312\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "******local programs folder*****\Python\Python312\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "******local programs folder*****\Python\Python312\Lib\multiprocessing\context.py", line 337, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "******local programs folder*****\Python\Python312\Lib\multiprocessing\popen_spawn_win32.py", line 75, in __init__
hp, ht, pid, tid = _winapi.CreateProcess(
^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 87] El parámetro no es correcto (English: "The parameter is incorrect.")
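For reference, the batch size in the AutoBatch line of the log (88 for batch=0.9) can be roughly reproduced from the profiling table. This is a sketch under an assumption: that AutoBatch fits a straight line of GPU memory against batch size and solves it for the requested fraction of free memory. That is my reading of the log, not a quote of the Ultralytics code, and `linear_fit` is a hypothetical helper.

```python
# Profiled values copied from the AutoBatch table: batch size -> GPU_mem (GB).
batch_sizes = [1, 2, 4, 8, 16]
gpu_mem_gb = [0.214, 0.308, 0.537, 1.015, 2.003]

# Device memory from the AutoBatch log line (GB, already rounded by the log).
total_gb, reserved_gb, allocated_gb = 11.99, 0.09, 0.08
free_gb = total_gb - (reserved_gb + allocated_gb)
fraction = 0.9  # the batch=0.9 argument passed to model.tune


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(batch_sizes, gpu_mem_gb)
# Largest batch whose predicted memory fits in the target share of free memory.
batch = int((free_gb * fraction - intercept) / slope)
print(f"slope={slope:.4f} GB/image, intercept={intercept:.4f} GB, batch={batch}")
```

Because the logged memory figures are rounded, the result lands within one of the reported 88, so the batch-size selection itself behaves as expected before the crash in the DataLoader.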
I tried to debug the code, but I couldn't find anything useful.
To install Ray, I used: pip install -U ultralytics "ray[tune]"
Versions / Dependencies
Python version: 3.12.6
OS version: Windows 11 23H2
Installed packages:
aiosignal 1.3.1
alembic 1.13.3
aniso8601 9.0.1
asttokens 2.4.1
attrs 24.2.0
backcall 0.2.0
beautifulsoup4 4.12.3
black 24.8.0
bleach 6.1.0
blinker 1.8.2
cachetools 5.5.0
certifi 2024.8.30
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
colorama 0.4.6
colorlog 6.8.2
comm 0.2.2
contourpy 1.3.0
cycler 0.12.1
databricks-sdk 0.32.3
debugpy 1.8.5
decorator 5.1.1
defusedxml 0.7.1
Deprecated 1.2.14
docker 7.1.0
docker-pycreds 0.4.0
docopt 0.6.2
executing 2.1.0
fastjsonschema 2.20.0
filelock 3.16.1
filetype 1.2.0
flake8 7.1.1
Flask 3.0.3
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2024.9.0
gitdb 4.0.11
GitPython 3.1.43
google-auth 2.35.0
graphene 3.3
graphql-core 3.2.4
graphql-relay 3.2.0
greenlet 3.1.1
idna 3.7
importlib_metadata 8.4.0
ipykernel 6.29.5
ipython 8.12.3
isort 5.13.2
itsdangerous 2.2.0
jedi 0.19.1
Jinja2 3.1.4
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
jupyter_client 8.6.3
jupyter_core 5.7.2
jupyterlab_pygments 0.3.0
kiwisolver 1.4.7
loguru 0.7.2
Mako 1.3.5
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
matplotlib-inline 0.1.7
mccabe 0.7.0
mdurl 0.1.2
mistune 3.0.2
mlflow 2.16.2
mlflow-skinny 2.16.2
mpmath 1.3.0
msgpack 1.1.0
mypy-extensions 1.0.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.3
numpy 1.26.4
opencv-python 4.10.0.84
opencv-python-headless 4.10.0.84
opentelemetry-api 1.27.0
opentelemetry-sdk 1.27.0
opentelemetry-semantic-conventions 0.48b0
optuna 4.0.0
packaging 24.1
pandas 2.2.3
pandocfilters 1.5.1
parso 0.8.4
pathspec 0.12.1
pickleshare 0.7.5
pillow 10.4.0
pip 24.2
pipreqs 0.5.0
platformdirs 4.3.6
prompt_toolkit 3.0.47
protobuf 5.28.2
psutil 6.0.0
pure_eval 0.2.3
py-cpuinfo 9.0.0
pyarrow 17.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycodestyle 2.12.1
pyflakes 3.2.0
Pygments 2.18.0
pyparsing 3.1.4
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
pytz 2024.2
pywin32 306
PyYAML 6.0.2
pyzmq 26.2.0
ray 2.37.0
referencing 0.35.1
requests 2.32.3
requests-toolbelt 1.0.0
rich 13.8.1
roboflow 1.1.45
rpds-py 0.20.0
rsa 4.9
scikit-learn 1.5.2
scipy 1.14.1
seaborn 0.13.2
sentry-sdk 2.14.0
setproctitle 1.3.3
setuptools 75.1.0
shellingham 1.5.4
six 1.16.0
smmap 5.0.1
soupsieve 2.6
SQLAlchemy 2.0.35
sqlparse 0.5.1
stack-data 0.6.3
sympy 1.13.3
tensorboardX 2.6.2.2
threadpoolctl 3.5.0
tinycss2 1.3.0
torch 2.4.1+cu124
torchaudio 2.4.1+cu124
torchvision 0.19.1+cu124
tornado 6.4.1
tqdm 4.66.5
traitlets 5.14.3
typer 0.12.5
typing_extensions 4.12.2
tzdata 2024.1
ultralytics 8.2.103
ultralytics-thop 2.0.6
urllib3 2.2.3
waitress 3.0.0
wcwidth 0.2.13
webencodings 0.5.1
Werkzeug 3.0.4
win32-setctime 1.1.0
wrapt 1.16.0
yarg 0.1.9
zipp 3.20.2
Reproduction script
from ultralytics import YOLO
model = YOLO("yolov8n.yaml").load("yolov8n.pt")
results = model.tune(data="coco8.yaml", epochs=100, imgsz=640, use_ray=True, gpu_per_trial=1)
Issue Severity
High: It blocks me from completing my task.