I am using Ray 2.9.1 on Windows 11 to train an agent with Proximal Policy Optimization (PPO). Checkpointing now fails with an error that I did not get with earlier Ray versions.
The error message I am encountering is as follows:
2024-01-29 22:20:07,251 ERROR tune_controller.py:1374 -- Trial task failed for trial PPO_EPEnv_e6ca1616
Traceback (most recent call last):
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\_private\auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\_private\worker.py", line 2624, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(FileNotFoundError): ray::PPO.save() (pid=11124, ip=127.0.0.1, actor_id=984bb80f54af807c18b1405e01000000, repr=PPO)
File "python\ray\_raylet.pyx", line 1813, in ray._raylet.execute_task
File "python\ray\_raylet.pyx", line 1754, in ray._raylet.execute_task.function_executor
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
return method(self, *_args, **_kwargs)
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\tune\trainable\trainable.py", line 480, in save
persisted_checkpoint = self._storage.persist_current_checkpoint(
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\train\_internal\storage.py", line 558, in persist_current_checkpoint
_pyarrow_fs_copy_files(
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\train\_internal\storage.py", line 110, in _pyarrow_fs_copy_files
return pyarrow.fs.copy_files(
File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\pyarrow\fs.py", line 244, in copy_files
_copy_files_selector(source_fs, source_sel,
File "pyarrow\_fs.pyx", line 1229, in pyarrow._fs._copy_files_selector
File "pyarrow\error.pxi", line 110, in pyarrow.lib.check_status
FileNotFoundError: [WinError 206] Cannot create directory 'C:/Users/grhen/ray_results/PPO_2024-01-29_22-11-43/PPO_EPEnv_e6ca1616_1_type=StochasticSampling,disable_action_flattening=False,disable_execution_plan_api=True,disable_initialize_lo_2024-01-29_22-11-43/checkpoint_000000/learner/module_state/default_policy'. Detail: [Windows error 206] The file name or extension is too long.
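To check that the length of the path is the real problem, the failing directory can be measured directly. The sketch below copies the path verbatim from the error and compares it with the classic Win32 limits. The constants are assumptions based on my reading of the Windows documentation: MAX_PATH is 260, and when creating a directory the path must leave room for an 8.3 file name (MAX_PATH minus 12). The sketch also assumes long-path support is not enabled on the machine. The longest component is the auto-generated trial folder, which is also what makes the total so long.

```python
# Path copied verbatim from the FileNotFoundError above.
FAILING_DIR = (
    "C:/Users/grhen/ray_results/PPO_2024-01-29_22-11-43/"
    "PPO_EPEnv_e6ca1616_1_type=StochasticSampling,disable_action_flattening=False,"
    "disable_execution_plan_api=True,disable_initialize_lo_2024-01-29_22-11-43/"
    "checkpoint_000000/learner/module_state/default_policy"
)

# Classic Win32 limits (assumption: long-path support is NOT enabled).
MAX_PATH = 260
MAX_DIR_PATH = MAX_PATH - 12  # directories must leave room for an 8.3 file name


def path_report(path: str) -> dict:
    """Measure a '/'-separated path and find its longest component."""
    parts = path.split("/")
    longest = max(parts, key=len)
    return {
        "total": len(path),
        "longest_component": longest,
        "longest_len": len(longest),
        "too_long_for_dir": len(path) > MAX_DIR_PATH,
    }


report = path_report(FAILING_DIR)
print(report["total"], report["longest_len"], report["too_long_for_dir"])
# prints: 255 150 True
```

The trial folder alone is 150 characters, and any file written inside default_policy would push the full path past 260 as well. Shortening that one component is therefore enough to bring the whole path back under the limit.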
I am looking for guidance on how to resolve this. I would also like to know whether the automatically generated trial folder name, which embeds a long list of hyperparameters, can be shortened. I tried renaming the experiment folder through air.RunConfig, but that only changes the experiment directory name, not the trial directory created inside it.
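In Ray Tune, the per-trial folder name is separate from the experiment name set in RunConfig. tune.TuneConfig accepts a trial_dirname_creator callable that receives the Trial and returns the folder name. The sketch below is a minimal example, assuming the Ray 2.9 APIs named in it. The helper names short_trial_dirname and build_tuner, the short root "C:/rr", the experiment name "ppo_ep", and the registered env name "EPEnv" (taken from the trial name in the log) are all hypothetical. Ray is imported lazily inside build_tuner so the naming function can be tried without Ray installed.

```python
def short_trial_dirname(trial) -> str:
    """Compact trial folder name: trainable name plus the unique trial id.

    Tune passes a Trial object; trainable_name and trial_id are Trial
    attributes. The long hyperparameter tag is dropped entirely.
    """
    return f"{trial.trainable_name}_{trial.trial_id}"


def build_tuner(env_name: str, storage_path: str = "C:/rr"):
    """Sketch: wire the short names into a PPO Tuner (Ray 2.9 APIs assumed).

    storage_path is a hypothetical short root directory; ~/ray_results is
    replaced by it to shave off more characters.
    """
    # Lazy imports: this module stays importable without Ray installed.
    from ray import air, tune
    from ray.rllib.algorithms.ppo import PPOConfig

    config = PPOConfig().environment(env_name)
    return tune.Tuner(
        "PPO",
        param_space=config.to_dict(),
        tune_config=tune.TuneConfig(trial_dirname_creator=short_trial_dirname),
        run_config=air.RunConfig(
            name="ppo_ep",  # hypothetical short experiment name
            storage_path=storage_path,
            checkpoint_config=air.CheckpointConfig(checkpoint_frequency=1),
        ),
    )
```

With Ray installed and the environment registered, build_tuner("EPEnv").fit() should write checkpoints under a path shaped like C:/rr/ppo_ep/PPO_<trial_id>/checkpoint_000000/..., well below the limit. Keeping trial_id in the name keeps folders unique across trials. The hyperparameters are not lost, because Tune also records each trial's config in that trial's own directory.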
Any assistance or insights regarding how to rectify this matter would be greatly appreciated.
Thank you.
Best regards, Germán