I am currently using Ray 2.9.1 on Windows 11 to train an agent with the Proximal Policy Optimization (PPO) algorithm. However, I have run into a problem with the checkpointing mechanism that did not occur in previous versions.
The error message I am encountering is as follows:
2024-01-29 22:20:07,251 ERROR tune_controller.py:1374 -- Trial task failed for trial PPO_EPEnv_e6ca1616
Traceback (most recent call last):
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\_private\auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\_private\worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(FileNotFoundError): ray::PPO.save() (pid=11124, ip=127.0.0.1, actor_id=984bb80f54af807c18b1405e01000000, repr=PPO)
  File "python\ray\_raylet.pyx", line 1813, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1754, in ray._raylet.execute_task.function_executor
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\tune\trainable\trainable.py", line 480, in save
    persisted_checkpoint = self._storage.persist_current_checkpoint(
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\train\_internal\storage.py", line 558, in persist_current_checkpoint
    _pyarrow_fs_copy_files(
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\ray\train\_internal\storage.py", line 110, in _pyarrow_fs_copy_files
    return pyarrow.fs.copy_files(
  File "C:\Users\grhen\anaconda3\envs\ray291\lib\site-packages\pyarrow\fs.py", line 244, in copy_files
    _copy_files_selector(source_fs, source_sel,
  File "pyarrow\_fs.pyx", line 1229, in pyarrow._fs._copy_files_selector
  File "pyarrow\error.pxi", line 110, in pyarrow.lib.check_status
FileNotFoundError: [WinError 206] Cannot create directory 'C:/Users/grhen/ray_results/PPO_2024-01-29_22-11-43/PPO_EPEnv_e6ca1616_1_type=StochasticSampling,disable_action_flattening=False,disable_execution_plan_api=True,disable_initialize_lo_2024-01-29_22-11-43/checkpoint_000000/learner/module_state/default_policy'. Detail: [Windows error 206] The file name or extension is too long.
I am looking for guidance on resolving this issue. In particular, I would like to know whether there is a way to shorten the automatically generated trial folder name, which embeds a long list of configuration values. I tried renaming the experiment folder through air.RunConfig, but that only changes the experiment directory name, not the trial directory name.
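One direction that might help, as far as I can tell, is the trial_dirname_creator argument of tune.TuneConfig, which takes a function that receives the trial and returns the directory name. The following is only a minimal sketch under that assumption: the helper name short_trial_dirname and the 40-character budget are hypothetical choices of mine, and I have not confirmed that this resolves the WinError 206 in my setup.

```python
import re

# Hypothetical budget for the trial directory name; chosen arbitrarily
# to leave room for the checkpoint subfolders on Windows.
MAX_DIRNAME_LEN = 40


def short_trial_dirname(trial) -> str:
    """Build a compact trial directory name such as 'PPO_e6ca1616'.

    `trial` is expected to expose `trainable_name` and `trial_id`,
    as Ray Tune's Trial objects do.
    """
    name = f"{trial.trainable_name}_{trial.trial_id}"
    # Replace anything that is not a letter, digit, '_', '.' or '-'.
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
    return safe[:MAX_DIRNAME_LEN]


# Assumed wiring into Tune (not executed here; `config` is a placeholder
# for the PPO configuration used in the experiment):
#
#   from ray import tune
#   tuner = tune.Tuner(
#       "PPO",
#       param_space=config,
#       tune_config=tune.TuneConfig(
#           trial_dirname_creator=short_trial_dirname,
#       ),
#   )
#   tuner.fit()
```

With a name of this form, the trial folder would no longer embed the configuration values, so the checkpoint path should be considerably shorter.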
Any assistance or insights regarding how to rectify this matter would be greatly appreciated.
Thank you.
Best regards, Germán