I set up a local cluster with one head node and 3 workers, which connected just fine. However, when starting a job with Ray Tune I get a “Permission denied” error. How do I have to set up the folder or config to make this work?
I started the head node with:
$ ray start --head
and joined the workers with
$ ray start --address='10.1.0.1:6379'
My ray.init call:
import ray
from ray import tune

ray.init(
    address='10.1.0.1:6379',
    runtime_env={
        "working_dir": ".",
        "excludes": [
            "venv/",
            ".ipynb_checkpoints/",
            "img/"
        ],
    }
)
tune.run()
The error:
(PPO pid=2374, ip=10.1.1.11) 2022-10-12 08:45:49,956 ERROR worker.py:756 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=2374, ip=10.1.1.11, repr=PPO)
(PPO pid=2374, ip=10.1.1.11) File "/home/rmlch/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 308, in __init__
(PPO pid=2374, ip=10.1.1.11) super().__init__(config=config, logger_creator=logger_creator, **kwargs)
(PPO pid=2374, ip=10.1.1.11) File "/home/rmlch/.local/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 132, in __init__
(PPO pid=2374, ip=10.1.1.11) self._create_logger(self.config, logger_creator)
(PPO pid=2374, ip=10.1.1.11) File "/home/rmlch/.local/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 823, in _create_logger
(PPO pid=2374, ip=10.1.1.11) self._result_logger = logger_creator(config)
(PPO pid=2374, ip=10.1.1.11) File "/home/rmlch/.local/lib/python3.10/site-packages/ray/tune/execution/ray_trial_executor.py", line 142, in _noop_logger_creator
(PPO pid=2374, ip=10.1.1.11) os.makedirs(logdir, exist_ok=True)
(PPO pid=2374, ip=10.1.1.11) File "/usr/lib/python3.10/os.py", line 215, in makedirs
(PPO pid=2374, ip=10.1.1.11) makedirs(head, exist_ok=exist_ok)
(PPO pid=2374, ip=10.1.1.11) File "/usr/lib/python3.10/os.py", line 215, in makedirs
(PPO pid=2374, ip=10.1.1.11) makedirs(head, exist_ok=exist_ok)
(PPO pid=2374, ip=10.1.1.11) File "/usr/lib/python3.10/os.py", line 215, in makedirs
(PPO pid=2374, ip=10.1.1.11) makedirs(head, exist_ok=exist_ok)
(PPO pid=2374, ip=10.1.1.11) File "/usr/lib/python3.10/os.py", line 225, in makedirs
(PPO pid=2374, ip=10.1.1.11) mkdir(name, mode)
(PPO pid=2374, ip=10.1.1.11) PermissionError: [Errno 13] Permission denied: '/home/tim'
On my worker nodes the home folder is “/home/rmlch”, while on my head node it is “/home/tim”. It seems to me that the remote machine is trying to create a log folder, but using the home path from my head node…
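To illustrate what I think is happening, here is a minimal sketch in plain Python (no Ray involved). It assumes the results path is resolved to an absolute string once on the head node and then reused unchanged on a worker. The helper names `resolve_results_dir` and `is_under`, and the `ray_results` folder name, are my own assumptions for illustration and are not taken from Ray's code.

```python
import posixpath

# Home directories as described in the post.
HEAD_HOME = "/home/tim"
WORKER_HOME = "/home/rmlch"


def resolve_results_dir(home: str, folder: str = "ray_results") -> str:
    """Build an absolute results path from a given home directory (hypothetical helper)."""
    return posixpath.join(home, folder)


def is_under(path: str, parent: str) -> bool:
    """Return True if `path` lies inside `parent` (pure string check, no filesystem access)."""
    return posixpath.commonpath([path, parent]) == parent


# Assumption: the path is resolved once, on the head node ...
logdir_from_head = resolve_results_dir(HEAD_HOME)

# ... and the same absolute string is then used on a worker, where the user
# only owns WORKER_HOME. Creating /home/tim there would need write access to /home.
inside_worker_home = is_under(logdir_from_head, WORKER_HOME)

print(logdir_from_head)
print(inside_worker_home)
```

If this is really the cause, one possible workaround would be to point Tune at a directory that exists and is writable under the same path on every node. As far as I can tell, `tune.run` in Ray releases from this period accepts a `local_dir` argument for that, but I have not verified that it avoids the error on my cluster.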