Permission denied with local cluster

I set up a local cluster with one head node and 3 workers, and they connected just fine. However, when I start a job with Ray Tune I get a “Permission denied” error. How do I have to set up the folder or config to make this work?

I started the head node with:

$ ray start --head

and joined the workers with:

$ ray start --address='10.1.0.1:6379'

My ray.init call:

ray.init(
    address='10.1.0.1:6379',
    runtime_env={
        "working_dir": ".",
        "excludes": [
            "venv/",
            ".ipynb_checkpoints/",
            "img/"
        ],
    }
)
tune.run()

The error:

(PPO pid=2374, ip=10.1.1.11) 2022-10-12 08:45:49,956	ERROR worker.py:756 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=2374, ip=10.1.1.11, repr=PPO)
(PPO pid=2374, ip=10.1.1.11)   File "/home/rmlch/.local/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 308, in __init__
(PPO pid=2374, ip=10.1.1.11)     super().__init__(config=config, logger_creator=logger_creator, **kwargs)
(PPO pid=2374, ip=10.1.1.11)   File "/home/rmlch/.local/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 132, in __init__
(PPO pid=2374, ip=10.1.1.11)     self._create_logger(self.config, logger_creator)
(PPO pid=2374, ip=10.1.1.11)   File "/home/rmlch/.local/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 823, in _create_logger
(PPO pid=2374, ip=10.1.1.11)     self._result_logger = logger_creator(config)
(PPO pid=2374, ip=10.1.1.11)   File "/home/rmlch/.local/lib/python3.10/site-packages/ray/tune/execution/ray_trial_executor.py", line 142, in _noop_logger_creator
(PPO pid=2374, ip=10.1.1.11)     os.makedirs(logdir, exist_ok=True)
(PPO pid=2374, ip=10.1.1.11)   File "/usr/lib/python3.10/os.py", line 215, in makedirs
(PPO pid=2374, ip=10.1.1.11)     makedirs(head, exist_ok=exist_ok)
(PPO pid=2374, ip=10.1.1.11)   File "/usr/lib/python3.10/os.py", line 215, in makedirs
(PPO pid=2374, ip=10.1.1.11)     makedirs(head, exist_ok=exist_ok)
(PPO pid=2374, ip=10.1.1.11)   File "/usr/lib/python3.10/os.py", line 215, in makedirs
(PPO pid=2374, ip=10.1.1.11)     makedirs(head, exist_ok=exist_ok)
(PPO pid=2374, ip=10.1.1.11)   File "/usr/lib/python3.10/os.py", line 225, in makedirs
(PPO pid=2374, ip=10.1.1.11)     mkdir(name, mode)
(PPO pid=2374, ip=10.1.1.11) PermissionError: [Errno 13] Permission denied: '/home/tim'

On my worker nodes the home folder is “/home/rmlch”, while on my head node the home folder is “/home/tim”. It seems to me that the remote machine is trying to create a log folder, but using the home path of my head node.

When I gave the worker nodes sudo rights, they created “/home/tim/ray_results” on each node and the error went away. Is it supposed to work this way? What would be the correct way without sudo rights?
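The suspected mechanism can be illustrated without Ray. This is only a sketch under an assumption: that the default results directory ("~/ray_results") is expanded once on the machine running the driver, and the resulting absolute path is then sent to the workers. The helper name default_results_dir is hypothetical and is not part of Ray; the two home paths are the ones from this post.

```python
import posixpath


def default_results_dir(home: str) -> str:
    """Hypothetical stand-in for expanding "~/ray_results" for a given home directory."""
    return posixpath.join(home, "ray_results")


head_home = "/home/tim"      # home directory on the head node (from the post)
worker_home = "/home/rmlch"  # home directory on the worker nodes (from the post)

# Assumption: the path is resolved on the head node and shipped as an absolute path.
logdir_sent_to_worker = default_results_dir(head_home)

# Rough proxy for "the worker user can create this": is it inside the worker's own home?
inside_worker_home = logdir_sent_to_worker.startswith(worker_home + "/")

print(logdir_sent_to_worker)  # path the worker is asked to create
print(inside_worker_home)     # whether that path lies under the worker's home
```

If that assumption holds, the worker is asked to create a directory under another user's home, which matches the PermissionError on '/home/tim' in the traceback.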

When you start Ray, can you set "--temp-dir" to a different path and try again?

@sangcho Thanks for the hint. I tried to initialize the session with these parameters:

ray.init(_temp_dir='/home/test/')
ray.init(temp_dir='/home/test/')

The first one gives an argument error (on Ray 2.0); the second one just didn’t seem to change anything. I have now made one of my 3 worker nodes the head node, so it works because all nodes have the same username.

You should specify it when you run ray start. This seems to work on master:

ray start --temp-dir /tmp/s --head
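A path such as /tmp/s does not depend on the username, which is presumably why it avoids the mismatch between home directories. Before passing a directory to --temp-dir, it may help to check on each node that the current user can actually create and write to it. The following is a minimal sketch; is_usable_dir and the candidate directory name are hypothetical and not part of Ray.

```python
import os
import tempfile


def is_usable_dir(path: str) -> bool:
    """Return True if the current user can create (or already has) `path` and can write to it."""
    try:
        os.makedirs(path, exist_ok=True)
    except OSError:
        # Covers PermissionError as well as a non-directory parent.
        return False
    return os.access(path, os.W_OK | os.X_OK)


# Hypothetical candidate under the system temp directory, independent of the username.
candidate = os.path.join(tempfile.gettempdir(), "ray_tmp_check")
usable = is_usable_dir(candidate)
print(candidate, usable)
```

Running this on the head node and on every worker with the same candidate path shows whether that path is a safe choice for all of them.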

Also, when I tried ray.init(_temp_dir='/home/test/'), it worked as well. What’s the error message?

@sangcho Thanks for the help. It works now. I now only have a problem with some custom C++ libs my env is using. That’s not a Ray problem anymore, though.


Sounds good! Lmk if you need any other help :)!
