I intermittently get the following error when running distributed PPO:
RayTaskError(FileNotFoundError): ray::RolloutWorker.par_iter_next() (pid=8351, ip=10.2.230.146)
File "python/ray/_raylet.pyx", line 446, in ray._raylet.execute_task
File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/memory_monitor.py", line 135, in raise_if_low_memory
used_gb, total_gb = self.get_memory_usage()
File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/memory_monitor.py", line 106, in get_memory_usage
psutil_mem = psutil.virtual_memory()
File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/thirdparty_files/psutil/__init__.py", line 1983, in virtual_memory
ret = _psplatform.virtual_memory()
File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/thirdparty_files/psutil/_pslinux.py", line 391, in virtual_memory
with open_binary('%s/meminfo' % get_procfs_path()) as f:
File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/thirdparty_files/psutil/_common.py", line 713, in open_binary
return open(fname, "rb", **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/meminfo'
This has happened while the machines still had plenty of free memory, and sometimes only a short time after the run started.
I haven’t been able to find any other mention of this issue here except for https://github.com/ray-project/ray/issues/4474.
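In case it helps anyone trying to reproduce this, here is a minimal sketch of a check that could be run on an affected worker node. It performs the same kind of read that the traceback shows failing (opening /proc/meminfo in binary mode) and reports whether the file exists and whether the open succeeds. The function name probe_meminfo and the shape of the returned dictionary are my own and are not part of Ray or psutil; this only narrows down whether the file is visible to a plain Python process, and it does not explain the cause.

```python
import os


def probe_meminfo(path="/proc/meminfo"):
    """Attempt the same binary open that fails in the traceback.

    Hypothetical helper for diagnosis only; not part of Ray or psutil.
    """
    result = {
        "path": path,
        "exists": os.path.exists(path),
        "error": None,
        "first_line": None,
    }
    try:
        with open(path, "rb") as f:
            # On Linux the first line of /proc/meminfo is normally "MemTotal: ...".
            result["first_line"] = f.readline().decode(errors="replace").strip()
    except OSError as exc:
        # FileNotFoundError is a subclass of OSError, so it is captured here.
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result


if __name__ == "__main__":
    print(probe_meminfo())
```

Running this in a loop on the node where the task failed, under the same user and environment as the Ray worker, might show whether the failure can be reproduced outside of Ray.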