/proc/meminfo Error with Distributed PPO

Hello,

I sometimes get the following error when running distributed PPO:

RayTaskError(FileNotFoundError): ray::RolloutWorker.par_iter_next() (pid=8351, ip=10.2.230.146)
  File "python/ray/_raylet.pyx", line 446, in ray._raylet.execute_task
  File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/memory_monitor.py", line 135, in raise_if_low_memory
    used_gb, total_gb = self.get_memory_usage()
  File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/memory_monitor.py", line 106, in get_memory_usage
    psutil_mem = psutil.virtual_memory()
  File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/thirdparty_files/psutil/__init__.py", line 1983, in virtual_memory
    ret = _psplatform.virtual_memory()
  File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/thirdparty_files/psutil/_pslinux.py", line 391, in virtual_memory
    with open_binary('%s/meminfo' % get_procfs_path()) as f:
  File "/home/ubuntu/conda/envs/venv/lib/python3.8/site-packages/ray/thirdparty_files/psutil/_common.py", line 713, in open_binary
    return open(fname, "rb", **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/meminfo'

This has happened while the machines still had plenty of memory available, and sometimes after running for only a very short time.

I haven’t been able to find any other mention of this issue here except for https://github.com/ray-project/ray/issues/4474.

Thanks

This is most likely a machine/OS problem, as /proc/meminfo should usually be available. Can you share a bit more about your setup? Is this a cloud instance? Are you running in a cluster? Are you using the Ray cluster launcher? Does Ray run in a docker container or another kind of virtualization?
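
To help narrow down whether /proc/meminfo really disappears intermittently on a node, something like the following could be run directly on an affected machine, outside of Ray. It is only a rough sketch: the function name `probe_meminfo` and its parameters are made up for illustration, and it uses nothing but the Python standard library (no Ray or psutil calls).

```python
import time
from pathlib import Path


def probe_meminfo(path="/proc/meminfo", attempts=5, delay_s=0.5):
    """Try to read `path` several times.

    Returns (parsed, failures): `parsed` is a dict built from the last
    successful read (or None if every read failed), and `failures` is a
    list of (attempt_number, error_repr) for each failed read.
    """
    parsed = None
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            text = Path(path).read_text()
        except OSError as exc:  # FileNotFoundError is a subclass of OSError
            failures.append((attempt, repr(exc)))
        else:
            parsed = {}
            for line in text.splitlines():
                # Lines look like "MemTotal:       263856 kB"
                key, sep, value = line.partition(":")
                if sep:
                    parsed[key.strip()] = value.strip()
        if attempt < attempts:
            time.sleep(delay_s)
    return parsed, failures


if __name__ == "__main__":
    info, failures = probe_meminfo()
    if failures:
        print("failed reads:", failures)
    if info is not None:
        print("MemTotal:", info.get("MemTotal"))
        print("MemAvailable:", info.get("MemAvailable"))
```

If the failed-reads list is ever non-empty on a host where /proc is normally mounted, that would point toward the machine or OS rather than Ray itself.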

Yes, it’s an AWS EC2 setup using m5.16x instances. We are using the Ray cluster launcher, and Ray isn’t running in Docker or anything similar.

I think a relevant fix for this was reported a couple months ago:

Are there any potential downsides from setting RAY_DEBUG_DISABLE_MEMORY_MONITOR=1 to avoid this code path entirely?