Hi,
I am running a local install of Ray on a minikube environment on mac. When I try a deployment to Ray Serve (simple hello world endpoint), the deployment works, but I see logging error output that repeats over and over from raylet. This makes it difficult to find any other logging from the deployments. The logs repeat and seem to come from (raylet)
or raylet, ip=<ip_address>
depending on how many nodes are running.
I believe the 2 important things in the logging are:
FileNotFoundError: [Errno 2] No such file or directory: '/sys/fs/cgroup/cpuacct/cpuacct.usage'
and
Message: 'Error computing CPU usage of Ray Kubernetes pod.'
Here the the full log block:
(raylet, ip=172.17.0.7) --- Logging error ---
(raylet, ip=172.17.0.7) Traceback (most recent call last):
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/k8s_utils.py", line 38, in cpu_percent
(raylet, ip=172.17.0.7) cpu_usage = _cpu_usage()
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/k8s_utils.py", line 64, in _cpu_usage
(raylet, ip=172.17.0.7) return int(open(CPU_USAGE_PATH).read())
(raylet, ip=172.17.0.7) FileNotFoundError: [Errno 2] No such file or directory: '/sys/fs/cgroup/cpuacct/cpuacct.usage'
(raylet, ip=172.17.0.7)
(raylet, ip=172.17.0.7) During handling of the above exception, another exception occurred:
(raylet, ip=172.17.0.7)
(raylet, ip=172.17.0.7) Traceback (most recent call last):
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/logging/handlers.py", line 69, in emit
(raylet, ip=172.17.0.7) if self.shouldRollover(record):
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/logging/handlers.py", line 185, in shouldRollover
(raylet, ip=172.17.0.7) msg = "%s\n" % self.format(record)
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/logging/__init__.py", line 869, in format
(raylet, ip=172.17.0.7) return fmt.format(record)
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/logging/__init__.py", line 608, in format
(raylet, ip=172.17.0.7) record.message = record.getMessage()
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/logging/__init__.py", line 369, in getMessage
(raylet, ip=172.17.0.7) msg = msg % self.args
(raylet, ip=172.17.0.7) TypeError: not all arguments converted during string formatting
(raylet, ip=172.17.0.7) Call stack:
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/agent.py", line 376, in <module>
(raylet, ip=172.17.0.7) loop.run_until_complete(agent.run())
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/asyncio/base_events.py", line 574, in run_until_complete
(raylet, ip=172.17.0.7) self.run_forever()
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
(raylet, ip=172.17.0.7) self._run_once()
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
(raylet, ip=172.17.0.7) handle._run()
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/asyncio/events.py", line 88, in _run
(raylet, ip=172.17.0.7) self._context.run(self._callback, *self._args)
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 554, in run
(raylet, ip=172.17.0.7) await self._perform_iteration(aioredis_client)
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 539, in _perform_iteration
(raylet, ip=172.17.0.7) stats = self._get_all_stats()
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 334, in _get_all_stats
(raylet, ip=172.17.0.7) "cpu": self._get_cpu_percent(),
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 193, in _get_cpu_percent
(raylet, ip=172.17.0.7) return k8s_utils.cpu_percent()
(raylet, ip=172.17.0.7) File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/k8s_utils.py", line 57, in cpu_percent
(raylet, ip=172.17.0.7) logger.exception("Error computing CPU usage of Ray Kubernetes pod.", e)
(raylet, ip=172.17.0.7) Message: 'Error computing CPU usage of Ray Kubernetes pod.'
(raylet, ip=172.17.0.7) Arguments: (FileNotFoundError(2, 'No such file or directory'),)
When I ssh into a node, the file is indeed not there. There is no cpuacct
folder. Here is the ls for the folder /sys/fs/crgroup
(base) ray@ray-cluster-ray-worker-type-dz2mx:/sys/fs/cgroup$ ls -la
total 0
drwxr-xr-x 2 root root 0 Dec 13 11:18 .
drwxr-xr-x 9 root root 0 Dec 13 11:18 ..
-r--r--r-- 1 root root 0 Dec 13 11:18 cgroup.controllers
-r--r--r-- 1 root root 0 Dec 13 11:18 cgroup.events
-rw-r--r-- 1 root root 0 Dec 13 11:18 cgroup.freeze
-rw-r--r-- 1 root root 0 Dec 13 11:18 cgroup.max.depth
-rw-r--r-- 1 root root 0 Dec 13 11:18 cgroup.max.descendants
-rw-r--r-- 1 root root 0 Dec 13 11:18 cgroup.procs
-r--r--r-- 1 root root 0 Dec 13 11:18 cgroup.stat
-rw-r--r-- 1 root root 0 Dec 13 11:18 cgroup.subtree_control
-rw-r--r-- 1 root root 0 Dec 13 11:18 cgroup.threads
-rw-r--r-- 1 root root 0 Dec 13 11:18 cgroup.type
-rw-r--r-- 1 root root 0 Dec 13 11:18 cpu.max
-r--r--r-- 1 root root 0 Dec 13 11:18 cpu.stat
-rw-r--r-- 1 root root 0 Dec 13 11:18 cpu.weight
-rw-r--r-- 1 root root 0 Dec 13 11:18 cpu.weight.nice
-rw-r--r-- 1 root root 0 Dec 13 11:18 cpuset.cpus
-r--r--r-- 1 root root 0 Dec 13 11:18 cpuset.cpus.effective
-rw-r--r-- 1 root root 0 Dec 13 11:18 cpuset.cpus.partition
-rw-r--r-- 1 root root 0 Dec 13 11:18 cpuset.mems
-r--r--r-- 1 root root 0 Dec 13 11:18 cpuset.mems.effective
-r--r--r-- 1 root root 0 Dec 13 11:18 hugetlb.1GB.current
-r--r--r-- 1 root root 0 Dec 13 11:18 hugetlb.1GB.events
-r--r--r-- 1 root root 0 Dec 13 11:18 hugetlb.1GB.events.local
-rw-r--r-- 1 root root 0 Dec 13 11:18 hugetlb.1GB.max
-r--r--r-- 1 root root 0 Dec 13 11:18 hugetlb.1GB.rsvd.current
-rw-r--r-- 1 root root 0 Dec 13 11:18 hugetlb.1GB.rsvd.max
-r--r--r-- 1 root root 0 Dec 13 11:18 hugetlb.2MB.current
-r--r--r-- 1 root root 0 Dec 13 11:18 hugetlb.2MB.events
-r--r--r-- 1 root root 0 Dec 13 11:18 hugetlb.2MB.events.local
-rw-r--r-- 1 root root 0 Dec 13 11:18 hugetlb.2MB.max
-r--r--r-- 1 root root 0 Dec 13 11:18 hugetlb.2MB.rsvd.current
-rw-r--r-- 1 root root 0 Dec 13 11:18 hugetlb.2MB.rsvd.max
-rw-r--r-- 1 root root 0 Dec 13 11:18 io.bfq.weight
-rw-r--r-- 1 root root 0 Dec 13 11:18 io.latency
-rw-r--r-- 1 root root 0 Dec 13 11:18 io.max
-r--r--r-- 1 root root 0 Dec 13 11:18 io.stat
-r--r--r-- 1 root root 0 Dec 13 11:18 memory.current
-r--r--r-- 1 root root 0 Dec 13 11:18 memory.events
-r--r--r-- 1 root root 0 Dec 13 11:18 memory.events.local
-rw-r--r-- 1 root root 0 Dec 13 11:18 memory.high
-rw-r--r-- 1 root root 0 Dec 13 11:18 memory.low
-rw-r--r-- 1 root root 0 Dec 13 11:18 memory.max
-rw-r--r-- 1 root root 0 Dec 13 11:18 memory.min
-rw-r--r-- 1 root root 0 Dec 13 11:18 memory.oom.group
-r--r--r-- 1 root root 0 Dec 13 11:18 memory.stat
-r--r--r-- 1 root root 0 Dec 13 11:18 memory.swap.current
-r--r--r-- 1 root root 0 Dec 13 11:18 memory.swap.events
-rw-r--r-- 1 root root 0 Dec 13 11:18 memory.swap.high
-rw-r--r-- 1 root root 0 Dec 13 11:18 memory.swap.max
-r--r--r-- 1 root root 0 Dec 13 11:18 pids.current
-r--r--r-- 1 root root 0 Dec 13 11:18 pids.events
-rw-r--r-- 1 root root 0 Dec 13 11:18 pids.max
-r--r--r-- 1 root root 0 Dec 13 11:18 rdma.current
-rw-r--r-- 1 root root 0 Dec 13 11:18 rdma.max
Is there any way to correct this logging? Please let me know if you need more info to help out.
System info: Mac OSX 11.6.1
Docker Desktop: 4.3.0
Docker CLI: 20.10.11
Minikube: v1.22.0
Kubernetes: 1.20.7
Ray: 1.9.0 * Basic install from repo, but installed to the namespace ‘ray’