Error computing CPU usage of Ray Kubernetes pod - Docker Desktop

Hi Team, need your help with this.

I am deploying Ray on Kubernetes running in Docker Desktop. I followed the example in Deploying Ray Serve — Ray v2.0.0.dev0.

Here is the error logged during ray submit:

❯ ray submit example-full.yaml deploy.py
2022-01-06 18:07:22,800	INFO util.py:282 -- setting max workers for head node type to 0
Loaded cached provider configuration
If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
2022-01-06 18:07:25,511	INFO util.py:282 -- setting max workers for head node type to 0
2022-01-06 18:07:25,531	INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-head-type-ccmcx: Running kubectl -n ray exec -it example-cluster-ray-head-type-ccmcx -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (python ~/deploy.py)'
2022-01-06 18:07:27,416	INFO worker.py:843 -- Connecting to existing Ray cluster at address: 10.1.0.40:6379
(ServeController pid=676) 2022-01-06 18:07:28,921	INFO checkpoint_path.py:16 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=676) 2022-01-06 18:07:28,930	INFO http_state.py:101 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:10.1.0.40-0' on node 'node:10.1.0.40-0' listening on '0.0.0.0:8000'
(raylet, ip=10.1.0.41) --- Logging error ---
(raylet, ip=10.1.0.41) Traceback (most recent call last):
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/k8s_utils.py", line 38, in cpu_percent
(raylet, ip=10.1.0.41)     cpu_usage = _cpu_usage()
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/k8s_utils.py", line 64, in _cpu_usage
(raylet, ip=10.1.0.41)     return int(open(CPU_USAGE_PATH).read())
(raylet, ip=10.1.0.41) FileNotFoundError: [Errno 2] No such file or directory: '/sys/fs/cgroup/cpuacct/cpuacct.usage'
(raylet, ip=10.1.0.41)
(raylet, ip=10.1.0.41) During handling of the above exception, another exception occurred:
(raylet, ip=10.1.0.41)
(raylet, ip=10.1.0.41) Traceback (most recent call last):
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/logging/handlers.py", line 69, in emit
(raylet, ip=10.1.0.41)     if self.shouldRollover(record):
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/logging/handlers.py", line 185, in shouldRollover
(raylet, ip=10.1.0.41)     msg = "%s\n" % self.format(record)
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/logging/__init__.py", line 869, in format
(raylet, ip=10.1.0.41)     return fmt.format(record)
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/logging/__init__.py", line 608, in format
(raylet, ip=10.1.0.41)     record.message = record.getMessage()
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/logging/__init__.py", line 369, in getMessage
(raylet, ip=10.1.0.41)     msg = msg % self.args
(raylet, ip=10.1.0.41) TypeError: not all arguments converted during string formatting
(raylet, ip=10.1.0.41) Call stack:
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/agent.py", line 376, in <module>
(raylet, ip=10.1.0.41)     loop.run_until_complete(agent.run())
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/asyncio/base_events.py", line 574, in run_until_complete
(raylet, ip=10.1.0.41)     self.run_forever()
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
(raylet, ip=10.1.0.41)     self._run_once()
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
(raylet, ip=10.1.0.41)     handle._run()
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/asyncio/events.py", line 88, in _run
(raylet, ip=10.1.0.41)     self._context.run(self._callback, *self._args)
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 554, in run
(raylet, ip=10.1.0.41)     await self._perform_iteration(aioredis_client)
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 539, in _perform_iteration
(raylet, ip=10.1.0.41)     stats = self._get_all_stats()
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 334, in _get_all_stats
(raylet, ip=10.1.0.41)     "cpu": self._get_cpu_percent(),
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 193, in _get_cpu_percent
(raylet, ip=10.1.0.41)     return k8s_utils.cpu_percent()
(raylet, ip=10.1.0.41)   File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/k8s_utils.py", line 57, in cpu_percent
(raylet, ip=10.1.0.41)     logger.exception("Error computing CPU usage of Ray Kubernetes pod.", e)
(raylet, ip=10.1.0.41) Message: 'Error computing CPU usage of Ray Kubernetes pod.'
(raylet, ip=10.1.0.41) Arguments: (FileNotFoundError(2, 'No such file or directory'),)
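
A note on reading the log above: it contains two stacked errors. The root cause is the FileNotFoundError for /sys/fs/cgroup/cpuacct/cpuacct.usage. The TypeError that follows looks like a side effect of the call shown at k8s_utils.py line 57, where the exception object is passed as an extra argument to logger.exception while the message has no % placeholder to receive it. Below is a minimal sketch that reproduces the pattern outside Ray. The names make_logger, read_usage, log_buggy and log_fixed are hypothetical and exist only for illustration; they are not Ray functions.

```python
import io
import logging


def make_logger(name):
    """Return a logger that writes to an in-memory stream (illustrative helper)."""
    stream = io.StringIO()
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.StreamHandler(stream))
    return logger, stream


def read_usage(path):
    """Read an integer counter from a file, as the traceback's _cpu_usage does."""
    with open(path) as f:
        return int(f.read())


def log_buggy(logger, path):
    """Mirrors the pattern in the traceback: `e` is passed as a format argument."""
    try:
        return read_usage(path)
    except Exception as e:
        # No %s in the message, so "msg % args" raises TypeError inside logging.
        # The logging module then prints "--- Logging error ---" to stderr
        # and the record itself is dropped.
        logger.exception("Error computing CPU usage of Ray Kubernetes pod.", e)
        return None


def log_fixed(logger, path):
    """Same handling, but lets logger.exception attach the traceback itself."""
    try:
        return read_usage(path)
    except Exception:
        logger.exception("Error computing CPU usage of Ray Kubernetes pod.")
        return None
```

With log_fixed, only the FileNotFoundError traceback would be logged, which makes the underlying problem (the missing cgroup file) easier to spot.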



Docker Desktop - v4.3.2
Kubernetes - v1.22.4
macOS - v12.1

Thanks in advance!

Hi @lihost, this is a known issue that’s being tracked here. It appears to affect local K8s setups in general, since another user reported it on minikube.
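
For context, the path in the traceback, /sys/fs/cgroup/cpuacct/cpuacct.usage, belongs to the cgroup v1 layout. On systems that expose the unified cgroup v2 hierarchy, cumulative CPU time is instead reported in /sys/fs/cgroup/cpu.stat as a usage_usec line. Docker Desktop appears to have moved to cgroup v2 in its 4.3 releases, which would explain the missing file, but treat that as an assumption rather than a confirmed diagnosis. Here is a minimal sketch of a reader that tolerates both layouts; read_cpu_usage_ns is a hypothetical helper and not how Ray implements this.

```python
from pathlib import Path

# cgroup v1: a single integer, cumulative CPU time in nanoseconds.
CGROUP_V1_USAGE = Path("/sys/fs/cgroup/cpuacct/cpuacct.usage")
# cgroup v2: key/value lines, including "usage_usec <microseconds>".
CGROUP_V2_STAT = Path("/sys/fs/cgroup/cpu.stat")


def read_cpu_usage_ns(v1_path=CGROUP_V1_USAGE, v2_path=CGROUP_V2_STAT):
    """Return cumulative CPU usage in nanoseconds, or None if neither file exists.

    The path parameters exist so the function can be exercised against
    temporary files; inside a pod the defaults would be used.
    """
    v1_path, v2_path = Path(v1_path), Path(v2_path)

    if v1_path.exists():
        return int(v1_path.read_text())

    if v2_path.exists():
        for line in v2_path.read_text().splitlines():
            key, _, value = line.partition(" ")
            if key == "usage_usec":
                # Convert microseconds to nanoseconds to match the v1 unit.
                return int(value) * 1000

    return None
```

Running this inside the head or worker pod (for example through kubectl exec) should show which layout the pod sees: a None result, or a value that only comes from cpu.stat, would suggest the v1 file is absent.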

Hi @shrekris, thank you for the update.

While this is being resolved, could you suggest a workaround so that we can test our deployments before moving to prod?

Turns out there’s another thread solving the same error: Logging error computing CPU usage of Ray Kubernetes. Please take a look there!

Yes, I have put the workaround in place: downgrading Docker Desktop to 4.2.0.