How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
- Low: It annoys or frustrates me for a moment.
- Medium: It contributes to significant difficulty in completing my task, but I can work around it.
- High: It blocks me from completing my task.
High
I ran "ray up -y default-full.yaml". An exception occurs while it is initializing the command runner (step [5/7]).
Unable to deserialize image_env to Python object. The image_env is:
Good morning centos
- Hostname …: sh-prod-aigame-gpu-1
- Release …: CentOS Linux release 7.9.2009 (Core)
- Users …: Currently 2 user(s) logged on
===========================================================================
- Current user …: centos
- CPU usage …: 0.02, 0.05, 0.05 (1, 5, 15 min)
- Memory used …: 1809 MB / 32011 MB
- Swap in use …: 0 MB
- Processes …: 185 running
- System uptime …: 4 days 0 hours 38 minutes 16 seconds
- Disk space SYS …: remaining
- Disk space DATA …: 499G remaining
===========================================================================
["PATH=/home/ray/anaconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","CUDA_VERSION=11.0.3","LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64","NVIDIA_VISIBLE_DEVICES=all","NVIDIA_DRIVER_CAPABILITIES=compute,utility","NVIDIA_REQUIRE_CUDA=cuda>=11.0 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441 brand=tesla,driver>=450,driver<451","NCCL_VERSION=2.7.8","LIBRARY_PATH=/usr/local/cuda/lib64/stubs","CUDNN_VERSION=8.0.4.30","TZ=America/Los_Angeles","HOME=/home/ray","LC_ALL=C.UTF-8","LANG=C.UTF-8"]
2023-08-11 09:32:57,103 INFO node_provider.py:116 -- ClusterState: Writing cluster state: ['172.23.1.175', '172.23.0.224']
New status: update-failed
!!!
Expecting value: line 1 column 1 (char 0)
!!!
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.6/site-packages/ray/autoscaler/_private/updater.py", line 153, in run
self.do_update()
File "/usr/local/lib/python3.6/site-packages/ray/autoscaler/_private/updater.py", line 445, in do_update
sync_run_yet=True,
File "/usr/local/lib/python3.6/site-packages/ray/autoscaler/_private/command_runner.py", line 781, in run_init
raise e
File "/usr/local/lib/python3.6/site-packages/ray/autoscaler/_private/command_runner.py", line 772, in run_init
for env_var in json.loads(image_env):
File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Failed to setup head node.
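
My guess at what is going on: the login banner printed by the node ("Good morning centos ...") ends up in front of the JSON list inside image_env, so json.loads sees "G" as the first character and fails at line 1 column 1. Below is a minimal sketch that reproduces the same error outside of Ray. The banner text, the shortened env list, and the helper extract_json_list are all made up for illustration; this is not Ray's code or an official fix.

```python
import json

# Hypothetical stand-in for the captured command output:
# a login banner (MOTD) followed by the JSON list of env vars.
banner = "Good morning centos\n- Hostname ...: example-host\n"
env_json = '["PATH=/usr/local/bin:/usr/bin", "CUDA_VERSION=11.0.3"]'
image_env = banner + env_json


def extract_json_list(raw: str) -> list:
    """Hypothetical helper: parse the last line that looks like a JSON array."""
    for line in reversed(raw.splitlines()):
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            return json.loads(line)
    raise ValueError("no JSON array found in output")


# Parsing the raw output fails because the banner comes first.
try:
    json.loads(image_env)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Skipping the banner and parsing only the JSON line succeeds.
print(extract_json_list(image_env))
```

If this guess is right, the cleaner fix is probably to stop the banner from being printed for the SSH sessions Ray opens on the nodes, rather than patching the parsing, but I have not confirmed that.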