Exception when running ray up

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity.
  • Low: It annoys or frustrates me for a moment.
  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.
  • High: It blocks me from completing my task.

Answer: High

I ran "ray up -y default-full.yaml". An exception occurs while it is initializing the command runner [5/7]:
Unable to deserialize image_env to Python object. The image_env is:
Good morning centos

  • Hostname …: sh-prod-aigame-gpu-1
  • Release …: CentOS Linux release 7.9.2009 (Core)
  • Users …: Currently 2 user(s) logged on
    ===========================================================================
  • Current user …: centos
  • CPU usage …: 0.02, 0.05, 0.05 (1, 5, 15 min)
  • Memory used …: 1809 MB / 32011 MB
  • Swap in use …: 0 MB
  • Processes …: 185 running
  • System uptime …: 4 days 0 hours 38 minutes 16 seconds
  • Disk space SYS …: remaining
  • Disk space DATA …: 499G remaining
    ===========================================================================

["PATH=/home/ray/anaconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","CUDA_VERSION=11.0.3","LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64","NVIDIA_VISIBLE_DEVICES=all","NVIDIA_DRIVER_CAPABILITIES=compute,utility","NVIDIA_REQUIRE_CUDA=cuda>=11.0 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441 brand=tesla,driver>=450,driver<451","NCCL_VERSION=2.7.8","LIBRARY_PATH=/usr/local/cuda/lib64/stubs","CUDNN_VERSION=8.0.4.30","TZ=America/Los_Angeles","HOME=/home/ray","LC_ALL=C.UTF-8","LANG=C.UTF-8"]
2023-08-11 09:32:57,103 INFO node_provider.py:116 -- ClusterState: Writing cluster state: ['172.23.1.175', '172.23.0.224']
New status: update-failed
!!!
Expecting value: line 1 column 1 (char 0)
!!!

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/site-packages/ray/autoscaler/_private/updater.py", line 153, in run
    self.do_update()
  File "/usr/local/lib/python3.6/site-packages/ray/autoscaler/_private/updater.py", line 445, in do_update
    sync_run_yet=True,
  File "/usr/local/lib/python3.6/site-packages/ray/autoscaler/_private/command_runner.py", line 781, in run_init
    raise e
  File "/usr/local/lib/python3.6/site-packages/ray/autoscaler/_private/command_runner.py", line 772, in run_init
    for env_var in json.loads(image_env):
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Failed to setup head node.

I am also getting this error, any update on this?

I am also getting this error, any updates?

Also getting this error. Updates would be extremely helpful!

Note that:

Good morning centos

Hostname …: sh-prod-aigame-gpu-1
Release …: CentOS Linux release 7.9.2009 (Core)
Users …: Currently 2 user(s) logged on
===========================================================================
Current user …: centos
CPU usage …: 0.02, 0.05, 0.05 (1, 5, 15 min)
Memory used …: 1809 MB / 32011 MB
Swap in use …: 0 MB
Processes …: 185 running
System uptime …: 4 days 0 hours 38 minutes 16 seconds
Disk space SYS …: remaining
Disk space DATA …: 499G remaining
===========================================================================

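is reported as part of image_env. For some reason, the command that inspects the environment variables is also returning the CentOS login banner. Because json.loads is given the whole string, it fails on the very first character of the banner. image_env should consist solely of: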

["PATH=/home/ray/anaconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","CUDA_VERSION=11.0.3","LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64","NVIDIA_VISIBLE_DEVICES=all","NVIDIA_DRIVER_CAPABILITIES=compute,utility","NVIDIA_REQUIRE_CUDA=cuda>=11.0 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441 brand=tesla,driver>=450,driver<451","NCCL_VERSION=2.7.8","LIBRARY_PATH=/usr/local/cuda/lib64/stubs","CUDNN_VERSION=8.0.4.30","TZ=America/Los_Angeles","HOME=/home/ray","LC_ALL=C.UTF-8","LANG=C.UTF-8"]
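
To make the failure concrete, here is a minimal sketch that reproduces the same JSONDecodeError with a shortened banner and a shortened env list (both made up for illustration). The helper extract_image_env is hypothetical and not part of Ray; per the traceback, Ray calls json.loads on the raw string, so this is only useful for checking by hand what the captured output actually contains.

```python
import json


def extract_image_env(raw):
    """Return the first JSON array found in raw, skipping any text before it.

    Hypothetical diagnostic helper, not Ray's API.
    """
    decoder = json.JSONDecoder()
    start = raw.find("[")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            value, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            pass  # this "[" did not begin valid JSON, try the next one
        else:
            if isinstance(value, list):
                return value
        start = raw.find("[", start + 1)
    raise ValueError("no JSON array found in output")


# Shortened stand-ins for the banner and the env list from the post
banner = "Good morning centos\n\nHostname: sh-prod-aigame-gpu-1\n"
payload = '["PATH=/usr/bin:/bin","HOME=/home/ray"]'

try:
    json.loads(banner + payload)
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)

print(extract_image_env(banner + payload))
```

The real fix is still to stop the shell from printing the banner; the helper only shows that the JSON itself is intact once the prefix is skipped.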

I ran into a similar problem and resolved it by adding lines to the initialization_commands in my YAML to ensure that nothing unexpected was printed ahead of the JSON string.

I had the same issue. In my case, some error messages from the bash shell were prepended to image_env, which caused the JSON decoder to fail. I resolved it by following the Ask Ubuntu question: Error message when opening the terminal "-bash: /usr/bin/tclsh: No such file or directory".
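
Both replies above point at text that the node's shell prints on its own (a login banner or a startup error). One rough way to check for that is to run a command that should print nothing and see whether anything comes back. The sketch below is an assumption about how you might test this, not something Ray provides: shell_noise is a hypothetical helper, and the SSH command in the comment uses a placeholder user and host that you would replace with your own.

```python
import subprocess
import sys


def shell_noise(command):
    """Run a command that should be silent and return anything it printed.

    Hypothetical helper. stdout and stderr are merged so that startup
    errors such as "-bash: ...: No such file or directory" are captured too.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # works on Python 3.6, as used in the traceback
    )
    return result.stdout


# Local stand-in so the sketch runs anywhere: a silent command yields "".
print(repr(shell_noise([sys.executable, "-c", "pass"])))

# Against a real node (placeholder user and host, adjust to your setup):
# noise = shell_noise(["ssh", "USER@HEAD_NODE_IP", "bash", "--login", "-c", "true"])
# if noise:
#     print("The shell prints this before any command output:")
#     print(noise)
```

If the check returns anything, that text would end up in front of the JSON and is worth silencing in the node's shell startup files or via initialization_commands.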