Ray init tries to detect TPUs even when they aren't present

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I recently upgraded my Ray version to 2.7.0. Since then, when I try to start Ray using the ray.init() function, it throws this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/worker.py", line 1536, in init
    _global_node = ray._private.node.Node(
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/node.py", line 310, in __init__
    self.start_ray_processes()
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/node.py", line 1452, in start_ray_processes
    resource_spec = self.get_resource_spec()
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/node.py", line 540, in get_resource_spec
    self._resource_spec = ResourceSpec(
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/resource_spec.py", line 204, in resolve
    accelerator.update_resources_with_accelerator_type(resources)
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/accelerator.py", line 39, in update_resources_with_accelerator_type
    accelerator_type=_autodetect_tpu_version(),
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/accelerator.py", line 214, in _autodetect_tpu_version
    return accelerator_type_to_version(accelerator_type_request.text)
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/accelerator.py", line 197, in accelerator_type_to_version
    assert_tpu_accelerator_type(accelerator_type)
  File "/home/mritunjay/miniconda3/lib/python3.8/site-packages/ray/_private/accelerator.py", line 239, in assert_tpu_accelerator_type
    raise ValueError(
ValueError: acceleratorType should match v(generation)-(cores/chips). Got .

The machine I am using has 48 CPU cores and no GPUs or TPUs. Moreover, the error occurs intermittently: sometimes ray.init() works and sometimes it doesn't. How can I resolve this?

Do you happen to use GKE and have the env var RAY_GKE_TPU_ACCELERATOR_TYPE_ENV_VAR set?

I am not using GKE. I was trying to start a Ray cluster on my local machine. I also don't have the env var RAY_GKE_TPU_ACCELERATOR_TYPE_ENV_VAR set.
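A quick way to double-check is to scan the environment for anything TPU-related (this prints nothing if no such variable is set):

import os

# Print any environment variable whose name mentions TPU, in case
# a stray accelerator hint is being picked up somewhere.
for key, value in os.environ.items():
    if "TPU" in key.upper():
        print(f"{key}={value}")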

Can you tell me what the output of this script is on your local machine?

import requests

# The GCE metadata endpoint Ray queries during TPU autodetection.
RAY_GCE_TPU_ACCELERATOR_ENDPOINT = (
    "http://metadata.google.internal/computeMetadata/"
    "v1/instance/attributes/accelerator-type"
)
RAY_GCE_TPU_HEADERS = {"Metadata-Flavor": "Google"}

accelerator_type_request = requests.get(
    RAY_GCE_TPU_ACCELERATOR_ENDPOINT,
    headers=RAY_GCE_TPU_HEADERS,
)
print(accelerator_type_request)

It looks like this code was added to support TPU accelerator autodetection, and it may have a bug.
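In particular, if that hostname happens to resolve and the request comes back as an HTTP 200 with an empty or malformed body, the detection ends up raising a ValueError instead of concluding there is no TPU. A more defensive version of the check could look like the following; this is just a sketch against the same endpoint, not Ray's actual code or fix, and the helper name and regex are my own:

import re
import requests

# Same metadata endpoint Ray queries for TPU autodetection.
GCE_TPU_ACCELERATOR_ENDPOINT = (
    "http://metadata.google.internal/computeMetadata/"
    "v1/instance/attributes/accelerator-type"
)
GCE_TPU_HEADERS = {"Metadata-Flavor": "Google"}

def autodetect_tpu_version():
    """Return a TPU type string like 'v4-8', or None if no TPU is found."""
    try:
        resp = requests.get(
            GCE_TPU_ACCELERATOR_ENDPOINT,
            headers=GCE_TPU_HEADERS,
            timeout=1,
        )
    except requests.exceptions.RequestException:
        # No reachable metadata server: not on GCE, so no TPU.
        return None
    # Treat anything that doesn't look like "v<generation>-<chips>"
    # (including an empty 200 response) as "no TPU" instead of raising.
    if resp.status_code == 200 and re.fullmatch(r"v\d+[a-z]*-\d+", resp.text):
        return resp.text
    return None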

I'm experiencing the same issue. The error popped up after a previous task finished. I ran the script you provided; here is my output:

<Response [200]>

I also tried ray.shutdown() followed by ray.init(), but the issue persisted. Please advise on how to resolve it.

I fixed the problem by reinstalling Ray in my conda environment. It is unclear what triggered the issue in the first place.

Well, the same problem arose again after finishing a new task and trying to start the second one.

@sangcho should we open an issue for this if this is reproducible?

I think it is already fixed: Soften ValueError for TPU autodetection by allenwang28 · Pull Request #39922 · ray-project/ray (https://github.com/ray-project/ray/pull/39922)

@Shufan_Zhang can you try the latest master and see if you still hit the issue? The fix will be included in Ray 2.7.1 (planned for release on 10/9).
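To confirm which build you are running after upgrading, you can print the installed version:

import ray

# Prints e.g. "2.7.1" once the release is out, or a dev version
# string if you installed a nightly build of master.
print(ray.__version__)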