1. Severity of the issue: (select one)
High: Completely blocks me.
2. Environment:
- Ray version: 2.44.1
- Python version: 3.11
- OS: Linux & Windows
- Cloud/Infrastructure: Local Ray instance on Colab
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: I saved a checkpoint for a DQN config after training the algorithm on an environment with a GPU available. I want to test it visually (using the SUMO simulator), so I needed to restore the model locally and expected that to work out of the box.
- Actual: I get this error:
File "C:\Users\ilai\AppData\Local\pypoetry\Cache\virtualenvs\rl-tsc-2025-WnPmI87f-py3.11\Lib\site-packages\ray\rllib\utils\framework.py", line 88, in get_device
assert config.local_gpu_idx < torch.cuda.device_count(), (
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: local_gpu_idx 0 is not a valid GPU ID or is not available.
So it expects a GPU to be available on the machine where this config is run. (I got a different error message on other Ray versions, but it was the same problem.)
The code is basic:
algo = Algorithm.from_checkpoint(checkpoint_path)
where checkpoint_path refers to a local checkpoint: "/path/to/checkpoints/checkpoint_xxx"
and the checkpoint directory looks as follows:
On earlier Ray versions I could bypass it by editing the pkl of the algorithm_state, but on this version that fails. (After fixing it that way on the older version I hit a different error, so I tried updating to the latest version; the model is of course trained on the same Ray version.)
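For completeness, here is a minimal sketch of the restore script that triggers the error (the checkpoint path is a placeholder for my local checkpoint directory):

from ray.rllib.algorithms.algorithm import Algorithm

# Placeholder path pointing at a local RLlib checkpoint directory.
checkpoint_path = "/path/to/checkpoints/checkpoint_xxx"

# On a CPU-only machine this raises:
#   AssertionError: local_gpu_idx 0 is not a valid GPU ID or is not available.
algo = Algorithm.from_checkpoint(checkpoint_path)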
Hey @Ilai_Dabush, welcome to the forums!
I have a few questions for you:
- You mentioned you trained the policy with a GPU available; does your current setup (venv) have a GPU?
- Can you try the below?
import torch
print(torch.cuda.is_available(), torch.cuda.device_count())
This will show whether torch is detecting your GPU(s), and the answer will help with troubleshooting. Essentially, from reading this assertion on the RLlib side, I think it is checking whether you have a GPU available at all (from the below).
assert config.local_gpu_idx < torch.cuda.device_count(), (
f"local_gpu_idx {config.local_gpu_idx} is not a valid GPU ID "
"or is not available."
)
# This is an index into the available CUDA devices. For example, if
# `os.environ["CUDA_VISIBLE_DEVICES"] = "1"` then
# `torch.cuda.device_count() = 1` and torch.device(0) maps to that GPU
# with ID=1 on the node.
return torch.device(config.local_gpu_idx)
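To make that concrete: on a CPU-only machine torch.cuda.device_count() returns 0, so local_gpu_idx (0 in your error message) can never be smaller than it and the assertion fires. A quick sketch:

import torch

local_gpu_idx = 0  # the value from your config, per the error message
print(torch.cuda.device_count())                   # 0 on a CPU-only machine
print(local_gpu_idx < torch.cuda.device_count())   # False, so the assert raises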
Thanks!
Tyler
Hey @tlaurie99,
That's exactly the problem: my local machine has no GPU. I wanted to run some inference locally, and for that I don't really need a GPU. For now my workaround is to not use a GPU for training either, since I train very small networks rather than large ones. It would still be nice if running inference on a machine without a GPU were supported, even when a GPU was configured and used for training.
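For reference, my CPU-only training setup now looks roughly like the sketch below (the environment name is a placeholder for my SUMO setup, and the learner/env-runner settings are what I assume the new API stack expects):

from ray.rllib.algorithms.dqn import DQNConfig

# Rough sketch of a CPU-only DQN config; the environment is a placeholder.
config = (
    DQNConfig()
    .environment("CartPole-v1")
    .learners(num_learners=0, num_gpus_per_learner=0)  # keep the learner on CPU
    .env_runners(num_env_runners=1)
)

algo = config.build()
algo.train()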
Thanks,
Ilai
Hey @Ilai_Dabush,
Okay, great, that is good to know. Sorry, I think I misunderstood at first!
So, I believe you should be able to do the following. This just takes your config, changes num_gpus to 0, and then builds the algorithm with the settings you trained with.
config = DQNConfig.from_checkpoint(checkpoint_path)
config = config.resources(num_gpus=0)
algo = config.build()
algo.restore(checkpoint_path)
# now have access to algo.evaluate()
Let me know if this works for you. I am not at a place where I can run this to check, but I will later today when I get back.
Tyler
Hey @tlaurie99,
That is indeed what you would expect to work. The problem is that it won't let you load the DQN config on a machine with no GPU if the checkpoint was trained on a GPU. It simply won't get past that line and throws an error about no GPU being available. (I've also tried a workaround of editing the pkl files that hold the configuration, but to no avail.)
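For what it's worth, the pkl edit I attempted looked roughly like the sketch below; the file name and the keys inside the state dict are assumptions based on how older checkpoints were laid out, so it may not match the 2.44 format:

import os
import pickle

checkpoint_path = "/path/to/checkpoints/checkpoint_xxx"  # same local checkpoint as above

# Hypothetical sketch: load the pickled algorithm state, force a CPU-only
# setting, and write it back. Key names are assumptions, not a documented API.
state_file = os.path.join(checkpoint_path, "algorithm_state.pkl")
with open(state_file, "rb") as f:
    state = pickle.load(f)

state["config"]["num_gpus"] = 0  # assumed location of the GPU setting

with open(state_file, "wb") as f:
    pickle.dump(state, f)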
Thanks,
Ilai