~/.conda/envs/py373_cuda/lib/python3.7/site-packages/ray/rllib/policy/torch_policy.py in __init__(self, observation_space, action_space, config, model, loss, action_distribution_class, action_sampler_fn, action_distribution_fn, max_seq_len, get_batch_divisibility_req)
153 for i, id_ in enumerate(gpu_ids) if i < config["num_gpus"]
154 ]
--> 155 self.device = self.devices[0]
156 ids = [
157 id_ for i, id_ in enumerate(gpu_ids) if i < config["num_gpus"]
IndexError: list index out of range
I am running Ray v1.4. The relevant snippet of code (when compared against the GitHub fix):
if config["_fake_gpus"] or config["num_gpus"] == 0 or \
not torch.cuda.is_available():
I am not setting num_gpus or _fake_gpus in the config, and not torch.cuda.is_available() evaluates to False as well. So control passes to the else part of that block, which is where it fails.
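To illustrate, here is a minimal sketch (not RLlib code; the config values and the ray.init() call are assumptions) of how the else branch can end up with an empty device list:

import ray
import torch

ray.init()  # without num_gpus=..., Ray may not reserve any GPUs for this process

# The same condition torch_policy.py evaluates (Ray 1.4):
fake_gpus = False  # config["_fake_gpus"] default
num_gpus = 1       # assuming the resolved trainer config ends up asking for a GPU
print(fake_gpus or num_gpus == 0 or not torch.cuda.is_available())  # False -> else branch

# The else branch builds self.devices from ray.get_gpu_ids(); if Ray has not
# assigned a GPU to this process, that list is empty and self.devices[0] fails.
print(ray.get_gpu_ids())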
torch.cuda.is_available() returns “True” for CUDA, and this is at the end of the traceback:
File "/home/bukovskiy/.local/lib/python3.8/site-packages/ray/rllib/policy/torch_policy.py", line 159, in __init__
self.device = self.devices[0]
IndexError: list index out of range
referring to these lines in torch_policy.py:
self.devices = [
torch.device("cuda:{}".format(i))
for i, id_ in enumerate(gpu_ids) if i < config["num_gpus"]
]
self.device = self.devices[0]
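Until a release with the GitHub fix is out, one way to avoid the empty self.devices list is to force the non-GPU branch of the check quoted above, i.e. make one of its conditions true in the trainer config (a workaround sketch, not the upstream fix; the framework key is just an assumption):

config = {
    "framework": "torch",  # rest of the trainer config omitted
    "num_gpus": 0,         # run the policy on CPU, or
    # "_fake_gpus": True,  # simulate GPUs without real CUDA devices
}

With either of those set, the quoted condition evaluates to True, so the branch that builds self.devices from ray.get_gpu_ids() and indexes self.devices[0] is skipped.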
Hey everyone (@ironv , @Guy_Tennenholtz , @Vladimir_Uspenskii , @michaelzhiluo ), thanks for this discussion and surfacing these issues.
We have made lots of improvements on the multi-GPU front recently, and a lot of these bugs should be fixed by now in the current master.
We also deployed nightly 2-GPU learning tests for all major algos, for both tf and torch. We’ll add LSTM=True 2-GPU tests for all RNN-supporting algos in the next 1-2 weeks as well.
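For reference, a minimal sketch of the kind of setup those 2-GPU torch tests exercise (illustrative only, not the actual test code; the algo and env choices are assumptions):

from ray.rllib.agents.ppo import PPOTrainer

config = {
    "env": "CartPole-v0",
    "framework": "torch",
    "num_gpus": 2,                # multi-GPU learner
    "model": {"use_lstm": True},  # the RNN-supporting case mentioned above
}
trainer = PPOTrainer(config=config)
print(trainer.train())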