How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hi everyone,
I am stuck debugging an error that comes up while training a PPO agent on a custom environment, and I would appreciate any help. The PPOTrainer crashes during setup with the following traceback:
(PPOTrainer pid=5252) 2022-08-05 12:03:54,860 ERROR worker.py:451 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPOTrainer.__init__() (pid=5252, ip=129.132.4.157, repr=PPOTrainer)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/agents/trainer.py", line 1074, in _init
(PPOTrainer pid=5252) raise NotImplementedError
(PPOTrainer pid=5252) NotImplementedError
(PPOTrainer pid=5252)
(PPOTrainer pid=5252) During handling of the above exception, another exception occurred:
(PPOTrainer pid=5252)
(PPOTrainer pid=5252) ray::PPOTrainer.__init__() (pid=5252, ip=129.132.4.157, repr=PPOTrainer)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/agents/trainer.py", line 870, in __init__
(PPOTrainer pid=5252) super().__init__(
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/tune/trainable.py", line 156, in __init__
(PPOTrainer pid=5252) self.setup(copy.deepcopy(self.config))
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/agents/trainer.py", line 950, in setup
(PPOTrainer pid=5252) self.workers = WorkerSet(
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 170, in __init__
(PPOTrainer pid=5252) self._local_worker = self._make_worker(
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 630, in _make_worker
(PPOTrainer pid=5252) worker = cls(
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 630, in __init__
(PPOTrainer pid=5252) self._build_policy_map(
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1788, in _build_policy_map
(PPOTrainer pid=5252) self.policy_map.create_policy(
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/policy/policy_map.py", line 152, in create_policy
(PPOTrainer pid=5252) self[policy_id] = class_(observation_space, action_space, merged_config)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/agents/ppo/ppo_torch_policy.py", line 59, in __init__
(PPOTrainer pid=5252) self._initialize_loss_from_dummy_batch()
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/policy/policy.py", line 904, in _initialize_loss_from_dummy_batch
(PPOTrainer pid=5252) actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/policy/torch_policy.py", line 335, in compute_actions_from_input_dict
(PPOTrainer pid=5252) return self._compute_action_helper(
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/utils/threading.py", line 21, in wrapper
(PPOTrainer pid=5252) return func(self, *a, **k)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/policy/torch_policy.py", line 997, in _compute_action_helper
(PPOTrainer pid=5252) dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/models/modelv2.py", line 259, in __call__
(PPOTrainer pid=5252) res = self.forward(restored, state or [], seq_lens)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/models/torch/complex_input_net.py", line 201, in forward
(PPOTrainer pid=5252) nn_out, _ = self.flatten[i](
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/models/modelv2.py", line 259, in __call__
(PPOTrainer pid=5252) res = self.forward(restored, state or [], seq_lens)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/models/torch/fcnet.py", line 146, in forward
(PPOTrainer pid=5252) self._features = self._hidden_layers(self._last_flat_in)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
(PPOTrainer pid=5252) return forward_call(*input, **kwargs)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
(PPOTrainer pid=5252) input = module(input)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
(PPOTrainer pid=5252) return forward_call(*input, **kwargs)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/ray/rllib/models/torch/misc.py", line 164, in forward
(PPOTrainer pid=5252) return self._model(x)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
(PPOTrainer pid=5252) return forward_call(*input, **kwargs)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
(PPOTrainer pid=5252) input = module(input)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
(PPOTrainer pid=5252) return forward_call(*input, **kwargs)
(PPOTrainer pid=5252) File "/home/sem22h2/.conda/envs/RL/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
(PPOTrainer pid=5252) return F.linear(input, self.weight, self.bias)
(PPOTrainer pid=5252) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
2022-08-05 12:03:54,875 ERROR trial_runner.py:886 -- Trial PPO_NasBench201_e466b_00000: Error processing event.
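For what it is worth, the last frame is a plain torch.nn.Linear call, and this error is easy to reproduce outside of RLlib whenever weights and inputs live on different devices. The snippet below is only meant to illustrate what torch is complaining about (the sizes 24 and 256 are placeholders I picked); it is not the actual RLlib code path:

import torch
import torch.nn as nn

layer = nn.Linear(24, 256).to("cuda:0")  # weights on the GPU
x = torch.zeros(1, 24)                   # input tensor still on the CPU
layer(x)  # -> RuntimeError: Expected all tensors to be on the same device ...

So somewhere RLlib seems to build the model on cuda:0 but feed it a CPU tensor (or vice versa), and I cannot tell whether that is caused by my environment or by my config.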
I start the training with the following code, where NasBench201 is my custom environment.
import ray
from ray import tune
from ray.tune.integration.wandb import WandbLoggerCallback

ray.init(ignore_reinit_error=True)
ray.tune.run(
    "PPO",
    stop={"training_iteration": 100},
    config={
        "env": NasBench201,
        "framework": "torch",
        "num_cpus_per_worker": 1,
        "log_level": "INFO",
        "horizon": 1000,
        "num_gpus": 1,
        "num_workers": 1,
        "render_env": False,
    },
    # local_dir="logs",
    callbacks=[WandbLoggerCallback(api_key="xxxxx", project="RayNasBenchV0")],
)
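As a first sanity check (my own idea, not something from the docs or the issue below), I plan to rerun the same setup with the GPU disabled to confirm the error only shows up with "num_gpus": 1. A sketch, reusing the imports above:

# Same run as above, but forced onto the CPU; if this trains without errors,
# the problem is specific to moving the model/inputs to cuda:0.
ray.tune.run(
    "PPO",
    stop={"training_iteration": 1},
    config={
        "env": NasBench201,
        "framework": "torch",
        "num_gpus": 0,   # CPU only
        "num_workers": 1,
    },
)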
I also stumbled upon this issue: [Bug] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm) · Issue #21921 · ray-project/ray · GitHub. However, that issue should already have been fixed, and I am not using a multi-dimensional action/observation space anyway.
Custom Env
# Relevant spaces defined in NasBench201.__init__ (spaces is gym.spaces):
no_ops = 4
num_triu = 6
self.observation_space = spaces.MultiBinary(no_ops * num_triu)
self.action_space = spaces.Discrete(no_ops * num_triu)
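Because the traceback goes through RLlib's complex_input_net.py, one thing I am considering is wrapping the env so that the observation is a plain float Box instead of MultiBinary. This is only a sketch of my own idea (not verified, and BinaryObsAsBox is just a name I made up):

import gym
import numpy as np
from gym import spaces


class BinaryObsAsBox(gym.ObservationWrapper):
    """Expose a MultiBinary observation as a float32 Box of the same size."""

    def __init__(self, env):
        super().__init__(env)
        n = env.observation_space.n  # size of the MultiBinary vector
        self.observation_space = spaces.Box(0.0, 1.0, shape=(n,), dtype=np.float32)

    def observation(self, obs):
        # Cast the 0/1 vector so it matches the Box space above.
        return np.asarray(obs, dtype=np.float32)

I am not sure whether this would actually make RLlib pick its plain fully connected model instead of ComplexInputNetwork, though.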
I am using Ray 1.13.0. Apart from the sanity checks sketched above, all I have tried so far is commenting out a few lines in my custom environment that might be causing this. Any input would be highly appreciated.
Best regards,
Dan