Sorry, I forgot to include the stack trace. Here it is:
2022-02-25 20:17:26,644 ERROR actor.py:745 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=353970, ip=172.17.0.2)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 580, in __init__
self._build_policy_map(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1375, in _build_policy_map
self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/policy_map.py", line 136, in create_policy
self[policy_id] = class_(observation_space, action_space,
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/policy_template.py", line 279, in __init__
self._initialize_loss_from_dummy_batch(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/policy.py", line 750, in _initialize_loss_from_dummy_batch
self.compute_actions_from_input_dict(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 299, in compute_actions_from_input_dict
return self._compute_action_helper(input_dict, state_batches,
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/utils/threading.py", line 21, in wrapper
return func(self, *a, **k)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 363, in _compute_action_helper
dist_inputs, state_out = self.model(input_dict, state_batches,
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/models/modelv2.py", line 230, in __call__
res = self.forward(restored, state or [], seq_lens)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/models/torch/recurrent_net.py", line 83, in forward
output, new_state = self.forward_rnn(inputs, state, seq_lens)
File "/home/docker/project_models/project_models/ray_alphastar/alphastar.py", line 217, in forward_rnn
out, (next_h, next_c) = self.lstm(
File "/home/docker/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/docker/miniconda3/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 677, in forward
self.check_forward_args(input, hx, batch_sizes)
File "/home/docker/miniconda3/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 621, in check_forward_args
self.check_hidden_size(hidden[0], self.get_expected_hidden_size(input, batch_sizes),
File "/home/docker/miniconda3/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 226, in check_hidden_size
raise RuntimeError(msg.format(expected_hidden_size, list(hx.size())))
RuntimeError: Expected hidden[0] size (1, 1, 512), got [1, 32, 512]
2022-02-25 20:17:26,648 WARNING worker.py:498 -- `ray.get_gpu_ids()` will always return the empty list when called from the driver. This is because Ray does not manage GPU allocations to the driver process.
2022-02-25 20:17:26,650 WARNING worker.py:498 -- `ray.get_gpu_ids()` will always return the empty list when called from the driver. This is because Ray does not manage GPU allocations to the driver process.
2022-02-25 20:17:26,652 WARNING rollout_worker.py:574 -- You are running ray with `local_mode=True`, but have configured 1 GPUs to be used! In local mode, Policies are placed on the CPU and the `num_gpus` setting is ignored.
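The RuntimeError above is raised by the LSTM call in forward_rnn at alphastar.py:217: the hidden state being passed has batch dimension 32 (one slot per sample of RLlib's dummy initialization batch), but the LSTM only expects a batch of 1. I can't see alphastar.py, so this is a guess, but the exact message reproduces if the torch.nn.LSTM is built with the default batch_first=False while being fed the batch-first [B, T, feature] tensors that RecurrentNetwork.forward() produces via add_time_dimension(); the LSTM then reads the 32 as the time axis and 1 as the batch. A minimal standalone sketch of that mismatch (sizes made up for illustration, not taken from my model):

```python
import torch
import torch.nn as nn

# Hypothetical repro, not the actual alphastar.py code.
lstm = nn.LSTM(input_size=256, hidden_size=512, num_layers=1)  # batch_first defaults to False

inputs = torch.zeros(32, 1, 256)  # what RLlib hands forward_rnn: [B=32, T=1, feat]
h0 = torch.zeros(1, 32, 512)      # initial state unsqueezed to [num_layers, B, hidden]
c0 = torch.zeros(1, 32, 512)

# With batch_first=False the LSTM reads inputs as [T=32, B=1, feat], so it expects
# hidden[0] of size (1, 1, 512) and raises:
#   RuntimeError: Expected hidden[0] size (1, 1, 512), got [1, 32, 512]
out, (h1, c1) = lstm(inputs, (h0, c0))
```

If that is what line 217 is doing, constructing the LSTM with batch_first=True (or transposing the inputs to time-major before the call) makes the expected hidden size (1, 32, 512), which matches what forward_rnn is passing.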
2022-02-25 20:17:41,042 ERROR actor.py:745 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::ConditionalPPOTorchTrainer.__init__() (pid=353970, ip=172.17.0.2)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/agents/trainer_template.py", line 136, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 592, in __init__
super().__init__(config, logger_creator)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/tune/trainable.py", line 103, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/agents/trainer_template.py", line 146, in setup
super().setup(config)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 739, in setup
self._init(self.config, self.env_creator)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/agents/trainer_template.py", line 170, in _init
self.workers = self._make_workers(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 821, in _make_workers
return WorkerSet(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 103, in __init__
self._local_worker = self._make_worker(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 399, in _make_worker
worker = cls(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 580, in __init__
self._build_policy_map(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1375, in _build_policy_map
self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/policy_map.py", line 136, in create_policy
self[policy_id] = class_(observation_space, action_space,
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/policy_template.py", line 279, in __init__
self._initialize_loss_from_dummy_batch(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/policy.py", line 750, in _initialize_loss_from_dummy_batch
self.compute_actions_from_input_dict(
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 299, in compute_actions_from_input_dict
return self._compute_action_helper(input_dict, state_batches,
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/utils/threading.py", line 21, in wrapper
return func(self, *a, **k)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/policy/torch_policy.py", line 363, in _compute_action_helper
dist_inputs, state_out = self.model(input_dict, state_batches,
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/models/modelv2.py", line 230, in __call__
res = self.forward(restored, state or [], seq_lens)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/models/torch/recurrent_net.py", line 83, in forward
output, new_state = self.forward_rnn(inputs, state, seq_lens)
File "/home/docker/project_models/project_models/ray_alphastar/alphastar.py", line 217, in forward_rnn
out, (next_h, next_c) = self.lstm(
File "/home/docker/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/docker/miniconda3/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 677, in forward
self.check_forward_args(input, hx, batch_sizes)
File "/home/docker/miniconda3/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 621, in check_forward_args
self.check_hidden_size(hidden[0], self.get_expected_hidden_size(input, batch_sizes),
File "/home/docker/miniconda3/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 226, in check_hidden_size
raise RuntimeError(msg.format(expected_hidden_size, list(hx.size())))
RuntimeError: Expected hidden[0] size (1, 1, 512), got [1, 32, 512]
2022-02-25 20:17:41,045 WARNING worker.py:498 -- `ray.get_gpu_ids()` will always return the empty list when called from the driver. This is because Ray does not manage GPU allocations to the driver process.
2022-02-25 20:17:41,587 ERROR syncer.py:72 -- Log sync requires rsync to be installed.
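The second traceback is the same failure surfacing again while the ConditionalPPOTorchTrainer builds its local rollout worker, so there is one underlying error, not two. For comparison, a recurrent TorchModelV2 whose shapes line up with what RLlib passes into forward_rnn looks roughly like the sketch below; the class name, layer sizes, and the flat-observation assumption are illustrative and not taken from alphastar.py.

```python
import torch
import torch.nn as nn
from ray.rllib.models.torch.recurrent_net import RecurrentNetwork


class MinimalLSTMModel(RecurrentNetwork, nn.Module):
    """Illustrative sketch only (assumes a flat Box observation space)."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name,
                 hidden_size=512):
        nn.Module.__init__(self)
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        self.hidden_size = hidden_size
        self.fc = nn.Linear(int(obs_space.shape[0]), hidden_size)
        # batch_first=True so the LSTM consumes the [B, T, feat] tensors that
        # RecurrentNetwork.forward() builds with add_time_dimension().
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.logits = nn.Linear(hidden_size, num_outputs)
        self.value_branch = nn.Linear(hidden_size, 1)
        self._features = None

    def get_initial_state(self):
        # One [hidden_size] tensor per state slot; RLlib tiles these to [B, hidden_size].
        return [torch.zeros(self.hidden_size), torch.zeros(self.hidden_size)]

    def forward_rnn(self, inputs, state, seq_lens):
        # inputs: [B, T, feat]; state: two tensors of shape [B, hidden_size].
        x = torch.relu(self.fc(inputs))
        h_in = torch.unsqueeze(state[0], 0)  # -> [num_layers=1, B, hidden_size]
        c_in = torch.unsqueeze(state[1], 0)
        self._features, (h_out, c_out) = self.lstm(x, (h_in, c_in))
        return self.logits(self._features), [h_out.squeeze(0), c_out.squeeze(0)]

    def value_function(self):
        return torch.reshape(self.value_branch(self._features), [-1])
```

The two points that matter for the shape error are batch_first=True on the LSTM and unsqueezing/squeezing the state tensors around the call so the batch dimensions agree.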
== Status ==
Memory usage on this node: 11.5/15.4 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/2 CPUs, 1.0/1 GPUs, 0.0/7.63 GiB heap, 0.0/3.81 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspaces/projects/project/ray_results/QPR2/experiments
Number of trials: 1/1 (1 RUNNING)
+-----------------------------------------------------------+----------+-------+--------------------+--------------------+
| Trial name | status | loc | Custom Metrics 1 | Custom Metrics 2 |
|-----------------------------------------------------------+----------+-------+--------------------+--------------------|
| ConditionalPPOTorchTrainer_HNS_RLLib_Agent_v1_eafce_00000 | RUNNING | | | |
+-----------------------------------------------------------+----------+-------+--------------------+--------------------+
2022-02-25 20:17:41,616 ERROR trial_runner.py:773 -- Trial ConditionalPPOTorchTrainer_HNS_RLLib_Agent_v1_eafce_00000: Error processing event.
Traceback (most recent call last):
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 739, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/tune/ray_trial_executor.py", line 746, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
return func(*args, **kwargs)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/worker.py", line 1621, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::ConditionalPPOTorchTrainer.train()::Exiting (pid=353970, ip=172.17.0.2, repr=ConditionalPPOTorchTrainer)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 651, in train
raise e
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 637, in train
result = Trainable.train(self)
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/tune/trainable.py", line 237, in train
result = self.step()
File "/home/docker/miniconda3/lib/python3.9/site-packages/ray/rllib/agents/trainer_template.py", line 193, in step
res = next(self.train_exec_impl)
AttributeError: 'ConditionalPPOTorchTrainer' object has no attribute 'train_exec_impl'
2022-02-25 20:17:41,627 WARNING worker.py:498 -- `ray.get_gpu_ids()` will always return the empty list when called from the driver. This is because Ray does not manage GPU allocations to the driver process.
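As far as I can tell, the trailing AttributeError ('ConditionalPPOTorchTrainer' object has no attribute 'train_exec_impl') is only a downstream symptom of the same crash: in this Ray version the trainer assigns train_exec_impl near the end of _init(), after _make_workers() has succeeded, so when worker creation dies on the LSTM shape error the attribute is never set and the first train() call fails with this message. Fixing the hidden-state mismatch should make it go away.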
== Status ==
Memory usage on this node: 11.5/15.4 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/1 GPUs, 0.0/7.63 GiB heap, 0.0/3.81 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspaces/projects/project/ray_results/QPR2/experiments
Number of trials: 1/1 (1 ERROR)
+-----------------------------------------------------------+----------+-------+--------------------+--------------------+
| Trial name | status | loc | Custom Metrics 1 | Custom Metrics 2 |
|-----------------------------------------------------------+----------+-------+--------------------+--------------------|
| ConditionalPPOTorchTrainer_HNS_RLLib_Agent_v1_eafce_00000 | ERROR | | | |
+-----------------------------------------------------------+----------+-------+--------------------+--------------------+
Number of errored trials: 1
+-----------------------------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Trial name | # failures | error file |
|-----------------------------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------|
| ConditionalPPOTorchTrainer_HNS_RLLib_Agent_v1_eafce_00000 | 1 | /workspaces/projects/project/ray_results/QPR2/experiments/ConditionalPPOTorchTrainer_HNS_RLLib_Agent_v1_eafce_00000_0_2022-02-25_20-17-13/error.txt |
+-----------------------------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+