How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am working in a multi-agent setting with two teams. I would like to use pre-trained DQN agents for one team and then train the second team using various other algorithms.
Basically, what I do is:
- Create a DQN `PolicySpec` in the multi-agent `policies` dict for the team 1 agents.
- Set `config["multiagent"]["policies_to_train"]` to only the team 2 agents / policies.
- Use an `on_algorithm_init` callback to load weights from a checkpoint (for team 1 only) and call `algorithm.set_weights()` on them.
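For concreteness, here is a minimal sketch of that setup (not my exact script: the team/agent ids, policy mapping, checkpoint path, and the `load_team1_weights` helper are placeholders):

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.algorithms.dqn.dqn_tf_policy import DQNTFPolicy
from ray.rllib.policy.policy import PolicySpec

TEAM_1 = ["agent_0", "agent_1"]  # pre-trained, frozen DQN policies
TEAM_2 = ["agent_2", "agent_3"]  # policies to train with the second algorithm


def load_team1_weights(checkpoint_path):
    # Placeholder: however you extract {policy_id: weights} for the team 1
    # policies from the earlier DQN checkpoint.
    ...


class LoadTeam1Weights(DefaultCallbacks):
    def on_algorithm_init(self, *, algorithm, **kwargs):
        # Overwrite the (randomly initialized) team 1 policies with the
        # pre-trained DQN weights; team 2 keeps its fresh initialization.
        algorithm.set_weights(load_team1_weights("/path/to/dqn_checkpoint"))


config = {
    "env": "battle_v4",  # my multi-agent env, registered elsewhere
    "multiagent": {
        "policies": {
            # Team 1: explicit DQN policy class.
            **{pid: PolicySpec(policy_class=DQNTFPolicy) for pid in TEAM_1},
            # Team 2: the algorithm's default policy (e.g. MADDPG).
            **{pid: PolicySpec() for pid in TEAM_2},
        },
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: agent_id,
        # Only team 2 gets optimized.
        "policies_to_train": TEAM_2,
    },
    "callbacks": LoadTeam1Weights,
}
```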
With most algorithms this works, but MADDPG throws errors, slightly different ones depending on whether I set `config["simple_optimizer"]` to `True` (error [1] below) or `False` (error [2] below). For [1], the problem seems to be that MADDPG tries to find some extra keys in the sample batch for all of the policies, which aren't there for the DQN policies. [2] is less clear, but I suspect it's a similar problem.
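Concretely, toggling this one flag on the config dict from the sketch above is the only difference between the two failing runs:

```python
# [1] Legacy "simple" train path -> KeyError: 'obs_6' (traceback below).
config["simple_optimizer"] = True
# [2] Multi-GPU train path -> NotImplementedError from
#     Policy.load_batch_into_buffer() (traceback below).
# config["simple_optimizer"] = False
```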
Is there an easy fix for this? I’d be perfectly happy for the shared critic to only use team 2 actions, not team 1 as well.
[1] Error when using simple optimizer:
2022-10-21 11:15:54,280 ERROR trial_runner.py:980 -- Trial MADDPG_battle_v4_9b400_00000: Error processing event.
ray.exceptions.RayTaskError(KeyError): ray::MADDPG.train() (pid=72505, ip=127.0.0.1, repr=MADDPG)
File ".../lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 347, in train
result = self.step()
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 661, in step
results, train_iter_ctx = self._run_one_training_iteration()
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2378, in _run_one_training_iteration
num_recreated += self.try_recover_from_step_attempt(
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2190, in try_recover_from_step_attempt
raise error
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2373, in _run_one_training_iteration
results = self.training_step()
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 398, in training_step
train_results = train_one_step(self, train_batch)
File ".../lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 82, in train_one_step
info = local_worker.learn_on_batch(train_batch)
File ".../lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 912, in learn_on_batch
to_fetch[pid] = policy._build_learn_on_batch(builders[pid], batch)
File ".../lib/python3.10/site-packages/ray/rllib/policy/tf_policy.py", line 1106, in _build_learn_on_batch
self._get_loss_inputs_dict(postprocessed_batch, shuffle=False)
File ".../lib/python3.10/site-packages/ray/rllib/policy/tf_policy.py", line 1160, in _get_loss_inputs_dict
train_batch[key],
File ".../lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 705, in __getitem__
value = dict.__getitem__(self, key)
KeyError: 'obs_6'
[2] Error when not using simple optimizer:
2022-10-21 11:18:39,574 ERROR trial_runner.py:980 -- Trial MADDPG_battle_v4_710ea_00000: Error processing event.
ray.exceptions.RayTaskError(NotImplementedError): ray::MADDPG.train() (pid=72881, ip=127.0.0.1, repr=MADDPG)
File ".../lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 347, in train
result = self.step()
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 661, in step
results, train_iter_ctx = self._run_one_training_iteration()
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2378, in _run_one_training_iteration
num_recreated += self.try_recover_from_step_attempt(
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2190, in try_recover_from_step_attempt
raise error
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2373, in _run_one_training_iteration
results = self.training_step()
File ".../lib/python3.10/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 400, in training_step
train_results = multi_gpu_train_one_step(self, train_batch)
File ".../lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 152, in multi_gpu_train_one_step
num_loaded_samples[policy_id] = local_worker.policy_map[
File ".../lib/python3.10/site-packages/ray/rllib/policy/policy.py", line 611, in load_batch_into_buffer
raise NotImplementedError
NotImplementedError