MADDPG against pre-trained DQN agents

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am working in a multi-agent setting with two teams. I would like to use pre-trained DQN agents for one team and then train the second team with various algorithms.

Basically what I do is:

  • Create a DQN PolicySpec in the multi-agent policy dict for the team 1 agents.
  • Set config["multiagent"]["policies_to_train"] to only the team 2 agents / policies.
  • Use an on_algorithm_init callback to load weights from a checkpoint (for team 1 only) and call algorithm.set_weights() on them (see the sketch after this list).
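Roughly, the setup looks like the sketch below (simplified, Ray ~2.0 dict config). The env name, policy IDs, policy mapping, and the load_dqn_weights_from_checkpoint helper are placeholders for my actual setup, not real RLlib APIs.

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.algorithms.dqn import DQNTFPolicy
from ray.rllib.policy.policy import PolicySpec


class LoadTeam1Weights(DefaultCallbacks):
    def on_algorithm_init(self, *, algorithm, **kwargs):
        # Placeholder helper: read the frozen team-1 weights from the earlier DQN run.
        weights = load_dqn_weights_from_checkpoint("/path/to/dqn_checkpoint")
        # Overwrite only the team-1 policies; team 2 keeps its fresh initialization.
        algorithm.set_weights({pid: weights[pid] for pid in ["team1_0", "team1_1"]})


config = {
    "env": "battle_v4",
    "callbacks": LoadTeam1Weights,
    "multiagent": {
        "policies": {
            # Pre-trained, frozen DQN policies for team 1.
            "team1_0": PolicySpec(policy_class=DQNTFPolicy),
            "team1_1": PolicySpec(policy_class=DQNTFPolicy),
            # Policies of the algorithm being trained (e.g. MADDPG) for team 2.
            "team2_0": PolicySpec(),
            "team2_1": PolicySpec(),
        },
        "policy_mapping_fn": lambda agent_id, episode, worker, **kw: agent_id,
        # Only team 2 is updated; team 1 stays fixed.
        "policies_to_train": ["team2_0", "team2_1"],
    },
}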

With most algorithms this works, but MADDPG throws errors: slightly different ones depending on whether I set config["simple_optimizer"] to True (error [1] below) or False (error [2] below). For [1] the problem seems to be that MADDPG tries to find some extra keys in the sample batch for all of the policies, which aren't there for the DQN policies. [2] is less clear, but I suspect it's a similar problem.
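One way to check this (just a debugging sketch, not part of my actual setup) is to print which columns each policy's train batch actually contains via the standard on_learn_on_batch hook, assuming it fires before the loss inputs are built:

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class DebugBatchKeys(DefaultCallbacks):
    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        # The KeyError below suggests the MADDPG policies expect extra obs_<i> /
        # actions_<i> columns for every agent, which never appear for the DQN agents.
        print(type(policy).__name__, sorted(train_batch.keys()))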

Is there an easy fix for this? I'd be perfectly happy for the shared critic to use only team 2's actions, not team 1's as well.

[1] Error when using simple optimizer:

2022-10-21 11:15:54,280 ERROR trial_runner.py:980 -- Trial MADDPG_battle_v4_9b400_00000: Error processing event.
ray.exceptions.RayTaskError(KeyError): ray::MADDPG.train() (pid=72505, ip=127.0.0.1, repr=MADDPG)
  File ".../lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 347, in train
    result = self.step()
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 661, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2378, in _run_one_training_iteration
    num_recreated += self.try_recover_from_step_attempt(
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2190, in try_recover_from_step_attempt
    raise error
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2373, in _run_one_training_iteration
    results = self.training_step()
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 398, in training_step
    train_results = train_one_step(self, train_batch)
  File ".../lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 82, in train_one_step
    info = local_worker.learn_on_batch(train_batch)
  File ".../lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 912, in learn_on_batch
    to_fetch[pid] = policy._build_learn_on_batch(builders[pid], batch)
  File ".../lib/python3.10/site-packages/ray/rllib/policy/tf_policy.py", line 1106, in _build_learn_on_batch
    self._get_loss_inputs_dict(postprocessed_batch, shuffle=False)
  File ".../lib/python3.10/site-packages/ray/rllib/policy/tf_policy.py", line 1160, in _get_loss_inputs_dict
    train_batch[key],
  File ".../lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 705, in __getitem__
    value = dict.__getitem__(self, key)
KeyError: 'obs_6'

[2] Error when not using simple optimizer:

2022-10-21 11:18:39,574 ERROR trial_runner.py:980 -- Trial MADDPG_battle_v4_710ea_00000: Error processing event.
ray.exceptions.RayTaskError(NotImplementedError): ray::MADDPG.train() (pid=72881, ip=127.0.0.1, repr=MADDPG)
  File ".../lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 347, in train
    result = self.step()
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 661, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2378, in _run_one_training_iteration
    num_recreated += self.try_recover_from_step_attempt(
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2190, in try_recover_from_step_attempt
    raise error
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2373, in _run_one_training_iteration
    results = self.training_step()
  File ".../lib/python3.10/site-packages/ray/rllib/algorithms/dqn/dqn.py", line 400, in training_step
    train_results = multi_gpu_train_one_step(self, train_batch)
  File ".../lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 152, in multi_gpu_train_one_step
    num_loaded_samples[policy_id] = local_worker.policy_map[
  File ".../lib/python3.10/site-packages/ray/rllib/policy/policy.py", line 611, in load_batch_into_buffer
    raise NotImplementedError
NotImplementedError

Hi @mgerstgrasser, can you share a repro script of your workload so we can help you better?