Customize DQN policy in two-trainer multiagent example

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am trying to do something similar to the two-trainer multiagent example. However, I need to customize the DQN policy and algorithm, for instance setting hiddens = [], dueling = False, double_q = False in the DQN config. All of these change the architecture of the DQN model. However, I obviously can't set them in the PPO trainer, and as a result the DQN policy that's generated inside the PPO trainer has a different architecture. And of course, then there's no way to copy weights from one trainer to the other… Is there any way around this?
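For reference, the DQN-specific keys in question look roughly like this (just the relevant keys, shown as a plain dict with the values I want rather than the defaults):

dqn_overrides = {
    "hiddens": [],      # no extra post-model layers in front of the Q-head
    "dueling": False,   # plain Q-network instead of the dueling architecture
    "double_q": False,  # vanilla DQN targets instead of double Q-learning
}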

The short-term workaround is to write a function that modifies the config dict with the parameters you want to change for the DQN policy, and use it inside of the select_policy function.

So, for example, it might look something like this:

from ray.rllib.agents.dqn import DQNTorchPolicy  # ray.rllib.algorithms.dqn in Ray >= 2.0


def my_custom_dqn(observation_space, action_space, config):
    # Override the DQN-specific parts of the config before building the policy.
    config["hiddens"] = ...  # override the hiddens
    return DQNTorchPolicy(observation_space, action_space, config)

You could then take that function and replace line 74 of the example with it.
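Concretely, the end result is that the function sits where the DQN policy class would otherwise be in the multiagent policies dict, roughly like this (a simplified sketch rather than the exact example code; obs_space, act_space and the policy IDs are the ones from the example):

from ray.rllib.agents.ppo import PPOTorchPolicy  # ray.rllib.algorithms.ppo in Ray >= 2.0

policies = {
    "ppo_policy": (PPOTorchPolicy, obs_space, act_space, {}),
    # A plain function instead of a policy class; it gets called just like the
    # class constructor would be, i.e. with (observation_space, action_space, config).
    "dqn_policy": (my_custom_dqn, obs_space, act_space, {}),
}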


That works! Thank you.

I also had to set "simple_optimizer": False in my configs; otherwise it failed because, somewhere in deciding which optimizer to use, RLlib calls issubclass on what is now a function rather than a class.
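That is, just adding the key explicitly to the trainer config, something like this:

config = {
    # ... the rest of the PPO / DQN trainer config from the example ...
    # Setting this explicitly skips the auto-detection step that would
    # issubclass-check each policy class (and choke on a plain function).
    "simple_optimizer": False,
}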

@mgerstgrasser

Prior to 2.0, this line here could be used to specify customizations to the config for each policy. Does it work to specify the hiddens there?
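That is the fourth element of each policy tuple in the pre-2.0 policies dict, i.e. something like this (sketch, using the spaces and policy ID from the example):

policies = {
    # 4th tuple element: per-policy config overrides, merged into the
    # trainer config for this policy only.
    "dqn_policy": (
        DQNTorchPolicy,
        obs_space,
        act_space,
        {"hiddens": [], "dueling": False, "double_q": False},
    ),
}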

No, I tried that first, of course. It seems that, due to the way DQN constructs its policies, some of the configuration is taken from the algorithm config, not the policy config. That partly makes sense: if you use dueling or double DQN, that changes both the algorithm and what the policy needs to look like.