Customize DQN policy in two-trainer multiagent example

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am trying to do something similar to the two-trainer multiagent example. However, I need to customize the DQN policy and algorithm, for instance setting hiddens = [], dueling = False, double_q = False in the DQN config. All of these change the architecture of the DQN model. However, I obviously can't set them in the PPO trainer, and as a result the DQN policy that's generated inside the PPO trainer has a different architecture. And of course, then there's no way to copy weights from one trainer to the other… Is there any way around this?
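For reference, the DQN-specific keys in question look roughly like this (just the relevant keys, shown as a plain dict with the values I want rather than the defaults):

dqn_overrides = {
    "hiddens": [],      # no extra post-model layers in front of the Q-head
    "dueling": False,   # plain Q-network instead of the dueling architecture
    "double_q": False,  # vanilla DQN targets instead of double Q-learning
}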

The short-term workaround is to write a function that modifies the config dict with the parameters you want to change for the DQN policy, and use it inside of the select_policy function.

So, for example, it might look something like this:

from ray.rllib.agents.dqn import DQNTorchPolicy  # ray.rllib.algorithms.dqn in Ray >= 2.0


def my_custom_dqn(observation_space, action_space, config):
    # Override the DQN-specific parts of the config before building the policy.
    config["hiddens"] = ...  # override the hiddens
    return DQNTorchPolicy(observation_space, action_space, config)

You could then take that function and replace line 74 of the example with it.
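Concretely, the end result is that the function sits where the DQN policy class would otherwise be in the multiagent policies dict, roughly like this (a simplified sketch rather than the exact example code; obs_space, act_space and the policy IDs are the ones from the example):

from ray.rllib.agents.ppo import PPOTorchPolicy  # ray.rllib.algorithms.ppo in Ray >= 2.0

policies = {
    "ppo_policy": (PPOTorchPolicy, obs_space, act_space, {}),
    # A plain function instead of a policy class; it gets called just like the
    # class constructor would be, i.e. with (observation_space, action_space, config).
    "dqn_policy": (my_custom_dqn, obs_space, act_space, {}),
}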


That works! Thank you.

I also had to set "simple_optimizer": False in my configs; otherwise it failed because, somewhere in deciding which optimizer to use, RLlib calls issubclass on what is now a function rather than a class.
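That is, just adding the key explicitly to the trainer config, something like this:

config = {
    # ... the rest of the PPO / DQN trainer config from the example ...
    # Setting this explicitly skips the auto-detection step that would
    # issubclass-check each policy class (and choke on a plain function).
    "simple_optimizer": False,
}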

@mgerstgrasser

Prior to 2.0, this line here could be used to specify customizations to the config for each policy. Does it work to specify the hiddens there?
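That is the fourth element of each policy tuple in the pre-2.0 policies dict, i.e. something like this (sketch, using the spaces and policy ID from the example):

policies = {
    # 4th tuple element: per-policy config overrides, merged into the
    # trainer config for this policy only.
    "dqn_policy": (
        DQNTorchPolicy,
        obs_space,
        act_space,
        {"hiddens": [], "dueling": False, "double_q": False},
    ),
}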

No, I tried that first, of course. It seems that, due to the way DQN constructs its policies, some of the configuration is taken from the algorithm config, not the policy config. That partly makes sense: if you use dueling or double DQN, that changes both the algorithm and what the policy needs to look like.