How severely does this issue affect your experience of using Ray?
High: It blocks me from completing my task.
I am trying to do something similar to the two-trainer multi-agent example. However, I need to customize the DQN policy and algorithm, for instance setting hiddens, dueling = False, and double_q = False in the DQN config. All of these change the architecture of the DQN model. Obviously, I can't set them in the PPO trainer's config, so the DQN policy that gets generated inside the PPO trainer ends up with a different architecture, and then there is no way to copy weights from one trainer to the other. Is there any way around this?
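For context, the obvious first attempt is a per-policy config override via RLlib's PolicySpec, along the lines of the sketch below. This is not runnable without Ray installed; the policy IDs, the mapping function, and hiddens=[] (a plain Q-head with no extra post-processing layers) are assumptions for illustration, not values taken from the example.

```python
# Sketch (assumes Ray/RLlib installed): per-policy DQN overrides passed
# through PolicySpec. The question is whether DQN actually honors these
# when its policy is built inside the PPO trainer.
from ray.rllib.policy.policy import PolicySpec
from ray.rllib.agents.dqn import DQNTFPolicy
from ray.rllib.agents.ppo import PPOTFPolicy

policies = {
    "ppo_policy": PolicySpec(policy_class=PPOTFPolicy),
    "dqn_policy": PolicySpec(
        policy_class=DQNTFPolicy,
        # Per-policy overrides; hiddens=[] is an assumed placeholder value.
        config={"hiddens": [], "dueling": False, "double_q": False},
    ),
}

ppo_config = {
    "multiagent": {
        "policies": policies,
        # Hypothetical mapping: route agents by ID prefix.
        "policy_mapping_fn": lambda agent_id: (
            "ppo_policy" if agent_id.startswith("ppo") else "dqn_policy"
        ),
        # Each trainer optimizes only its own policy, as in the example.
        "policies_to_train": ["ppo_policy"],
    },
}
```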
No, that was the first thing I tried, of course. It seems that, due to the way DQN constructs its policies, some of the configuration is taken from the algorithm config rather than the policy config. Which partly makes sense: enabling dueling or double DQN changes both the algorithm and what the policy network needs to look like.
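The weight copy in the two-trainer example boils down to a get_weights/set_weights exchange between the two trainers, and that only succeeds when both sides build identical networks. A minimal sketch of that constraint, using plain dicts of parameter shapes as stand-ins for real RLlib weight arrays (all names here are hypothetical):

```python
# Sketch: trainer-to-trainer weight sync only works when the per-layer
# shapes line up. Real RLlib weights are numpy arrays keyed by variable
# name; tuples of shapes stand in for them here.

def sync_weights(src, dst):
    """Copy src policy 'weights' into dst, refusing on any mismatch."""
    for name, shape in src.items():
        if name not in dst or dst[name] != shape:
            raise ValueError(
                f"architecture mismatch at {name}: {dst.get(name)} vs {shape}"
            )
    dst.update(src)
    return dst

# DQN policy built from the DQN trainer's config (no dueling head).
dqn_weights = {"fc_1/kernel": (4, 256), "q_out/kernel": (256, 2)}

# The same policy as rebuilt inside the PPO trainer, which falls back to
# DQN's defaults (dueling head) and so produces a different layout.
ppo_side_weights = {
    "fc_1/kernel": (4, 256),
    "advantage/kernel": (256, 2),
    "value/kernel": (256, 1),
}

try:
    sync_weights(dqn_weights, ppo_side_weights)
except ValueError as err:
    print("sync failed:", err)
```

This is exactly the silent (or not so silent) failure mode described above: the copy is only well-defined once both trainers construct the DQN policy from the same effective config.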