How severely does this issue affect your experience of using Ray?
High: it blocks me from completing my task.
I am trying to do something similar to the two-trainer multi-agent example. However, I need to customize the DQN policy and algorithm, for instance setting hiddens = [], dueling = False, double_q = False in the DQN config. All of these change the architecture of the DQN model. Obviously I can't set them in the PPO trainer's config, and as a result the DQN policy that gets built inside the PPO trainer has a different architecture. And of course, then there's no way to copy weights from one trainer to the other… Is there any way around this?
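Roughly, the setup looks like this (a minimal sketch based on rllib/examples/multi_agent_two_trainers.py and the pre-2.0 ray.rllib.agents API, not my exact code; the env name and agent count are just placeholders):

```python
import gym
import ray
from ray.rllib.agents.dqn import DQNTrainer
from ray.rllib.agents.dqn.dqn_tf_policy import DQNTFPolicy
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.agents.ppo.ppo_tf_policy import PPOTFPolicy
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole
from ray.tune.registry import register_env

ray.init()
register_env("multi_cartpole", lambda _: MultiAgentCartPole({"num_agents": 4}))

single_env = gym.make("CartPole-v0")
obs_space = single_env.observation_space
act_space = single_env.action_space

# Both trainers share the same policies dict, as in the example.
policies = {
    "ppo_policy": (PPOTFPolicy, obs_space, act_space, {}),
    "dqn_policy": (DQNTFPolicy, obs_space, act_space, {}),
}

def select_policy(agent_id, *args, **kwargs):
    return "ppo_policy" if agent_id % 2 == 0 else "dqn_policy"

dqn_trainer = DQNTrainer(
    env="multi_cartpole",
    config={
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": select_policy,
            "policies_to_train": ["dqn_policy"],
        },
        # The architecture-changing settings only live here ...
        "hiddens": [],
        "dueling": False,
        "double_q": False,
    },
)

ppo_trainer = PPOTrainer(
    env="multi_cartpole",
    config={
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": select_policy,
            "policies_to_train": ["ppo_policy"],
        },
        # ... and cannot be set here, so the "dqn_policy" built inside the PPO
        # trainer keeps the default dueling/double-Q architecture.
    },
)
```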
The short-term workaround is to write a function that modifies the config dict with the parameters you want to change for the DQN policy, inside the select_policy function.
I also had to set "simple_optimizer": False in my configs; otherwise it failed because, somewhere in deciding which optimizer to use, RLlib calls issubclass on what is now a function.
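In code, the suggestion is roughly the following (a sketch only; dqn_policy_overrides is an illustrative helper name, and the overrides are injected as the per-policy config dict, i.e. the fourth element of the policy spec):

```python
# Push the DQN-specific settings into the per-policy config override so that
# every trainer building "dqn_policy" sees them.
def dqn_policy_overrides(extra=None):
    overrides = {
        "hiddens": [],
        "dueling": False,
        "double_q": False,
    }
    overrides.update(extra or {})
    return overrides

policies = {
    "ppo_policy": (PPOTFPolicy, obs_space, act_space, {}),
    "dqn_policy": (DQNTFPolicy, obs_space, act_space, dqn_policy_overrides()),
}

# Both trainer configs additionally need "simple_optimizer": False to avoid the
# issubclass() error mentioned above.
```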
No, I tried that first, of course. It seems that, due to the way DQN constructs its policies, some of the configuration is taken from the algorithm config rather than the per-policy config. Which partly makes sense: dueling or double DQN changes both the algorithm and what the policy needs to look like.
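Concretely, it is the weight-sync step from the two-trainer example that breaks once the two copies of the DQN policy end up with different architectures (sketch; num_iters is just a placeholder):

```python
num_iters = 10  # placeholder for the example's stop condition

for _ in range(num_iters):
    dqn_trainer.train()
    ppo_trainer.train()
    # Swap weights so each trainer's copy of the other policy stays in sync.
    # This is where copying weights between the trainers breaks when the two
    # "dqn_policy" architectures differ.
    dqn_trainer.set_weights(ppo_trainer.get_weights(["ppo_policy"]))
    ppo_trainer.set_weights(dqn_trainer.get_weights(["dqn_policy"]))
```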