Setting config["dueling"]=False still runs Dueling DQN

When I set config["dueling"]=False, I expect to run plain DQN, where the network outputs one Q-value per action. Instead, the network still has a single-node value branch and an advantage head with one output per action, so it looks as if the Dueling DQN architecture is still being built. Here are the layer names and shapes I got by calling agent.get_weights():

'_convs.0._model.1.weight' = {ndarray: (16, 4, 8, 8)} 
'_convs.1._model.1.weight' = {ndarray: (32, 16, 4, 4)} 
'_convs.1._model.1.bias' = {ndarray: (32,)} 
'_convs.2._model.0.weight' = {ndarray: (256, 32, 11, 11)} 
'_convs.2._model.0.bias' = {ndarray: (256,)} 
'_value_branch._model.0.weight' = {ndarray: (1, 256)} 
'_value_branch._model.0.bias' = {ndarray: (1,)} 
'advantage_module.dueling_A_0._model.0.weight' = {ndarray: (256, 256)} 
'advantage_module.dueling_A_0._model.0.bias' = {ndarray: (256,)} 
'advantage_module.A._model.0.weight' = {ndarray: (4, 256)} 
'advantage_module.A._model.0.bias' = {ndarray: (4,)} 
The trainer is created like this:

    from ray.rllib.agents.dqn import DQNTrainer, DEFAULT_CONFIG

    config = DEFAULT_CONFIG.copy()
    config["num_cpus_per_worker"] = 8
    config["double_q"] = False        # plain (non-double) DQN
    config["framework"] = "torch"
    config["dueling"] = False         # expecting a plain Q-head, no dueling branches
    config["prioritized_replay"] = False
    agent = DQNTrainer(config=config, env="BreakoutNoFrameskip-v0")
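For completeness, a minimal sketch of how the weight listing above can be reproduced (it assumes the single-agent default policy id "default_policy" used by RLlib trainers):

    # Print every parameter name and its shape for the default policy.
    for name, arr in agent.get_weights()["default_policy"].items():
        print(name, arr.shape)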

Ray version, torch version, OS: Ray 1.5.1, torch 1.7.0, CentOS 7.

For reference (ray/dqn_torch_model.py at d553d4da6cdacb1516a33d5744904d993e524c43 · ray-project/ray · GitHub): these weights are initialized regardless of the dueling setting. However, they are not used when dueling=False, so seeing them in get_weights() does not mean the dueling combination is actually applied.
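One way to convince yourself of this is to run a forward pass through the policy's model and look at the shape of the Q-head output. A hedged sketch follows: the get_q_value_distributions method and its return structure are assumed from RLlib's DQNTorchModel, and the (84, 84, 4) observation shape assumes the standard Atari preprocessing.

    import torch

    model = agent.get_policy().model                # DQNTorchModel instance
    dummy_obs = torch.zeros((1, 84, 84, 4))         # assumed post-preprocessing Atari obs shape
    feats, _ = model({"obs": dummy_obs}, [], None)  # base conv stack -> 256-dim features
    out = model.get_q_value_distributions(feats)    # first element should be the per-action scores
    print(out[0].shape)  # expected torch.Size([1, 4]): one Q-value per action, no dueling combine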


Thanks for the quick answer!