When I set config["dueling"]=False, I expect to run the vanilla DQN, in which the network returns one Q-value per action. Instead, the network returns a single value node plus an advantage value for each action, so it appears to still be running the Dueling DQN algorithm. Here are the layer names and sizes I got by calling agent.get_weights():
'_convs.0._model.1.weight' = {ndarray: (16, 4, 8, 8)}
'_convs.1._model.1.weight' = {ndarray: (32, 16, 4, 4)}
'_convs.1._model.1.bias' = {ndarray: (32,)}
'_convs.2._model.0.weight' = {ndarray: (256, 32, 11, 11)}
'_convs.2._model.0.bias' = {ndarray: (256,)}
'_value_branch._model.0.weight' = {ndarray: (1, 256)}
'_value_branch._model.0.bias' = {ndarray: (1,)}
'advantage_module.dueling_A_0._model.0.weight' = {ndarray: (256, 256)}
'advantage_module.dueling_A_0._model.0.bias' = {ndarray: (256,)}
'advantage_module.A._model.0.weight' = {ndarray: (4, 256)}
'advantage_module.A._model.0.bias' = {ndarray: (4,)}
from ray.rllib.agents.dqn import DQNTrainer, DEFAULT_CONFIG

config = DEFAULT_CONFIG.copy()
config['num_cpus_per_worker'] = 8
config["double_q"] = False            # disable double Q-learning
config["framework"] = "torch"
config["dueling"] = False             # disable the dueling (value + advantage) head
config["prioritized_replay"] = False  # use a uniform replay buffer
agent = DQNTrainer(config=config, env='BreakoutNoFrameskip-v0')
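For reference, a minimal sketch (assuming the agent built above and RLlib's default single-policy setup, where the policy id is "default_policy") of how the layer names and shapes above can be listed, and how the built torch model can be inspected directly:

# Sketch: list layer names/shapes from the local policy's weights.
weights = agent.get_weights()["default_policy"]
for name, arr in weights.items():
    print(name, arr.shape)

# The torch model itself can also be printed to see whether an
# advantage/dueling head is attached to the network.
print(agent.get_policy().model)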
Versions and OS: Ray 1.5.1, PyTorch 1.7.0, CentOS 7.