DQN Rollout Config to fit Nature DQN

Hi, how would I configure the rollout settings to match the DQN in the Nature paper? I am confused by the Rollout Config settings. Specifically, I want to replicate exactly what was done in the Nature DQN paper. I have num_rollout_workers set to 0, so rollouts are done in the local worker. I would like each DQN iteration to step the environment once and store that single timestep in the replay buffer. Then, with a train_batch_size of 32, it should draw 32 random samples from the replay buffer and train the policy network on them.

In the code below, I have set train_batch_size to 32, training_intensity to None, rollout_fragment_length to 1, and batch_mode to 'truncate_episodes'. Is this correct for what I am trying to achieve? That is, at timestep t, store one transition (state_t, action_t, reward_t, state_t+1) in the buffer, then randomly sample 32 transitions from the buffer to train on. At timestep t+1, store (state_t+1, action_t+1, reward_t+1, state_t+2), and so on.

    from ray.rllib.algorithms.dqn import DQNConfig

    param_space = DQNConfig()

    param_space = param_space.training(
        gamma=0.99, 
        lr=1e-4, 
        train_batch_size=32, 
        model={
            '_disable_preprocessor_api': True,
            'conv_filters': [[32,8,4],[64,4,2],[64,3,1]],
            'conv_activation': 'relu',
            'post_fcnet_hiddens': [512],
            'post_fcnet_activation': 'relu',
            'no_final_linear': False,
            'vf_share_layers': False,
        },
        optimizer={'adam_epsilon': 1e-8},
        grad_clip=None,
        num_atoms=1,
        noisy=False,
        dueling=False,
        double_q=False,
        n_step=1,
        replay_buffer_config={'type': 'ReplayBuffer', 'capacity': 100000},
        td_error_loss_fn='huber',
        training_intensity=None,
    )

    param_space = param_space.rollouts(
        num_rollout_workers=0,
        num_envs_per_worker=1,
        rollout_fragment_length=1,
        batch_mode='truncate_episodes',
    )
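
For reference, the per-timestep schedule described above (store one transition, then sample 32 at random and do one update) can be written out in plain Python. This is only a sketch of the intended behaviour, not RLlib code: the `RingReplayBuffer` class, the `run_schedule` function, and the toy environment are hypothetical names made up for illustration, and waiting until the buffer holds at least one full batch before training is an assumption.

```python
import random


class RingReplayBuffer:
    """Hypothetical fixed-capacity buffer that overwrites the oldest entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement within one batch.
        return rng.sample(self.storage, batch_size)

    def __len__(self):
        return len(self.storage)


def run_schedule(num_steps, batch_size=32, capacity=100000, seed=0):
    """One env step -> one stored transition -> one sampled batch."""
    rng = random.Random(seed)
    buffer = RingReplayBuffer(capacity)
    num_updates = 0
    last_batch = None
    state = 0
    for _ in range(num_steps):
        # Toy stand-in for env.step(): random action, constant reward.
        action = rng.randrange(4)
        next_state, reward = state + 1, 1.0
        buffer.add((state, action, reward, next_state))
        state = next_state
        # Assumption: skip training until one full batch is available.
        if len(buffer) >= batch_size:
            last_batch = buffer.sample(batch_size, rng)
            num_updates += 1  # a gradient step on last_batch would go here
    return len(buffer), num_updates, last_batch
```

Calling `run_schedule(100)` stores 100 transitions and performs one update per step once 32 transitions are available. Whether the RLlib config above produces exactly this sample-to-train ratio is the open question of this post.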


I think you also need to specify how many updates are performed on each call to .train(). I am trying to figure this out as well: I basically just want train() to perform a single batched update, but that doesn’t seem to be the case, and I can’t see how to change the config to make it so.
