DQN Rollout Config to fit Nature DQN

Hi, how would I configure the rollout settings to fit the DQN in the Nature paper? I am confused by the Rollout Config settings. Specifically, I want to exactly replicate what was done in the Nature DQN paper. So I have 0 num_rollout_workers (rollouts are done on the local worker). I would like each iteration of the DQN to step the environment once and store that one timestep in the buffer. After that, with a train_batch_size of 32, it should draw 32 random samples from the replay buffer and train the policy network on them.

In the code below, I have set the batch_size to 32, training_intensity to None, and the rollout settings to rollout_fragment_length=1 and batch_mode='truncate_episodes'. I wonder if this is correct for what I am trying to achieve. I.e., at timestep t, store one transition (state_t, action_t, reward_t, state_t+1) in the buffer and then randomly sample 32 transitions from the buffer to train on. At timestep t+1, it should store (state_t+1, action_t+1, reward_t+1, state_t+2) in the buffer, and so on …

    from ray.rllib.algorithms.dqn import DQNConfig

    param_space = DQNConfig()

    param_space = param_space.training(
        model={
            '_disable_preprocessor_api': True,
            'conv_filters': [[32, 8, 4], [64, 4, 2], [64, 3, 1]],
            'conv_activation': 'relu',
            'post_fcnet_hiddens': [512],
            'post_fcnet_activation': 'relu',
            'no_final_linear': False,
            'vf_share_layers': False,
        },
        train_batch_size=32,
        training_intensity=None,
        optimizer={'adam_epsilon': 1e-8},
        replay_buffer_config={'type': 'ReplayBuffer', 'capacity': 100000},
    )

    param_space = param_space.rollouts(
        num_rollout_workers=0,
        num_envs_per_worker=1,
        rollout_fragment_length=1,
        batch_mode='truncate_episodes',
    )
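To be explicit about the semantics I am after, here is a minimal plain-Python sketch of the per-step loop (not RLlib API; `dqn_step` and the tuple layout are placeholder names for illustration): one transition stored per env step, then a 32-sample minibatch drawn uniformly from the buffer once enough data exists.

```python
import random
from collections import deque

def dqn_step(buffer, transition, batch_size=32):
    """Store one transition, then draw a uniform random minibatch.

    Returns None until the buffer holds at least `batch_size`
    transitions (no training before that point).
    """
    buffer.append(transition)
    if len(buffer) < batch_size:
        return None
    return random.sample(list(buffer), batch_size)

# Toy run: 100 env steps, one (s, a, r, s') tuple stored per step.
buffer = deque(maxlen=100_000)
batch = None
for t in range(100):
    batch = dqn_step(buffer, (t, 'a', 1.0, t + 1))

assert len(buffer) == 100
assert batch is not None and len(batch) == 32
```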


I think you also need to specify how many updates to perform per .train() call. I am also trying to figure this out: I basically just want train() to perform a single batched update, but that doesn't seem to be the case, and I can't see how to change the config to make it so.
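Not an authoritative answer, but two knobs look relevant (treat both as assumptions to check against your RLlib version's docs): `training_intensity`, which sets the ratio of trained to sampled timesteps, and the `.reporting()` settings that control how many env steps a single `.train()` iteration covers. A config sketch:

```python
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .training(
        train_batch_size=32,
        # Assumption: with rollout_fragment_length=1, an intensity of 32
        # (trained timesteps per sampled timestep) should amount to one
        # 32-sample update per env step.
        training_intensity=32,
    )
    .reporting(
        # Make one .train() call cover as few sampled env steps as
        # possible; exact minimum behavior may vary by RLlib version.
        min_sample_timesteps_per_iteration=1,
        min_time_s_per_iteration=None,
    )
)
```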
