Target_network_update_freq APEX vs DQN

How severe does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

I want to update my target network every 1000 steps. One episode lasts for 1000 steps. So I expect 1 update per episode.

On DQN it is straightforward: I set target_network_update_freq=1000 and that's it.
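
For reference, a minimal sketch of the plain-DQN setup I mean (the env name and stop condition are placeholders, not from my actual experiment; only target_network_update_freq matters here):

```python
import ray
from ray import tune

ray.init()

# Sketch only: CartPole-v1 and the stop criterion are illustrative placeholders.
# The single relevant knob is target_network_update_freq.
tune.run(
    "DQN",
    config={
        "env": "CartPole-v1",
        "target_network_update_freq": 1000,  # sync the target net every 1000 sampled timesteps
    },
    stop={"timesteps_total": 100_000},
)
```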

On APEX-DQN, it depends on a lot of parameters, namely train_batch_size, training_intensity, num_workers and, of course, target_network_update_freq. An illustrative config is sketched below.
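
This is the shape of the APEX config I mean; the concrete values here are placeholders for illustration, not a combination I claim gives one update per episode:

```python
from ray import tune

# Illustrative APEX-DQN config showing the knobs in question.
# All values are placeholders, not known-good settings.
apex_config = {
    "env": "CartPole-v1",                # placeholder env
    "num_workers": 8,                    # parallel rollout workers feeding the replay actors
    "train_batch_size": 512,             # timesteps pulled from replay per learner step
    "training_intensity": 1,             # ratio of trained to sampled timesteps (placeholder)
    "target_network_update_freq": 1000,  # effective cadence also depends on the knobs above
}

tune.run("APEX", config=apex_config, stop={"timesteps_total": 1_000_000})
```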

How can I properly set those parameters so that the learner performs one target update per episode on APEX? The documentation does not say anything about this.

See the TensorBoard metric below, where I've tried different combinations of parameters for APEX vs. DQN (in teal).

[Screenshot: apex_vs_dqn TensorBoard comparison]

I have upgraded to the latest commit on master with the Ray 3.0 (nightly) wheel, and it works as expected.