I have already opened an issue on the GitHub page:
I believe there is a bug when restoring an Ape-X DQN agent: only the online network is restored, not the target network, which causes a high td_error and consequently low rewards after the restore.
I have read all the discussions about checkpointing and resuming training, but I cannot come up with a solution.
One short-term workaround would be to store the weights myself and force the target network with the ._set_weights API…
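To illustrate the idea behind that workaround, here is a minimal, framework-free sketch (the `TinyAgent` class and its methods are hypothetical stand-ins, not RLlib API): after restoring only the online network, you copy its weights into the target network so the two nets agree again and td_error does not blow up.

```python
import copy

class TinyAgent:
    """Hypothetical stand-in for a DQN-style agent holding an online net
    and a target net, each modeled as a plain dict of parameters."""

    def __init__(self):
        self.online = {"w": 1.0}
        self.target = {"w": 1.0}

    def get_weights(self):
        # What a checkpoint would typically capture: the online net only.
        return copy.deepcopy(self.online)

    def set_weights(self, weights):
        # Restoring a checkpoint sets only the online net,
        # mirroring the suspected bug: the target net is left stale.
        self.online = copy.deepcopy(weights)

    def sync_target(self):
        # The workaround: force the target net to match the
        # freshly restored online net.
        self.target = copy.deepcopy(self.online)

# Simulate save -> restore -> manual sync.
agent = TinyAgent()
agent.online["w"] = 5.0            # training moved the online net
saved = agent.get_weights()        # "checkpoint" the online weights

restored = TinyAgent()
restored.set_weights(saved)        # restore: target net is still stale
restored.sync_target()             # workaround: copy online -> target
assert restored.target == restored.online
```

In RLlib the analogous step would be done through the policy's weight-setting API right after `restore()`, before resuming training, so the first target-network updates start from a consistent state.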
Any ideas or recommendations?