I am currently trying to fully replicate the original DQN setting used in the Nature paper of Mnih et al. (2015). There they state that they use exploration during evaluation (which makes sense to me) with an epsilon of 0.05.
For this I am using the simple_q agent from RLlib, where exploration during evaluation can be controlled through the config:
"evaluation_config": {
    "explore": False,
}
I could simply set "explore": True, but I also want to set the epsilon that controls the exploration intensity. Does anyone have an example or a clue of how to do this?
Furthermore, I searched the source code for where config["evaluation_config"]["explore"] is read and could not find any usage. Does anyone know where this happens and can point me to the relevant source code?
The evaluation_config is a set of overrides applied on top of the base config during evaluation, so any evaluation-only changes belong in there. Given your goal, you would enable exploration there and override the exploration settings so that epsilon stays fixed at 0.05.
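A minimal sketch of such an override, assuming RLlib's dict-based config and its built-in "EpsilonGreedy" exploration type. The key names "exploration_config", "initial_epsilon" and "final_epsilon" are taken from that API as I understand it and should be checked against your installed RLlib version. Setting the initial and final epsilon to the same value is meant to keep epsilon constant at 0.05 throughout evaluation.

```python
# Sketch only: the evaluation-related part of an RLlib config dict.
# Key names are assumed from RLlib's dict-based config API; verify them
# against the version you have installed.
EVAL_EPSILON = 0.05  # evaluation epsilon reported for the Nature DQN setup

config = {
    # ... your usual training settings go here ...
    "evaluation_config": {
        # Turn exploration on for evaluation workers only.
        "explore": True,
        # Override the exploration settings for evaluation only.
        "exploration_config": {
            "type": "EpsilonGreedy",
            # Identical start and end values give a constant epsilon.
            "initial_epsilon": EVAL_EPSILON,
            "final_epsilon": EVAL_EPSILON,
        },
    },
}
```

The training-time exploration schedule in the base config is not touched by this, since the values above only apply where evaluation_config is used.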