Using exploration during evaluation

Hi there,

I am currently trying to fully replicate the original DQN setting used in the Nature paper by Mnih et al. (2015). There they state that they use exploration during evaluation (which makes sense to me) with an epsilon of 0.05.

For this I am using the simple_q agent from RLlib, where exploration during evaluation can be controlled via the config:

"evaluation_config": {
    "explore": False,
},

I could simply set "explore": True, but I also want to set the epsilon that controls the exploration intensity. Does anyone have an example or a clue of how to do this?

Furthermore, I searched the source code for where the config["evaluation_config"]["explore"] attribute is used and could not find any usage. Does anyone know where this happens and can point me to the source code?

Thanks in advance

Hi, you can configure the exploration_config to set exploration parameters. Here are its default settings:

'exploration_config': {'epsilon_timesteps': 10000,
                        'final_epsilon': 0.02,
                        'initial_epsilon': 1.0,
                        'type': 'EpsilonGreedy'},
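
To make these defaults concrete, here is a minimal sketch of the schedule they describe, assuming epsilon is annealed linearly from initial_epsilon to final_epsilon over epsilon_timesteps and then held constant. The function name epsilon_at is made up for illustration and is not part of RLlib; the library's actual schedule may differ in details such as warmup handling.

```python
def epsilon_at(timestep, initial_epsilon=1.0, final_epsilon=0.02,
               epsilon_timesteps=10000):
    """Hypothetical helper: linearly anneal epsilon, then hold the final value."""
    if epsilon_timesteps <= 0:
        # Nothing to anneal over, so use the final value right away.
        return final_epsilon
    # Fraction of the annealing period that has elapsed, clipped to [0, 1].
    fraction = min(max(timestep / epsilon_timesteps, 0.0), 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


# Epsilon at the start, halfway through, and well past the annealing period.
for t in (0, 5000, 20000):
    print(t, epsilon_at(t))
```

Under this assumption, setting initial_epsilon and final_epsilon to the same value gives a constant epsilon regardless of epsilon_timesteps.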

For more available settings, please run the following Python code.

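import pprint
# simple_q is part of the DQN agent family, so look at the DQN defaults.
from ray.rllib.agents.dqn import DEFAULT_CONFIG
# Print every available setting together with its default value.
pprint.pprint(DEFAULT_CONFIG)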

Hi @Lars_Simon_Zehnder,

The evaluation_config is a set of overrides applied to the base config during evaluation, so that is where your changes belong. Given your goal, you would want it to look like this:

"evaluation_config": {
    'exploration_config': {'epsilon_timesteps': 10000,
                           'final_epsilon': 0.05,
                           'initial_epsilon': 0.05,
                           'type': 'EpsilonGreedy'},
},
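
To illustrate the override idea, here is a rough plain-Python sketch of how an evaluation_config could be layered on top of the training config. The deep_merge helper and the variable names are hypothetical and only mimic the behaviour described above; this is not RLlib's own merging code. The sketch also sets "explore": True in the override, on the assumption that the epsilon only matters during evaluation when exploration is switched on.

```python
import copy


def deep_merge(base, override):
    """Hypothetical helper: copy `base`, then apply `override` recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested dicts key by key instead of replacing them whole.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Training settings, using the defaults quoted earlier in the thread.
train_config = {
    "explore": True,
    "exploration_config": {
        "type": "EpsilonGreedy",
        "initial_epsilon": 1.0,
        "final_epsilon": 0.02,
        "epsilon_timesteps": 10000,
    },
}

# Evaluation overrides: constant epsilon of 0.05, exploration enabled.
evaluation_config = {
    "explore": True,
    "exploration_config": {
        "type": "EpsilonGreedy",
        "initial_epsilon": 0.05,
        "final_epsilon": 0.05,
        "epsilon_timesteps": 10000,
    },
}

# Effective settings during evaluation; train_config itself is left untouched.
eval_settings = deep_merge(train_config, evaluation_config)
print(eval_settings["exploration_config"])
```

Training keeps annealing from 1.0 down to 0.02, while the merged evaluation settings use a fixed epsilon of 0.05.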


As for where it is used, check here:

Happy New Year!

Hi @mannyv,

Thanks for your answer! That makes it clear now. I don't think this can be found in the documentation.