Using exploration during evaluation

Lars_Simon_Zehnder · December 31, 2021, 11:27am

Hi there,

I am trying to fully replicate the original DQN setup from the Nature paper by Mnih et al. (2015). They state that they use exploration during evaluation (which makes sense to me) with an epsilon of 0.05.

For this I am using the simple_q agent from RLlib, where exploration during evaluation can be controlled via the config:
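
For readers unfamiliar with the term, here is a minimal sketch of epsilon-greedy action selection with epsilon = 0.05. The function name `epsilon_greedy` and the example Q-values are hypothetical and only illustrate the idea; this is not RLlib code.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)          # fixed seed so the run is repeatable
q_values = [0.1, 0.7, 0.2]      # made-up Q-values; action 1 is greedy
actions = [epsilon_greedy(q_values, 0.05, rng) for _ in range(10_000)]
greedy_fraction = actions.count(1) / len(actions)
print(f"greedy action chosen in {greedy_fraction:.1%} of steps")
```

With three actions the greedy one should be picked in roughly 1 - 0.05 + 0.05/3 of the steps, since the random branch can also land on it.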

"evaluation_config": {
          "explore": False,
}

I could simply set "explore": True, but I also want to set the epsilon that controls the exploration intensity. Does anyone have an example or a hint on how to do this?

Furthermore, I searched the source code for where config["evaluation_config"]["explore"] is used and could not find any usage. Does anyone know where this happens and can point me to the source code?

Thanks in advance

Roller44 · January 1, 2022, 2:37am

Hi, you can set the exploration parameters via exploration_config. Here are its default settings:

'exploration_config': {'epsilon_timesteps': 10000,
                        'final_epsilon': 0.02,
                        'initial_epsilon': 1.0,
                        'type': 'EpsilonGreedy'},

For more available settings, run the following Python code.

import pprint
# Import the DQN defaults (not DDPG), since the question is about DQN/simple_q.
from ray.rllib.agents.dqn import DEFAULT_CONFIG
pprint.pprint(DEFAULT_CONFIG)

mannyv · January 1, 2022, 6:12pm

Hi @Lars_Simon_Zehnder,

The evaluation_config overrides the base config during evaluation, so that is where you put the changes. Given your goal, it should look like this:

"evaluation_config": {
     'exploration_config': {'epsilon_timesteps': 10000,
                        'final_epsilon': 0.05,
                        'initial_epsilon': 0.05,
                        'type': 'EpsilonGreedy'},
}
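
To see why setting initial_epsilon and final_epsilon to the same value gives a constant epsilon, here is a minimal sketch. It assumes epsilon is annealed linearly from initial_epsilon to final_epsilon over epsilon_timesteps, which is how the parameter names read; `linear_epsilon` is a hypothetical stand-in, not RLlib's own schedule implementation.

```python
def linear_epsilon(timestep, initial_epsilon, final_epsilon, epsilon_timesteps):
    """Linearly anneal epsilon, then hold it at final_epsilon (assumed behaviour)."""
    fraction = min(max(timestep / epsilon_timesteps, 0.0), 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


# Training defaults quoted earlier in the thread.
train_cfg = {"initial_epsilon": 1.0, "final_epsilon": 0.02, "epsilon_timesteps": 10000}
# Evaluation override: both endpoints are 0.05, so there is nothing to anneal.
eval_cfg = {"initial_epsilon": 0.05, "final_epsilon": 0.05, "epsilon_timesteps": 10000}

steps = (0, 5000, 10000, 20000)
train_eps = [linear_epsilon(t, **train_cfg) for t in steps]
eval_eps = [linear_epsilon(t, **eval_cfg) for t in steps]

for t, tr, ev in zip(steps, train_eps, eval_eps):
    print(f"t={t:>6}  train epsilon={tr:.3f}  eval epsilon={ev:.3f}")
```

Under this assumption the value of epsilon_timesteps no longer matters in the evaluation override, because the start and end of the schedule coincide.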

mannyv · January 1, 2022, 6:49pm

@Lars_Simon_Zehnder,

As for where it is used, check here:

ray-project/ray/blob/340fbf53c0e885c8406d20679e6917d6f6120f0d/rllib/agents/trainer.py#L860

    # Update with evaluation settings:
    user_eval_config = copy.deepcopy(self.config["evaluation_config"])

    # Assert that user has not unset "in_evaluation".
    assert "in_evaluation" not in user_eval_config or \
        user_eval_config["in_evaluation"] is True

    # Merge user-provided eval config with the base config. This makes sure
    # the eval config is always complete, no matter whether we have eval
    # workers or perform evaluation on the (non-eval) local worker.
    eval_config = merge_dicts(self.config, user_eval_config)
    self.config["evaluation_config"] = eval_config

    if self.config.get("evaluation_num_workers", 0) > 0 or \
            self.config.get("evaluation_interval"):
        logger.debug(f"Using evaluation_config: {user_eval_config}.")

        # Validate evaluation config.
        self.validate_config(eval_config)

        # Set the `in_evaluation` flag.
Happy New Year!

Lars_Simon_Zehnder · January 5, 2022, 6:10pm

Hi @mannyv,

thanks for your answer! That makes it clear now. I don't think this is covered in the documentation.

Best,
Simon
