@Stefan-1313 , I have debugged a little more and can confirm that the timesteps inside the `EpsilonGreedy` instances are correct, and therefore so are the epsilons. What is needed is a method that can request the current epsilon values from the remote workers.

As described in this thread, some objects hold a `threading.RLock()` and can therefore not be transferred via Ray. One of these is the `policy_map` of the `RolloutWorker`, which is used by `get_policy()`. So, at the moment I do not see a way to record the `cur_epsilon` values via a `ray.get()` call.
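Just to illustrate the point, here is a minimal, RLlib-independent sketch (names like `get_lock_holder` and `LockHolder` are made up for the example): any object that internally holds a `threading.RLock` cannot be serialized when returned from a Ray task, which is exactly what happens when one tries to pull the policy or the `policy_map` back to the driver.

```python
import threading

import ray


@ray.remote
def get_lock_holder():
    # Stand-in for an object like the policy_map that holds an RLock internally.
    class LockHolder:
        def __init__(self):
            self._lock = threading.RLock()

    return LockHolder()


# Fetching the object fails while serializing the return value, e.g.:
#   TypeError: cannot pickle '_thread.RLock' object
# ray.get(get_lock_holder.remote())
```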
As a solution, you could create your own callback that reports the current epsilon after each `sample()` call of a single worker:
```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.policy import Policy
from ray.rllib.policy.sample_batch import SampleBatch
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from ray.rllib.evaluation import RolloutWorker


class MyCallback(DefaultCallbacks):
    def __init__(self):
        super().__init__()

    def on_sample_end(
        self, *, worker: "RolloutWorker", samples: SampleBatch, **kwargs
    ) -> None:
        # Runs on each rollout worker right after it has collected a sample batch.
        cur_epsilon = worker.get_policy().exploration.get_state()["cur_epsilon"]
        print("cur_epsilon sample_end: {}".format(cur_epsilon))

    def on_learn_on_batch(
        self, *, policy: Policy, train_batch: SampleBatch, result: dict, **kwargs
    ) -> None:
        # Just for demonstration. Here the local worker does the learning.
        # Therefore, the epsilon will remain at 1.0.
        cur_epsilon = policy.exploration.get_state()["cur_epsilon"]
        print("cur_epsilon: {}".format(cur_epsilon))


# .....
config_simple["callbacks"] = MyCallback
# ....
for n in range(n_iter):
    result = agent.train()
```
This should print, for each `sample()` call of the `RolloutWorker`, the epsilon value that is currently used for the rollout.
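If you would rather have these values show up in the training results than only on the workers' stdout, a possible variation (just a sketch, assuming the same callback API as above and the `episode.custom_metrics` mechanism; `EpsilonMetricCallback` is a made-up name) is to record the epsilon in `on_episode_step`, so that RLlib aggregates it into `result["custom_metrics"]`:

```python
class EpsilonMetricCallback(DefaultCallbacks):
    def on_episode_step(
        self, *, worker, base_env, policies=None, episode, **kwargs
    ) -> None:
        # Store the worker's currently used epsilon once per env step; RLlib
        # aggregates custom metrics (mean/min/max) over the sampled episodes.
        cur_epsilon = worker.get_policy().exploration.get_state()["cur_epsilon"]
        episode.custom_metrics["cur_epsilon"] = cur_epsilon
```

The value should then appear as e.g. `cur_epsilon_mean` under `result["custom_metrics"]` in the dict returned by `agent.train()`.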