@Stefan-1313 , I have debugged a little more and can confirm that the timesteps inside the `EpsilonGreedy` instances are correct, and therefore so are the epsilons. What is needed is a method that can request the current epsilon values from the remote workers.

As described in this thread, some objects hold a `threading.RLock()` and can therefore not be transferred via Ray. One of these is the `policy_map` of the `RolloutWorker`, which is used by `get_policy()`. So, at the moment I do not see a way to record the `cur_epsilon` values via a `ray.get()` call.
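Just to illustrate the point, here is a minimal, RLlib-independent sketch (names like `get_lock_holder` and `LockHolder` are made up for the example): any object that internally holds a `threading.RLock` cannot be serialized when returned from a Ray task, which is exactly what happens when one tries to pull the policy or the `policy_map` back to the driver.

```python
import threading

import ray


@ray.remote
def get_lock_holder():
    # Stand-in for an object like the policy_map that holds an RLock internally.
    class LockHolder:
        def __init__(self):
            self._lock = threading.RLock()

    return LockHolder()


# Fetching the object fails while serializing the return value, e.g.:
#   TypeError: cannot pickle '_thread.RLock' object
# ray.get(get_lock_holder.remote())
```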
As a solution, you could create your own callback that reports the current epsilon after each `sample()` call of a single worker:
```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.policy import Policy
from ray.rllib.policy.sample_batch import SampleBatch
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from ray.rllib.evaluation import RolloutWorker


class MyCallback(DefaultCallbacks):
    def __init__(self):
        super().__init__()

    def on_sample_end(
        self, *, worker: "RolloutWorker", samples: SampleBatch, **kwargs
    ) -> None:
        # Runs on each rollout worker right after it has collected a sample batch.
        cur_epsilon = worker.get_policy().exploration.get_state()["cur_epsilon"]
        print("cur_epsilon sample_end: {}".format(cur_epsilon))

    def on_learn_on_batch(
        self, *, policy: Policy, train_batch: SampleBatch, result: dict, **kwargs
    ) -> None:
        # Just for demonstration. Here the local worker does the learning.
        # Therefore, the epsilon will remain at 1.0.
        cur_epsilon = policy.exploration.get_state()["cur_epsilon"]
        print("cur_epsilon: {}".format(cur_epsilon))


# .....
config_simple["callbacks"] = MyCallback
# ....
for n in range(n_iter):
    result = agent.train()
```
This should print, for each `sample()` call of the `RolloutWorker`, the epsilon value that is currently used for the rollout.
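If you would rather have these values show up in the training results than only on the workers' stdout, a possible variation (just a sketch, assuming the same callback API as above and the `episode.custom_metrics` mechanism; `EpsilonMetricCallback` is a made-up name) is to record the epsilon in `on_episode_step`, so that RLlib aggregates it into `result["custom_metrics"]`:

```python
class EpsilonMetricCallback(DefaultCallbacks):
    def on_episode_step(
        self, *, worker, base_env, policies=None, episode, **kwargs
    ) -> None:
        # Store the worker's currently used epsilon once per env step; RLlib
        # aggregates custom metrics (mean/min/max) over the sampled episodes.
        cur_epsilon = worker.get_policy().exploration.get_state()["cur_epsilon"]
        episode.custom_metrics["cur_epsilon"] = cur_epsilon
```

The value should then appear as e.g. `cur_epsilon_mean` under `result["custom_metrics"]` in the dict returned by `agent.train()`.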