How to log current Epsilon value on Ray Tune?

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version:
    – 2.47.1
  • Python version:
    – 3.12.7
  • OS:
    – macOS 15.5 (Apple M3)

3. What happened vs. what you expected:

  • Expected:
    – Log the current epsilon value to TensorBoard via the MetricsLogger
  • Actual:
    – I got an error when running the code:
site-packages/ray/rllib/algorithms/algorithm.py", line 2355, in get_policy
    return self.env_runner.get_policy(policy_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SingleAgentEnvRunner' object has no attribute 'get_policy'

I found that algorithm.get_policy() has the @OldAPIStack annotation, but I don't know how to solve this on the new API stack.

This is my callback to log to TensorBoard:

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class ActionStatsCallback(DefaultCallbacks):
    def on_train_result(self, *, algorithm, metrics_logger, result, **kwargs) -> None:
        """Log the current exploration epsilon."""

        policy = algorithm.get_policy()
        epsilon = policy.get_exploration_state().get("cur_epsilon")
        metrics_logger.log_value("cur_epsilon", float(epsilon))

This is my DQNConfig

from ray import tune
from ray.rllib.algorithms.dqn import DQNConfig

algo_config = (
    DQNConfig()
    .environment("EnvTrain", env_config={})
    .framework("torch")
    .env_runners(
        num_cpus_per_env_runner=1,
        num_env_runners=1,
        num_envs_per_env_runner=2,
        rollout_fragment_length="auto",
        sample_timeout_s=120,
        batch_mode="complete_episodes",
        explore=True,
    )
    .resources(num_gpus=0, num_cpus_for_main_process=4)
    .callbacks(ActionStatsCallback)
    .evaluation(
        evaluation_interval=2,
        evaluation_duration=1,
        evaluation_config={"env": "EnvVal", "env_config": {}},
    )
    .training(
        model={
            "fcnet_hiddens": [128, 64],
            "fcnet_activation": "relu",
            "use_attention": True,
            "use_lstm": True,
            "lstm_use_prev_actions": tune.grid_search([32, 16]),
            "attention_use_n_prev_actions": 8,
            "attention_use_n_prev_rewards": 8,
        },
        num_epochs=30,
        n_step=4,
        gamma=0.99,
        num_atoms=51,
        train_batch_size=64,
        noisy=True,
        dueling=True,
        double_q=True,
    )
)

To me, this looks more like an RLlib topic than a Tune topic. @arturn: What do you think?

Yes, thanks for bringing this up.
Logging to TensorBoard is connected to Tune, but the error you are running into is RLlib-related.

You hit an issue there. Your local env runner needs to be a RolloutWorker (old API stack).
You need to disable enable_env_runner_and_connector_v2 to use RolloutWorkers.
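
A minimal sketch of that toggle on your config (the api_stack() flag names below are the standard AlgorithmConfig settings, but double-check them against your Ray version):

from ray.rllib.algorithms.dqn import DQNConfig

# Sketch: switch back to the old API stack so RolloutWorkers (and thus
# algorithm.get_policy()) are used again.
algo_config = (
    DQNConfig()
    .api_stack(
        enable_rl_module_and_learner=False,
        enable_env_runner_and_connector_v2=False,
    )
    # ... rest of your existing config ...
)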

If you are using the new API stack, you can fetch the epsilon from the DQN RLModule. Grab the module and call dqn_rl_module.epsilon_schedule.get_current_value().
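
A rough sketch of how that could look in your on_train_result callback on the new API stack; fetching the module via algorithm.env_runner.module and the EpsilonStatsCallback name are assumptions on my side, so adapt them to your setup:

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class EpsilonStatsCallback(DefaultCallbacks):
    def on_train_result(self, *, algorithm, metrics_logger, result, **kwargs) -> None:
        """Log the current exploration epsilon (new API stack)."""
        # There are no Policy objects on the new API stack; the DQN RLModule
        # lives on the EnvRunners, so grab it from the local env runner.
        dqn_rl_module = algorithm.env_runner.module
        # The DQN module tracks its exploration epsilon in a schedule object.
        epsilon = dqn_rl_module.epsilon_schedule.get_current_value()
        metrics_logger.log_value("cur_epsilon", float(epsilon))

Then pass it via .callbacks(EpsilonStatsCallback) as you already do, and the value should show up as cur_epsilon in TensorBoard.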
