How to log current Epsilon value on Ray Tune?

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version:
    – 2.47.1
  • Python version:
    – 3.12.7
  • OS:
    – macOS 15.5 (Apple M3)

3. What happened vs. what you expected:

  • Expected:
    – The current epsilon value is logged to TensorBoard via the MetricsLogger.
  • Actual:
    – I get the following error when running the code:
site-packages/ray/rllib/algorithms/algorithm.py", line 2355, in get_policy
    return self.env_runner.get_policy(policy_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SingleAgentEnvRunner' object has no attribute 'get_policy'

I found that algorithm.get_policy() has the @OldAPIStack annotation, but I don’t know how to solve this on the new API stack.

This is my callback for logging to TensorBoard:

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class ActionStatsCallback(DefaultCallbacks):
    def on_train_result(self, *, algorithm, metrics_logger, result, **kwargs) -> None:
        """Log the current exploration epsilon."""
        policy = algorithm.get_policy()
        epsilon = policy.get_exploration_state().get("cur_epsilon")
        metrics_logger.log_value("cur_epsilon", float(epsilon))
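
For reference, here is a rough sketch of the direction I was considering for the new API stack: instead of asking a Policy for its exploration state, recompute epsilon from the schedule passed to DQNConfig.training(epsilon=...) and the lifetime env-step counter in the result dict. The helper current_epsilon() and the result keys "env_runners" / "num_env_steps_sampled_lifetime" are my assumptions about where that counter lives, not something I have verified:

from ray.rllib.algorithms.callbacks import DefaultCallbacks


def current_epsilon(schedule, timestep):
    """Linearly interpolate a [(timestep, value), ...] epsilon schedule.

    This only approximates what I assume the built-in scheduler does.
    """
    points = sorted(schedule)
    if timestep <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= timestep <= t1:
            frac = (timestep - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)
    # Past the last schedule point: hold the final value.
    return points[-1][1]


class EpsilonSketchCallback(DefaultCallbacks):
    # Mirrors the schedule I would pass to DQNConfig.training(epsilon=...).
    EPSILON_SCHEDULE = [(0, 1.0), (10000, 0.05)]

    def on_train_result(self, *, algorithm, metrics_logger, result, **kwargs) -> None:
        # Assumption: lifetime env steps are reported under these result keys.
        steps = result.get("env_runners", {}).get("num_env_steps_sampled_lifetime", 0)
        epsilon = current_epsilon(self.EPSILON_SCHEDULE, steps)
        metrics_logger.log_value("cur_epsilon", float(epsilon))

I am not sure this is the intended replacement for get_policy().get_exploration_state(), so a pointer to the proper new-API-stack way would be appreciated.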

This is my DQNConfig:

from ray import tune
from ray.rllib.algorithms.dqn import DQNConfig

algo_config = (
    DQNConfig()
    .environment("EnvTrain", env_config={})
    .framework("torch")
    .env_runners(
        num_cpus_per_env_runner=1,
        num_env_runners=1,
        num_envs_per_env_runner=2,
        rollout_fragment_length="auto",
        sample_timeout_s=120,
        batch_mode="complete_episodes",
        explore=True,
    )
    .resources(num_gpus=0, num_cpus_for_main_process=4)
    .callbacks(ActionStatsCallback)
    .evaluation(
        evaluation_interval=2,
        evaluation_duration=1,
        evaluation_config={"env": "EnvVal", "env_config": {}},
    )
    .training(
        model={
            "fcnet_hiddens": [128, 64],
            "fcnet_activation": "relu",
            "use_attention": True,
            "use_lstm": True,
            "lstm_use_prev_actions": tune.grid_search([32, 16]),
            "attention_use_n_prev_actions": 8,
            "attention_use_n_prev_rewards": 8,
        },
        num_epochs=30,
        n_step=4,
        gamma=0.99,
        num_atoms=51,
        train_batch_size=64,
        noisy=True,
        dueling=True,
        double_q=True,
    )
)
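
In case it matters, this is a minimal sketch of how the config gets handed to Tune (simplified under the standard Tuner API; my actual stop and checkpoint settings are omitted, and "DQN" is just the registered trainable name):

from ray import tune

tuner = tune.Tuner(
    "DQN",
    # to_dict() so Tune expands the grid_search over lstm_use_prev_actions.
    param_space=algo_config.to_dict(),
)
results = tuner.fit()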

For me, this looks more like an RLlib topic than a Tune topic. @arturn: What do you think?