Hi,
I am trying to log the logp_ratio that is computed in /agents/ppo/ppo_tf_policy.py:
logp_ratio = tf.exp(
    curr_action_dist.logp(train_batch[SampleBatch.ACTIONS]) -
    train_batch[SampleBatch.ACTION_LOGP])
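For context, my understanding of that line is that it is the standard PPO importance ratio r = pi_new(a|s) / pi_old(a|s), computed in log space as r = exp(logp_new - logp_old), where logp_old is the action_logp recorded in the sample batch when the actions were collected.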
I am trying to do this in the following manner:
import numpy as np

from ray.rllib.agents.callbacks import DefaultCallbacks


class MyCallbacks(DefaultCallbacks):
    def on_learn_on_batch(self, *, policy, train_batch, result: dict, **kwargs):
        # Try to remember the previous log-probs in the result dict.
        if "prev_logp" not in result:
            result["logp_ratio"] = 1
        else:
            result["logp_ratio"] = np.exp(result["prev_logp"] - train_batch["action_logp"])
        result["prev_logp"] = train_batch["action_logp"]
But obviously, every time I am in on_learn_on_batch, the result dict is empty again, so "prev_logp" never survives to the next call.
How can I store such values between calls?
My high-level idea is to debug more deeply what is going on in the PPO surrogate function by logging it to the wandb (W&B) platform.
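One workaround I have been considering is to keep the previous log-probs on the callbacks object itself instead of in result. This is only a minimal, untested sketch, and it assumes that the same MyCallbacks instance is reused across on_learn_on_batch calls on a worker and that train_batch["action_logp"] is a numpy array at this point:

import numpy as np

from ray.rllib.agents.callbacks import DefaultCallbacks


class MyCallbacks(DefaultCallbacks):
    def __init__(self):
        super().__init__()
        # State kept on the callbacks instance, since `result` starts
        # empty on every on_learn_on_batch call.
        self.prev_logp = None

    def on_learn_on_batch(self, *, policy, train_batch, result: dict, **kwargs):
        action_logp = train_batch["action_logp"]
        if self.prev_logp is not None and self.prev_logp.shape == action_logp.shape:
            # Mean ratio between the stored log-probs and the current ones,
            # reported as a scalar so it can be logged like any other metric.
            result["logp_ratio"] = float(np.exp(self.prev_logp - action_logp).mean())
        self.prev_logp = action_logp.copy()

Would something like this be a reasonable way to do it, or is there a better mechanism for logging such values?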
Thanks,
Jakub