How to recompute the advantage during training (PPO)

According to this paper, recomputing the advantages can be helpful for performance.

Hi Sven, could you give some hints on how to do that in ppo_torch_policy.py? @sven1977
The function compute_advantages seems relevant, but I am not sure where to call it.

Thanks!

Hey @Shanchao_Yang, thanks for your question. Allow me to kindly ask you to always direct your questions to the entire community here, as many users may know much better how to help you with your problem :slight_smile:
Advantage calculations usually happen in the “postprocess_trajectory” step. You can define a custom callback (see ray/rllib/examples/custom_metrics_and_callbacks.py and override the on_postprocess_trajectory method) to alter/adjust the batch that is about to be sent into your loss function and change the advantages therein; a sketch follows below.
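A minimal sketch of that callback route, assuming an RLlib 1.x-era import path (ray.rllib.agents.callbacks.DefaultCallbacks); the re-standardization of the “advantages” column is only an illustrative placeholder for whatever recomputation your paper prescribes:

from ray.rllib.agents.callbacks import DefaultCallbacks


class RecomputeAdvantages(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        # `postprocessed_batch` already holds RLlib's GAE results under the
        # "advantages" key; overwrite them in place here, before the batch
        # reaches the loss function. The re-standardization below is just a
        # placeholder for your own recomputation logic.
        adv = postprocessed_batch["advantages"]
        postprocessed_batch["advantages"] = (adv - adv.mean()) / (adv.std() + 1e-8)

You would then register it in your trainer config, e.g. config={"callbacks": RecomputeAdvantages}.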
Alternatively, you can build a new Policy class via e.g.:

MyNewPolicyCls = PPOTorchPolicy.with_updates(postprocess_fn=[your own postprocessing function])

This will give you a new PPO-style policy, but with your postprocessing function instead of the built-in RLlib one.
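A rough sketch of that route, assuming an RLlib 1.x-era API where PPOTorchPolicy is built via build_torch_policy (so the override keyword is postprocess_fn) and compute_advantages is importable from ray.rllib.evaluation.postprocessing; the policy name and the last_r simplification are assumptions, not RLlib's stock behavior:

from ray.rllib.agents.ppo.ppo_torch_policy import PPOTorchPolicy
from ray.rllib.evaluation.postprocessing import compute_advantages
from ray.rllib.policy.sample_batch import SampleBatch


def recompute_advantages_fn(policy, sample_batch,
                            other_agent_batches=None, episode=None):
    if sample_batch[SampleBatch.DONES][-1]:
        last_r = 0.0
    else:
        # NOTE (simplification): a faithful implementation would bootstrap
        # with the value-function estimate for the last observation here,
        # as RLlib's stock PPO postprocessing does; 0.0 keeps the sketch
        # self-contained.
        last_r = 0.0
    # Compute GAE advantages and value targets on the collected batch.
    return compute_advantages(
        sample_batch,
        last_r,
        gamma=policy.config["gamma"],
        lambda_=policy.config["lambda"],
        use_gae=policy.config["use_gae"])


MyNewPolicyCls = PPOTorchPolicy.with_updates(
    name="RecomputeAdvantagesPPOTorchPolicy",
    postprocess_fn=recompute_advantages_fn)

You can then pass MyNewPolicyCls into a custom Trainer (e.g. via PPOTrainer.with_updates(get_policy_class=lambda config: MyNewPolicyCls)).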


Thanks for your suggestion. :slight_smile:

I searched the GitHub issue board and docs before finding this. I would suggest increasing the visibility of this board in the docs, as the docs fill up the first pages of search results. Thanks for the solution!