Hey guys,
I implemented a proposal for an adapted advantage computation (which could be used by PPO, for example). My example shows how this adapted advantage computation can be done via the custom callback function on_postprocess_trajectory. Maybe it is also helpful and interesting for other people, so how can I contribute my proposal to RLlib? Should I open an issue on GitHub?
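For anyone unfamiliar with the hook, here is a minimal sketch of what recomputing advantages inside on_postprocess_trajectory might look like. The adapted formula itself isn't part of this post, so plain GAE stands in as a placeholder. The helper recompute_advantages, the class AdaptedAdvantageCallbacks, the gamma/lam defaults and the assumption that each batch is one finished trajectory fragment (bootstrap value 0) are all hypothetical. The import path and the "rewards"/"vf_preds"/"advantages"/"value_targets" keys follow recent RLlib versions as far as I know, but may differ in yours.

```python
import numpy as np

try:
    # Recent Ray versions; older ones expose this under ray.rllib.agents.callbacks.
    from ray.rllib.algorithms.callbacks import DefaultCallbacks
except ImportError:
    DefaultCallbacks = object  # lets the sketch run without Ray installed


def recompute_advantages(batch, gamma=0.99, lam=0.95, last_value=0.0):
    """Overwrite batch["advantages"] and batch["value_targets"] in place.

    Plain GAE is a placeholder for the adapted computation (hypothetical helper).
    Assumes `batch` holds one trajectory fragment that ended, bootstrapped
    with `last_value`.
    """
    rewards = np.asarray(batch["rewards"], dtype=np.float32)
    values = np.asarray(batch["vf_preds"], dtype=np.float32)
    # V(s_{t+1}) for every step; the final step uses the bootstrap value.
    next_values = np.append(values[1:], last_value)
    deltas = rewards + gamma * next_values - values
    advantages = np.zeros_like(rewards)
    running = 0.0
    # Backward pass: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    batch["advantages"] = advantages
    batch["value_targets"] = advantages + values
    return batch


class AdaptedAdvantageCallbacks(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id, policy_id,
                                  policies, postprocessed_batch, original_batches,
                                  **kwargs):
        # Replace the advantages that the policy's own postprocessing wrote.
        recompute_advantages(postprocessed_batch)
```

The callback class would then be registered through the algorithm config (e.g. config["callbacks"] in older dict-style configs). Hardcoding gamma and lam is only for brevity; reading them from policies[policy_id].config would keep them in sync with the trainer settings.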