I implemented a proposal for an adapted advantage computation (which could be used by PPO). My example shows how this adapted advantage computation can be done via the custom callback function
on_postprocess_trajectory. It might also be helpful and interesting for other people, so how can I contribute my proposal to RLlib? Should I open an issue on GitHub?
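
For readers unfamiliar with the pattern, here is a rough sketch of what recomputing advantages on a postprocessed batch can look like. It is not the proposal itself: it uses standard GAE as a stand-in, and it works on a plain dict rather than an RLlib batch so that it runs without Ray installed. The helper names (compute_gae, adapt_advantages) are hypothetical, and the batch keys ("rewards", "vf_preds", "advantages", "value_targets") are assumed to match what PPO stores in the batch passed to on_postprocess_trajectory.

```python
import numpy as np


def discounted_cumsum(x, discount):
    """Return y with y[t] = x[t] + discount * y[t + 1], computed backwards."""
    out = np.zeros(len(x), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(x))):
        running = x[t] + discount * running
        out[t] = running
    return out


def compute_gae(rewards, vf_preds, last_value=0.0, gamma=0.99, lam=0.95):
    """Standard GAE; a stand-in for whatever adapted rule is being proposed."""
    rewards = np.asarray(rewards, dtype=np.float64)
    # Append the bootstrap value for the state after the last step.
    values = np.append(np.asarray(vf_preds, dtype=np.float64), last_value)
    # One-step TD residuals.
    deltas = rewards + gamma * values[1:] - values[:-1]
    return discounted_cumsum(deltas, gamma * lam)


def adapt_advantages(batch, gamma=0.99, lam=0.95):
    """Overwrite advantages and value targets in a dict-like batch.

    Assumption: inside an RLlib callback, this would be called on the
    postprocessed batch; here `batch` is just a dict of arrays.
    """
    vf_preds = np.asarray(batch["vf_preds"], dtype=np.float64)
    adv = compute_gae(batch["rewards"], vf_preds, 0.0, gamma, lam)
    batch["advantages"] = adv.astype(np.float32)
    batch["value_targets"] = (adv + vf_preds).astype(np.float32)
    return batch


if __name__ == "__main__":
    demo = {"rewards": [1.0, 1.0, 1.0], "vf_preds": [0.0, 0.0, 0.0]}
    print(adapt_advantages(demo, gamma=1.0, lam=1.0)["advantages"])
```

In an actual callback, the body of on_postprocess_trajectory would presumably call something like adapt_advantages on the batch it receives; the exact hook signature depends on the RLlib version.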