Post-process trajectory with full episode

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.


We would like to post-process trajectories in a multi-agent setting so that rewards can be shared among agents at the end of each episode.
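
To make the goal concrete, here is a minimal sketch of the kind of reward sharing we have in mind. It uses plain Python lists in place of RLlib sample batches, and everything in it is an assumption for illustration only: the function name `share_episode_rewards`, the dict layout (agent id mapped to that agent's per-step rewards for one full episode), and the sharing scheme itself (adding the mean episode return across agents to each agent's final-step reward).

```python
from typing import Dict, List


def share_episode_rewards(
    rewards_by_agent: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Return a copy in which each agent's final-step reward includes a team bonus.

    Hypothetical scheme: the bonus is the mean of the per-agent episode returns.
    This only works when the rewards of the *whole* episode are available.
    """
    if not rewards_by_agent:
        return {}

    # Episode return of each agent (sum of its per-step rewards).
    returns = {agent_id: sum(r) for agent_id, r in rewards_by_agent.items()}
    team_bonus = sum(returns.values()) / len(returns)

    shared = {}
    for agent_id, rewards in rewards_by_agent.items():
        new_rewards = list(rewards)  # copy, so the input is left unchanged
        if new_rewards:
            new_rewards[-1] += team_bonus  # modify the reward at the last step
        shared[agent_id] = new_rewards
    return shared


# Toy episode with two agents (made-up numbers).
episode = {"agent_0": [0.0, 1.0, 0.0], "agent_1": [0.0, 0.0, 3.0]}
shared = share_episode_rewards(episode)
```

The important point is that the bonus depends on every agent's full episode return, so it cannot be computed from a truncated fragment of the episode.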

We first looked into the on_episode_end callback, but it seems to provide only episode: MultiAgentEpisode and not the sample batch. Is there a way to get the sample batch there, so that we can freely modify the reward at any step t?

There is also the on_postprocess_trajectory callback, but the sample batch it receives does not contain the full episode. To get the full episode there, we would be forced to use the complete_episodes batch mode, which we don't want to do.