Post-process trajectory with full episode

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello,

We would like to post-process trajectories in a multi-agent setting so that rewards are shared at the end of each episode.

We first looked into the on_episode_end callback, but it seems to provide only episode: MultiAgentEpisode and not the sample batch. Is there a way to get the sample batch there, so that we can freely modify the reward at any step t?

There is also the on_postprocess_trajectory callback, but its sample batch does not contain the full episode. To get the full episode we would be forced to use the complete_episodes batch mode, which we don't want to do.
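
To make the goal concrete, here is a minimal sketch of the kind of reward sharing we mean, written independently of RLlib. It assumes that every agent acts at every step and that "sharing" means replacing each agent's reward at step t with the team mean at step t. The function name share_rewards and the dict-of-lists layout are hypothetical and only for illustration; they are not part of the RLlib API.

```python
from typing import Dict, List


def share_rewards(rewards_by_agent: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Replace each agent's reward at step t with the mean over all agents at step t.

    Hypothetical helper: assumes all agents have one reward per step for the
    same number of steps (i.e. the full episode is available for every agent).
    """
    if not rewards_by_agent:
        return {}
    lengths = {len(r) for r in rewards_by_agent.values()}
    if len(lengths) != 1:
        raise ValueError("all agents must have the same number of steps")
    n_agents = len(rewards_by_agent)
    # zip(*...) groups the rewards of all agents step by step.
    team = [sum(step) / n_agents for step in zip(*rewards_by_agent.values())]
    # Every agent gets its own copy of the shared reward sequence.
    return {agent_id: list(team) for agent_id in rewards_by_agent}


# Example episode with two agents and three steps.
episode_rewards = {"agent_0": [0.0, 1.0, 4.0], "agent_1": [2.0, 1.0, 0.0]}
shared = share_rewards(episode_rewards)
```

A computation like this needs the rewards of all agents over the whole episode at once, which is why a batch that holds only part of the episode is a problem for us.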

Thanks

Hi,

Did you manage to solve this problem? I am trying to use the on_postprocess_trajectory callback to modify both rewards and observations for some agents, but it seems that this callback is called only after the rewards and observations have already been used for learning.

Thanks.