How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hello,
We would like to post-process trajectories in a multi-agent setting so that rewards are shared among agents at the end of each episode.
We first looked into the `on_episode_end` callback, but it seems to provide only `episode: MultiAgentEpisode` and not the sample batch. Is there a way to get the sample batch there, so that we are free to modify the reward at any step t?
There is also the `on_postprocess_trajectory` callback, but the sample batch it receives does not contain the full episode. To get full episodes there, we are forced to use the `complete_episodes` batch mode, which we don't want to do.
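
To make the goal concrete, here is a minimal sketch of the kind of reward sharing we have in mind. It is plain Python and does not touch RLlib; `share_rewards` and the `agent_0` / `agent_1` ids are hypothetical names used only for illustration, and averaging across agents is just one possible sharing rule. It assumes we have every agent's rewards for the whole episode, which is exactly what we cannot get from a truncated batch.

```python
from typing import Dict, List


def share_rewards(rewards_by_agent: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Replace each agent's reward at step t with the team mean at step t.

    Hypothetical helper, not part of RLlib. Assumes every agent has a
    reward list of the same length covering the same complete episode.
    """
    if not rewards_by_agent:
        return {}
    lengths = {len(r) for r in rewards_by_agent.values()}
    if len(lengths) != 1:
        raise ValueError("all agents must have the same number of steps")
    n_agents = len(rewards_by_agent)
    # zip(*...) groups the rewards of all agents step by step.
    team_mean = [sum(step) / n_agents for step in zip(*rewards_by_agent.values())]
    # Give every agent its own copy of the shared reward sequence.
    return {agent_id: list(team_mean) for agent_id in rewards_by_agent}


# Made-up rewards for a three-step episode with two agents.
example = {"agent_0": [0.0, 1.0, 0.0], "agent_1": [0.0, 0.0, 2.0]}
shared = share_rewards(example)
```

Ideally we would apply something like this to the rewards in the sample batch once the episode is done, without switching the batch mode.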
Thanks