How severely does this issue affect your experience of using Ray?
- Medium: It makes my task significantly harder to complete, but I can work around it.
We would like to post-process trajectories in a multi-agent setting so that rewards are shared among agents at the end of each episode.
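To make the goal concrete, here is a minimal plain-Python sketch of one possible sharing rule. The rule itself is an assumption (pool every agent's reward for the episode, split the total equally, and credit each share at that agent's last step), and `share_episode_reward` is a hypothetical helper, not an RLlib API. Plain lists stand in for the per-agent reward column of a sample batch.

```python
from typing import Dict, List


def share_episode_reward(rewards: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Hypothetical helper: pool all agents' rewards from one finished episode
    and give each agent an equal share, credited at that agent's final step."""
    # Only agents that actually acted (at least one step) receive a share.
    active = {agent_id: r for agent_id, r in rewards.items() if r}
    total = sum(sum(r) for r in active.values())
    share = total / len(active) if active else 0.0

    shared: Dict[str, List[float]] = {}
    for agent_id, r in rewards.items():
        new_r = [0.0] * len(r)
        if new_r:
            # Zero out per-step rewards and put the shared reward on the last step.
            new_r[-1] = share
        shared[agent_id] = new_r
    return shared


rewards = {"agent_0": [0.0, 1.0, 0.0], "agent_1": [2.0, 0.0]}
print(share_episode_reward(rewards))
# {'agent_0': [0.0, 0.0, 1.5], 'agent_1': [0.0, 1.5]}
```

The catch is that this needs every agent's rewards for the whole episode in one place, which is exactly what the callbacks below do not seem to provide.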
We first looked into the `on_episode_end` callback, but it seems that there we only get `episode: MultiAgentEpisode`, not the sample batch. Is there a way to access the sample batch there, so that we can freely modify the reward at any step?
There is also the `on_postprocess_trajectory` callback, but the sample batch it receives does not necessarily contain the full episode. To get full episodes there, we would be forced to use the `complete_episodes` batch mode, which we don't want to do.