Hi,
I am using Ray RLlib (2.3) with Gymnasium, i.e., distinguishing terminated and truncated, for multi-agent RL (currently with APEX-DQN).
I am considering the following edge case:
- The episode time limit is hit while some agents have not yet terminated, so the episode is truncated and truncated["__all__"] is set to True.
- In the last step of the episode, one of the remaining agents terminates, i.e., its terminated value is set to True.
Is it correct to set both terminated and truncated to True for that agent? terminated=True because it did terminate, and truncated=True because the episode time limit is hit and truncated["__all__"]==True, in which case I would expect truncated=True for all remaining agents.
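To make the scenario concrete, here is a minimal sketch in plain Python (no RLlib imports) of the per-agent dicts I would return from step() in that last time step. The helper name build_done_dicts and the agent IDs are hypothetical and only for illustration; whether setting both flags like this is correct is exactly what I am asking.

```python
def build_done_dicts(active_agents, finished_now, time_limit_hit):
    """Hypothetical helper: per-agent terminated/truncated dicts for one step."""
    # An agent is terminated if it reached a terminal state in this step.
    terminateds = {agent: agent in finished_now for agent in active_agents}
    # Assumption under question: every remaining agent is truncated
    # when the episode time limit is hit.
    truncateds = {agent: time_limit_hit for agent in active_agents}
    # "__all__" marks the end of the episode for all agents.
    terminateds["__all__"] = all(terminateds[agent] for agent in active_agents)
    truncateds["__all__"] = time_limit_hit
    return terminateds, truncateds


# Last step: time limit hit, agent_0 terminates, agent_1 is still running.
terminateds, truncateds = build_done_dicts(
    active_agents=["agent_0", "agent_1"],
    finished_now={"agent_0"},
    time_limit_hit=True,
)
```

Here agent_0 ends up with both terminated and truncated set to True, while agent_1 is only truncated.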
Ultimately, what matters is that the value target is not bootstrapped from the next state for the agent that terminates in the last time step. This should be the case if terminated is True, irrespective of the truncated value, correct?
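What I have in mind is the usual one-step Q-learning target, where only the terminated flag masks the bootstrap term. The sketch below is my own illustration under that assumption, not code taken from RLlib; the function td_target and its parameters are hypothetical names.

```python
def td_target(reward, next_q_max, terminated, truncated=False, gamma=0.99):
    """Hypothetical one-step TD target that bootstraps unless terminated.

    The truncated argument is accepted but deliberately unused, to show the
    behavior I would expect: truncation alone does not stop bootstrapping.
    """
    bootstrap_mask = 1.0 - float(terminated)
    return reward + gamma * bootstrap_mask * next_q_max


# Agent that terminates in the last step: no bootstrapping, target == reward.
target_terminated = td_target(1.0, 5.0, terminated=True, truncated=True)

# Agent that is only truncated: the next-state value is still used.
target_truncated = td_target(1.0, 5.0, terminated=False, truncated=True)
```

If the policy loss works like this, then setting truncated=True in addition to terminated=True should make no difference for the terminating agent.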
I couldn’t really find the right place in the code, but inside the policy, it seems like only the terminated values matter. Could someone confirm?
Thanks!