Hi,
I am using Ray RLlib (2.3) with Gymnasium, i.e., distinguishing terminated and truncated, for multi-agent RL (currently with APEX-DQN).
I am considering the following edge case:
- The episode time limit is hit while some agents have not yet terminated, so the episode is truncated and `truncated["__all__"]` is set to True.
- In the last step of the episode, one of the remaining agents terminates, i.e., its `terminated` value is set to True.
Is it correct to set both `terminated` and `truncated` to True for the given agent? `terminated` because it did terminate. `truncated` because the episode time limit is hit and `truncated["__all__"]==True`, where I would expect `truncated=True` for all remaining agents.
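To make the edge case concrete, here is a minimal sketch of the flags I would return in that last time step. The agent IDs, `MAX_STEPS`, and the helper `final_step_flags` are made up for illustration; this is plain Python, not actual RLlib code, and whether setting both flags is correct is exactly what I am asking.

```python
# Hypothetical time limit, for illustration only.
MAX_STEPS = 100


def final_step_flags(step_count, agent_terminated):
    """Build the per-agent terminated/truncated dicts for one step.

    agent_terminated maps each still-active agent ID to whether it
    reached a terminal state in this step.
    """
    time_limit_hit = step_count >= MAX_STEPS

    terminated = dict(agent_terminated)
    # "__all__" is True only if every active agent terminated.
    terminated["__all__"] = all(agent_terminated.values())

    # When the time limit is hit, flag every remaining agent as truncated,
    # including one that also terminated in this very step.
    truncated = {agent_id: time_limit_hit for agent_id in agent_terminated}
    truncated["__all__"] = time_limit_hit

    return terminated, truncated


# Last step: agent_0 terminates exactly when the time limit is hit.
terminated, truncated = final_step_flags(
    step_count=100,
    agent_terminated={"agent_0": True, "agent_1": False},
)
```

Here `agent_0` ends up with both `terminated` and `truncated` set to True, while `agent_1` is only truncated.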
Ultimately, what matters is that the value target is not bootstrapped from the next state for the agent that terminates in the last time step. This should be the case if `terminated` is True irrespective of the `truncated` value, correct?
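For reference, this is the behaviour I am assuming, written as a standard 1-step Q-learning target. It is only a sketch of my assumption, not code taken from RLlib; `td_target` and its arguments are hypothetical names.

```python
def td_target(reward, next_q_values, terminated, truncated, gamma=0.99):
    """1-step Q-learning target for a single transition.

    Assumption: only `terminated` masks the bootstrap term.
    `truncated` is accepted but deliberately unused, to show that it
    would have no influence on the target under this assumption.
    """
    if terminated:
        # Terminal state: no future value, the target is just the reward.
        return reward
    # Non-terminal (possibly truncated) state: bootstrap from next state.
    return reward + gamma * max(next_q_values)


# Agent that terminates in the truncated last step: no bootstrapping.
target_done = td_target(1.0, [0.5, 2.0], terminated=True, truncated=True)

# Agent that is only truncated: the target still bootstraps.
target_cut = td_target(1.0, [0.5, 2.0], terminated=False, truncated=True)
```

If this matches what the policy does, then setting `truncated=True` in addition to `terminated=True` should not change the learning target for the terminating agent.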
I couldn’t really find the right place in the code, but inside the policy, it seems like only the `terminated` values matter. Could someone confirm?
Thanks!