Setting terminated and truncated at episode end

Hi,

I am using Ray RLlib (2.3) with Gymnasium, which distinguishes terminated from truncated, for multi-agent RL (currently with APEX-DQN).
I am considering the following edge case:

  • The episode time limit is hit while some agents did not yet terminate, so the episode is truncated and truncated["__all__"] is set to True.
  • In the last step of the episode, one of the remaining agents terminates, i.e., its terminated value is set to True.

Is it correct to set both terminated and truncated to True for that agent? terminated because it did terminate, and truncated because the episode time limit was hit and truncated["__all__"] == True, in which case I would expect truncated=True for all remaining agents.
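To make the edge case concrete, here is a minimal sketch of the dict layout I have in mind. The agent ids and the helper function are purely illustrative (not RLlib internals); it just mirrors the proposal above, where the agent that terminates on the truncated final step gets both flags set:

```python
# Illustrative sketch: terminated/truncated dicts for the final step of a
# multi-agent episode where agent "a2" terminates exactly as the time
# limit is hit. Agent ids and last_step_dicts() are made up for this example.

def last_step_dicts(active_agents, just_terminated, time_limit_hit):
    """Build Gymnasium-style per-agent terminated/truncated dicts."""
    terminateds = {aid: (aid in just_terminated) for aid in active_agents}
    # All agents still in the episode are truncated when the limit hits,
    # including the one that also terminated on this step.
    truncateds = {aid: time_limit_hit for aid in active_agents}
    terminateds["__all__"] = all(terminateds[aid] for aid in active_agents)
    truncateds["__all__"] = time_limit_hit
    return terminateds, truncateds

terminateds, truncateds = last_step_dicts(
    active_agents=["a1", "a2"], just_terminated={"a2"}, time_limit_hit=True
)
print(terminateds)  # {'a1': False, 'a2': True, '__all__': False}
print(truncateds)   # {'a1': True, 'a2': True, '__all__': True}
```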

Ultimately, what matters is that the reward is not bootstrapped for the agent that terminates in the last time step. This should be the case if terminated is True, irrespective of the truncated value, correct?
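In other words, the behavior I expect is the standard one-step target rule sketched below. This is a hand-written illustration of the semantics, not RLlib's actual code; the names (gamma, next_value) are my own:

```python
# Sketch of the bootstrapping rule described above: terminated=True means
# "no bootstrap", regardless of truncated. Not RLlib source code.

def one_step_target(reward, next_value, terminated, truncated, gamma=0.99):
    if terminated:
        # True environment termination: the return ends here,
        # whatever the truncated flag says.
        return reward
    # Truncation (or an ordinary mid-episode step): bootstrap with the
    # value estimate of the next state.
    return reward + gamma * next_value

# Agent that terminated on the truncated final step: no bootstrapping.
print(one_step_target(1.0, 5.0, terminated=True, truncated=True))   # 1.0
# Agent that was merely truncated: bootstrapped target.
print(one_step_target(1.0, 5.0, terminated=False, truncated=True))  # 5.95
```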

I couldn’t really find the right place in the code, but inside the policy, it seems like only the terminated values matter. Could someone confirm?

Thanks!

  • The episode time limit is hit while some agents did not yet terminate, so the episode is truncated and truncated["__all__"] is set to True.
  • In the last step of the episode, one of the remaining agents terminates, i.e., its terminated value is set to True.

So the episode that you are looking at, e.g., the one that is completed during sampling in the EnvRunnerV2 - that one has truncated["__all__"]=True and terminated=True for one agent? Have you followed these values around? You can set ray.init(local_mode=True) and step through to see what happens to them in post-processing, etc.
