Hey @nathanlct, yeah, there was a similar bug in RLlib that was fixed here:
I think this should fix your problem as well. Yes, it happened when a new(!) agent entered the episode at the very time step at which the episode terminated, so that agent had an initial obs, but no action ever had to be computed for it.
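
To make the edge case concrete, here is a rough toy sketch in plain Python. It is not RLlib code and not the actual fix; the episode layout and the helper names `collect_trajectories` and `trainable_trajectories` are made up for illustration. It only shows how an agent can end up with one obs and zero actions, and one possible way to guard against it:

```python
from collections import defaultdict

# Hypothetical toy episode (not an RLlib data structure): each step maps
# agent IDs to observations and to the actions computed from them.
episode = [
    {"obs": {"a": 0}, "actions": {"a": 1}},
    {"obs": {"a": 1}, "actions": {"a": 0}},
    # Terminal step: agent "b" shows up for the first time and the episode
    # ends right away, so no actions are computed for anyone.
    {"obs": {"a": 2, "b": 0}, "actions": {}},
]


def collect_trajectories(steps):
    """Group observations and actions per agent."""
    obs = defaultdict(list)
    actions = defaultdict(list)
    for step in steps:
        for agent_id, agent_obs in step["obs"].items():
            obs[agent_id].append(agent_obs)
        for agent_id, action in step["actions"].items():
            actions[agent_id].append(action)
    # Agents that never acted get an empty action list.
    return {
        agent_id: {"obs": obs[agent_id], "actions": actions[agent_id]}
        for agent_id in obs
    }


def trainable_trajectories(trajectories):
    """Assumed guard: skip agents that only have an initial observation."""
    return {
        agent_id: traj
        for agent_id, traj in trajectories.items()
        if len(traj["actions"]) > 0
    }


trajectories = collect_trajectories(episode)
trainable = trainable_trajectories(trajectories)
```

Here agent `"b"` ends up with one obs and an empty action list, which is exactly the case that code assuming at least one action per agent would trip over. Filtering such agents out is just one option; how RLlib handles it internally may differ.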