How next_state_in works

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version: 2.43.0
  • Python version: 3.11.7
  • OS: macOS 15.2
  • Cloud/Infrastructure:
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected: The next_state_in found in the batch passed to forward matches the next_state_in that forward returned on the previous time step.
  • Actual: The two next_state_in values appear to be unrelated.
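
To make the expectation concrete, here is a minimal, framework-free sketch of the round trip I have in mind. It is not RLlib code: `toy_forward`, `rollout`, and the plain-dict batch are hypothetical stand-ins, and the behavior it illustrates (the state returned at step t appearing in the batch at step t+1 under the same key) is exactly the assumption I would like confirmed.

```python
# Hypothetical stand-in for a recurrent forward pass; not RLlib code.
STATE_KEY = "next_state_in"  # same string key my module checks for


def toy_forward(batch):
    """Return an action input and an updated recurrent state."""
    # Fall back to a zero state when the batch carries none (first step).
    prev_state = batch.get(STATE_KEY, 0.0)
    new_state = 0.5 * prev_state + batch["obs"]
    return {"action_dist_inputs": new_state, STATE_KEY: new_state}


def rollout(observations):
    """Feed each step's output state into the next step's batch."""
    received, emitted = [], []
    carried = None
    for obs in observations:
        batch = {"obs": obs}
        if carried is not None:
            batch[STATE_KEY] = carried
        received.append(batch.get(STATE_KEY))  # state seen on the way in
        out = toy_forward(batch)
        carried = out[STATE_KEY]
        emitted.append(carried)  # state produced on the way out
    return received, emitted


received, emitted = rollout([1.0, 2.0, 3.0])
# What I expect: the state received at step t+1 equals the state emitted at step t.
assert received[1:] == emitted[:-1]
```

In this toy loop the first step receives no state at all, which is why my module code falls back to `None` when the key is missing.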

Can someone confirm how next_state_in works? I could not find any documentation for it. If I set Columns.NEXT_STATE_IN in the output of the forward function, does that value become available under Columns.NEXT_STATE_IN in the batch passed to forward at the next time step? I am currently using it in my code like this:

def _forward_intermediate(self, batch):
    initialHidden = None
    if "next_state_in" in batch:
        initialHidden = batch["next_state_in"].unsqueeze(0)
    # ... (rest of the method omitted)
@override(TorchRLModule)
def _forward(self, batch, **kwargs):
    currentStateFeatures, initialStateFeatures = self._forward_intermediate(batch)
    policy = self.policy_branch(currentStateFeatures)
    return {
        Columns.ACTION_DIST_INPUTS: policy,
        Columns.NEXT_STATE_IN: initialStateFeatures,
    }

I tried debugging this by printing NEXT_STATE_IN from the incoming batch and from the forward output, but the values never match.
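
For anyone trying to reproduce this, below is a rough sketch of how the comparison could be made more systematic than eyeballing prints. `StateTracker` is a hypothetical helper of my own, not part of Ray. It assumes each state has been flattened to a list of floats first (for a torch tensor, something like `tensor.detach().flatten().tolist()`), and it only checks whether an incoming state matches any state emitted earlier.

```python
import math


class StateTracker:
    """Records states emitted by forward and checks them against later inputs."""

    def __init__(self, rel_tol=1e-6, abs_tol=1e-8):
        self.rel_tol = rel_tol
        self.abs_tol = abs_tol
        self.emitted = []  # flattened states returned by forward, in call order

    def record_output(self, values):
        """Store a state that forward just returned."""
        self.emitted.append(list(values))

    def match_input(self, values):
        """Return indices of earlier outputs that equal this input state."""
        values = list(values)
        return [
            i
            for i, old in enumerate(self.emitted)
            if len(old) == len(values)
            and all(
                math.isclose(a, b, rel_tol=self.rel_tol, abs_tol=self.abs_tol)
                for a, b in zip(old, values)
            )
        ]


tracker = StateTracker()
tracker.record_output([0.1, 0.2])
tracker.record_output([0.3, 0.4])
print(tracker.match_input([0.3, 0.4]))  # [1]
print(tracker.match_input([9.0, 9.0]))  # []
```

An empty list for every incoming state would mean the batch value never came from a previous forward output. If the states are batched, each row would probably need to be checked separately.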