1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
- Ray version: 2.43.0
- Python version: 3.11.7
- OS: macOS 15.2
- Cloud/Infrastructure:
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: The next_state_in that forward receives in its batch matches the next_state_in that forward returned on the previous time step.
- Actual: There doesn't seem to be any correlation between the two next_state_in values.
Can someone confirm how next_state_in works? I can't find any documentation on it. After I set Columns.NEXT_STATE_IN in the output of the forward function, does that value become available under Columns.NEXT_STATE_IN in the batch passed to forward on the next time step? I am currently using it in my code like this:
def _forward_intermediate(self, batch):
    initialHidden = None
    if "next_state_in" in batch:
        initialHidden = batch["next_state_in"].unsqueeze(0)
    .
    .
    .

@override(TorchRLModule)
def _forward(self, batch, **kwargs):
    currentStateFeatures, initialStateFeatures = self._forward_intermediate(batch)
    policy = self.policy_branch(currentStateFeatures)
    return {
        Columns.ACTION_DIST_INPUTS: policy,
        Columns.NEXT_STATE_IN: initialStateFeatures,
    }
I tried debugging by printing both NEXT_STATE_IN values (the one returned by forward and the one found in the batch on the next call), but they are never the same.
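
To make the expected round trip concrete, here is a minimal, framework-free sketch of the comparison described above. It does not use RLlib at all: `StateRoundTripChecker` and `toy_forward` are hypothetical names invented for illustration, the state is a plain list of floats rather than a tensor, and the toy update only stands in for a real recurrent layer. It shows the behaviour I assumed (each returned state is handed back unchanged on the next call), not what RLlib actually does.

```python
import math


class StateRoundTripChecker:
    """Hypothetical debugging helper (not part of RLlib).

    Remembers the state one forward pass emitted and compares it with the
    state the next forward pass receives.
    """

    def __init__(self, tol=1e-6):
        self.tol = tol
        self.last_emitted = None  # state returned by the previous call
        self.comparisons = 0
        self.mismatches = 0

    def check_incoming(self, incoming):
        """Return True/False for match/mismatch, or None if nothing to compare."""
        if self.last_emitted is None or incoming is None:
            return None
        self.comparisons += 1
        same = len(incoming) == len(self.last_emitted) and all(
            math.isclose(a, b, abs_tol=self.tol)
            for a, b in zip(incoming, self.last_emitted)
        )
        if not same:
            self.mismatches += 1
        return same

    def record_outgoing(self, outgoing):
        """Store a copy of the state emitted by the current forward pass."""
        self.last_emitted = list(outgoing)


def toy_forward(batch, checker):
    """Stand-in for a forward pass with a trivial recurrent update."""
    incoming = batch.get("next_state_in")
    checker.check_incoming(incoming)
    hidden = incoming if incoming is not None else [0.0, 0.0]
    new_hidden = [0.5 * h + batch["obs"] for h in hidden]
    checker.record_outgoing(new_hidden)
    return {"next_state_in": new_hidden}


# Simulate the assumed round trip: each output state is fed into the next call.
checker = StateRoundTripChecker()
out = toy_forward({"obs": 1.0}, checker)
for obs in (2.0, 3.0):
    out = toy_forward({"obs": obs, "next_state_in": out["next_state_in"]}, checker)

print(checker.comparisons, checker.mismatches)
```

In this toy loop every comparison matches, because the caller passes the state straight back. In my actual module the equivalent check reports a mismatch on every step, which is what prompted the question.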