How severe does this issue affect your experience of using Ray?
None: Just asking a question out of curiosity
The example manual inference loop seems to set the n_* values to the same value on every iteration. Shouldn't these instead be built from the rewards, actions, and observations of previous loop iterations?
Your example code looks good to me except for one thing: you are adding the more recent time steps on the opposite side from what RLlib does internally. RLlib places earlier steps to the left of more recent steps [0, 1, 2, 3]; your use of appendleft produces the opposite order [3, 2, 1, 0].
This means that when you do manual inference, your prev_x inputs will be in the reverse of the order used during training.
Also, I think all of the prev_x values will be initialized to zeros.
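For reference, here's a minimal sketch of the bookkeeping being described (not the linked example itself): the history deques start out zero-filled and new steps are pushed onto the right with append(), so earlier steps stay to the left, as during training. The environment and the placeholder policy call are just assumptions for illustration; in the real example you would pass these histories to the trained policy instead.

```python
from collections import deque

import gymnasium as gym
import numpy as np

N = 4  # number of previous steps the model expects

env = gym.make("CartPole-v1")  # placeholder env for illustration
obs, _ = env.reset()

# Zero-initialize the histories, as RLlib does at the start of an episode.
prev_n_obs = deque([np.zeros_like(obs) for _ in range(N)], maxlen=N)
prev_n_actions = deque([0 for _ in range(N)], maxlen=N)
prev_n_rewards = deque([0.0 for _ in range(N)], maxlen=N)

done = False
while not done:
    # In the real example the histories would be fed to the policy here
    # (e.g. via compute_single_action / an input_dict); a random action
    # stands in as a placeholder.
    action = env.action_space.sample()

    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated

    # append() pushes onto the RIGHT and, because maxlen=N, drops the oldest
    # entry on the LEFT, keeping the order oldest -> newest like training.
    # appendleft() would reverse that order.
    prev_n_obs.append(obs)
    prev_n_actions.append(action)
    prev_n_rewards.append(reward)
```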
Thanks for the insight here. Yeah, I wasn't sure whether I should be appending to the left or the right, so thanks for clarifying that.
But shouldn't the example code I linked be updated (happy to send a PR) so that the prev_x inputs actually hold previous values? Otherwise they're always being sent as static values. Or am I missing something?