I have written a simple training procedure using the standard API calls. When I run it, I get the following output on the command line (shortened):
2021-06-02 17:10:13,390 INFO rollout_worker.py:741 -- Completed sample batch:
{
...
'actions': np.ndarray((167,), dtype=int64, min=0.0, max=6.0, mean=2.976),
'advantages': np.ndarray((167,), dtype=float32, min=-2.164, max=2.518, mean=-0.077),
'new_obs': np.ndarray((167, 11), dtype=float32, min=-0.906, max=1.0, mean=0.496),
'obs': np.ndarray((167, 11), dtype=float32, min=-0.906, max=1.0, mean=0.496),
'rewards': np.ndarray((167,), dtype=float32, min=-2.2, max=2.48, mean=-0.111),
'state_in_0': np.ndarray((17, 128), dtype=float32, min=-0.298, max=0.214, mean=-0.002),
'state_in_1': np.ndarray((17, 128), dtype=float32, min=-0.532, max=0.398, mean=-0.003),
'state_out_0': np.ndarray((167, 128), dtype=float32, min=-0.301, max=0.215, mean=-0.002),
'state_out_1': np.ndarray((167, 128), dtype=float32, min=-0.536, max=0.4, mean=-0.003),
...
}
I would like to understand why state_in_0 and state_in_1 have only 17 rows, while all the other entries (including state_out_0 and state_out_1) have 167.