I am using a sort of Bayesian filter (a VAE, to be exact) instead of an RNN, but I would like to use the “state” variable in the RNN interface to store the belief state, and I would like to directly access the entire history of (prior) belief states for training.
I thought the “state_in_0” column in SampleBatch is used exactly for this purpose, so I added this to my view requirements as follows:
```python
self.view_requirements["state_in_0"] = ViewRequirement(
    data_col="state_out_0",
    shift=-1,
    used_for_training=True,
)
```
and I initialized a dummy initial state with zeros and replaced it with a belief state in the action sampler function:
```python
def get_initial_state(self):
    return torch.zeros(1, dummy_dim)


def action_sampler_fn(policy, model, input_dict, state, timestep):
    # On the first step, replace the dummy zero state with the real belief state.
    if timestep == 0:
        state = make_actual_belief_state()
    action, logp = model(input_dict, state)
    return action, logp, state
```
However, when I inspect the sample batch while computing the loss, the length of state_in_0 is the number of rollout episodes, while the length of state_out_0 is the number of time steps, which is what I wanted.
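For reference, this is roughly how I am checking the lengths (just a sketch; `loss_fn` and the prints are only how I happen to inspect things in my custom policy's loss):

```python
def loss_fn(policy, model, dist_class, train_batch):
    # Sketch of the check described above; "state_in_0" / "state_out_0"
    # are the SampleBatch columns referenced in the view requirement.
    state_in = train_batch["state_in_0"]
    state_out = train_batch["state_out_0"]
    print("len(state_in_0):", state_in.shape[0])    # comes out as the number of rollout episodes
    print("len(state_out_0):", state_out.shape[0])  # comes out as the number of time steps
    ...
```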
Because this is very confusing, I would like some clarity on these stored variables: when they are stored and what they are used for.