Setting up trajectory view correctly for repeated+non-repeated input

utkarshp · October 14, 2022, 5:36pm

High: It blocks me to complete my task.

I am trying to use trajectory view for my model which is supposed to have an LSTM and a fully connected network in parallel. This is the workflow I have in mind:

recurrent obs = obs[“repeated”][-10:0] → LSTM → Fully
Connected → Output
non-recurrent = obs[“fixed”] ---------------> Network

I am thinking my best bet would be to implement a custom RNN.
So this is how I set up my obs space:

obs_space = spaces.Dict({
            "repeated": spaces.Box(low=np.array(([0] * 3 + [0, 0, 0, 0, 0, -1, 0, 0, 0, -np.inf])),
                                   high=np.array(([np.inf] * 8 + [1] + [np.inf] * 4)),
                                   dtype=np.float64),
            "fixed": spaces.Box(low=
                                np.array([0, 0, -1, 0]),
                                high=np.array([1, 1, np.inf, np.inf]),
                                dtype=np.float64)
        })

I then have this in my Policy.postprocess_trajectory function:

class new_policy(A3CTorchPolicy):
        def postprocess_trajectory(self, sample_batch: SampleBatch,
                                   other_agent_batches: Optional[Dict[AgentID, SampleBatch]] = None,
                                   episode: Optional[Episode] = None):
            sample_batch = super(new_policy, self).postprocess_trajectory(sample_batch,
                                                                          other_agent_batches,
                                                                          episode)
            print("sample", sample_batch["obs"])
            sample_batch["fixed"] = np.array(sample_batch["obs"][:4])
            sample_batch["repeated"] = np.array(sample_batch["obs"][4:])
            return sample_batch

Here I use [“obs”][:4] instead of [“obs”][“fixed”] because obs seems to be a flattened numpy array at this point, and not a dict as I would have expected, but that’s fine.

Then, I set up my view requirement in my model

class my_model(TorchModelV2, torch.nn.Module):
    def __init__(self, obs_space: gym.spaces.Space, action_space: gym.spaces.Space,
                 num_outputs: int, model_config: ModelConfigDict, name: str):
          self.view_requirements["rep"] = ViewRequirement(data_col="repeated", shift="-10:0")

    def forward(self, input_dict: Dict[str, TensorType],
                state: List[TensorType],
                seq_lens: TensorType) -> (TensorType, List[TensorType]):
        print("Got input", input_dict["obs"], input_dict["obs_flat"], input_dict)

However, in my forward call, I don’t get the column “rep” at all! I also don’t see “repeated” or “fixed” columns. I suspect there is a flaw in my understanding of the trajectory view API and I would appreciate any insights into what is going wrong here and what would be the best way for me to do this.

Thanks in advance!

kourosh · January 9, 2023, 2:46am

Hey @utkarshp,

Have you tried setting _disable_preprocessor_api to False? Setting this flag, will tell RLlib to not flatten the data structure of the observation.

Trajectory view API is responsible for dictating the constraints for trajectory construction and dependencies of certain keys on others across time. So here you need to first get access to repeated and fixed key under each obs and then using TrajectoryViewAPI decide how to contextualize your input state as a function of these keys.

Topic		Replies	Views
Custom Recurrent Network and TrajectoryView RLlib	3	321	February 24, 2021
Trying to understand model and env concepts RLlib	7	486	September 29, 2021
Multiple observations including RNN's RLlib	3	338	May 18, 2022
States of Recurrent models for multiple workers/envs RLlib	1	303	April 14, 2021
Errors during training using BC with custom rnn RLlib	2	416	March 7, 2023

Setting up trajectory view correctly for repeated+non-repeated input

Related topics