I’d like to use a sparse matrix as a recurrent state. These are organised as a tensor of indices and a tensor of values. This greatly improves memory efficiency and computational efficiency (not multiplying a bunch of zeros), but the tensors change size as new entries are added to the matrix. It seems like rllib/numpy does not like serialising arrays of different shapes into rollouts.

I was wondering if anyone is aware of a way to use dynamically-shaped recurrent states using TorchModelv2.
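For concreteness, here is a minimal sketch (plain PyTorch, nothing RLlib-specific, with made-up sizes) of what I mean by a state whose index/value tensors change shape as entries are added:

```python
import torch

# Sparse recurrent state stored as (indices, values) tensors.
indices = torch.tensor([[0, 2], [1, 3]])   # 2 x nnz (row, col) coordinates
values = torch.tensor([0.5, -1.2])         # nnz values
state = torch.sparse_coo_tensor(indices, values, size=(128, 128))

# Adding one entry changes the shapes: indices is now 2 x (nnz + 1) and
# values is (nnz + 1,). This is what rllib/numpy refuses to serialise
# into fixed-shape rollout buffers.
indices = torch.cat([indices, torch.tensor([[5], [7]])], dim=1)
values = torch.cat([values, torch.tensor([0.3])])
```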
Hey @smorad, sounds like a cool idea. Can I ask how large a single one of your internal states is?
Also, wouldn’t your model change most of the zeros in the initial state after one or a few timesteps to something != 0.0?
It roughly scales with max_timesteps_per_episode ** 2. With dense matrices I’m stuck somewhere around 128 timesteps due to GPU memory constraints, but I would like to try much longer horizons.
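To put rough numbers on it (illustrative only; the batch size and sparsity below are made-up assumptions):

```python
# Dense float32 state of shape (T, T) per batch element vs. a sparse
# (indices, values) representation. Numbers are illustrative, not measured.
T = 1000
batch_size = 32
dense_bytes = T * T * 4 * batch_size           # ~128 MB just for the state
nnz = 10 * T                                   # assume ~10 links per timestep
sparse_bytes = nnz * (2 * 8 + 4) * batch_size  # int64 (row, col) + float32 value
print(dense_bytes / 1e6, sparse_bytes / 1e6)   # 128.0 vs 6.4 (MB)
```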
So with something like attention, this would be the case. One of the problems I’m trying to tackle is the inefficiency of attention in RL, which I’m sure you’ve experienced given the existence of the attention_memory_inference/attention_memory_training config keys.
An adjacency matrix denoting some dependency of observation t on some earlier observation t-k, k \in [0, t], would contain t ** 2 * batch_size entries. So for an episode of t=1000, this explodes. Note that changing max_seq_len here does not help, as we could link observation t -> 0, which means we must retain the zeroth entry at time t.
I’m willing to put in a little work if there is a way to do this nicely.
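For example, one direction I could imagine (just a sketch, assuming I can bound the number of non-zero entries by some MAX_NNZ, which is a made-up constant here) is padding the index/value tensors to a fixed shape and carrying the true count alongside them, so the state shapes never change from RLlib’s perspective:

```python
import torch

MAX_NNZ = 4096  # hypothetical upper bound on non-zero entries

def pad_sparse_state(indices, values, max_nnz=MAX_NNZ):
    """Pad (2, nnz) indices and (nnz,) values to fixed shapes so they can be
    serialised like any other fixed-shape recurrent state tensor."""
    nnz = values.shape[0]
    padded_idx = torch.zeros(2, max_nnz, dtype=indices.dtype)
    padded_val = torch.zeros(max_nnz, dtype=values.dtype)
    padded_idx[:, :nnz] = indices
    padded_val[:nnz] = values
    # Keep the true nnz so the real entries can be recovered on the next step.
    return padded_idx, padded_val, torch.tensor([nnz])

def unpad_sparse_state(padded_idx, padded_val, nnz):
    n = int(nnz.item())
    return padded_idx[:, :n], padded_val[:n]
```

The obvious downside is that MAX_NNZ reintroduces a hard cap, which is why I’d prefer a way to handle genuinely dynamic shapes if one exists.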