Hi,
I’m working on RL problems that involve dynamically changing input spaces (sets and graphs).
I’m using the Repeated space, as it’s designed for this purpose, e.g. to represent a graph (following the PyG convention):
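import gym
import numpy as np
from ray.rllib.utils.spaces.repeated import Repeated

# x_shape and MAX_ELEMS are defined elsewhere
d = {
    "x": Repeated(
        gym.spaces.Box(-1, 1, shape=x_shape, dtype=np.float32),
        MAX_ELEMS,
    ),
    "edge_index": Repeated(
        gym.spaces.Box(0, MAX_ELEMS, shape=(2,), dtype=np.int64),
        MAX_ELEMS ** 2,
    ),
}
observation_space = gym.spaces.Dict(d)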
It works fine in general. However, it creates significant overhead, because the allocated tensors scale with prod(x_shape)*MAX_ELEMS + 2*MAX_ELEMS**2.
I understand that the flattened observations need to have a fixed size, but I never use them anywhere. The padding creates a large memory and latency overhead on the GPU, because the whole zero-padded tensor is transferred from CPU to GPU. This is especially bad when the number of elements follows a long-tailed Poisson distribution, because I need to use a large MAX_ELEMS while most of the time only a fraction of the allocated tensor is utilized.
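To put a rough number on the overhead, here is a small sketch that counts elements for the padded layout versus what a single graph actually needs. The sizes (16 features, MAX_ELEMS = 100, 10 nodes, 30 edges) are hypothetical values chosen only for illustration, and the helper names are mine, not part of RLlib:

```python
import math


def padded_elems(x_shape, max_elems):
    """Elements allocated by the zero-padded layout described above."""
    return math.prod(x_shape) * max_elems + 2 * max_elems ** 2


def used_elems(x_shape, n_nodes, n_edges):
    """Elements a graph with n_nodes nodes and n_edges edges really needs."""
    return math.prod(x_shape) * n_nodes + 2 * n_edges


# Hypothetical sizes, for illustration only.
x_shape = (16,)
MAX_ELEMS = 100
n_nodes, n_edges = 10, 30

padded = padded_elems(x_shape, MAX_ELEMS)
used = used_elems(x_shape, n_nodes, n_edges)
print(padded, used, round(padded / used, 1))
```

With these numbers the padded observation is roughly 98 times larger than the data it carries, and the gap grows quadratically with MAX_ELEMS because of the edge term.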
To avoid the padding overhead, I created a custom graph space:
class GraphSpace(gym.Space):
    def __init__(self, x_shape, e_shape=None, dtype=np.float32):
        super().__init__()
        self.x_shape = x_shape
        self.e_shape = e_shape
        self._shape = (0,)
        self.dtype = dtype

    def sample(self):
        if self.e_shape:
            edge_attr = self.np_random.normal(size=self.e_shape)[np.newaxis].astype(
                self.dtype
            )
        else:
            edge_attr = None
        x = self.np_random.normal(size=self.x_shape)[np.newaxis].astype(self.dtype)
        return {
            "x": x,
            "edge_attr": edge_attr,
            "edge_index": np.array([[0], [0]]),
        }

    def contains(self, x):
        # placeholder
        return True

    def __repr__(self):
        return "Graph({}, {})".format(self.x_shape, self.e_shape)
It works in principle; however, a few bits need to be adjusted.
For example, when batching, RLlib assumes arrays with the same dimensionality. That’s not true in this case, so instead of a normal NumPy array (size: batch x features) it creates an object array, which breaks slicing downstream as well as the PyTorch tensor conversion.
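The object-array problem can be reproduced with NumPy alone. This is only a sketch of the symptom, not RLlib's actual batching code; the two observations (3 and 5 nodes, 4 features each) are made up for illustration:

```python
import numpy as np

# Two hypothetical graph observations with different node counts.
obs = [
    np.zeros((3, 4), dtype=np.float32),
    np.zeros((5, 4), dtype=np.float32),
]

# The shapes differ, so the batch can only hold references to the arrays.
batch = np.empty(len(obs), dtype=object)
for i, o in enumerate(obs):
    batch[i] = o

print(batch.shape, batch.dtype)  # (2,) object

# Downstream code that expects a (batch x features) array breaks here.
try:
    batch[:, 0]
    slicing_ok = True
except IndexError:
    slicing_ok = False

print(slicing_ok)
```

The batch is a 1-D array of Python objects rather than a 2-D numeric array, so any column slicing or direct tensor conversion that relies on a regular shape fails.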
I haven’t finished the conversion because there are a lot of places that need to be changed, but it seems to me that there’s no fundamental issue in allowing variable input sizes, only syntactic ones.
Also, PyTorch Geometric has a very convenient batching mechanism that addresses the same problem, while making it possible to handle the input tensors as a regular 2D tensor of size BN x F ((batch size * number of items) x feature size). Integrating this approach would be a large endeavour, but it could enable a lot of applications that the current implementation rules out.
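For reference, the idea behind that batching mechanism can be sketched in plain NumPy: concatenate the node features, shift each graph's edge indices by the number of nodes that came before it, and keep a vector that maps every node to its graph. This is a minimal illustration under my own assumptions, not PyG's implementation; collate_graphs and the two toy graphs are hypothetical:

```python
import numpy as np


def collate_graphs(graphs):
    """Merge graphs into one disjoint graph with a node-to-graph vector."""
    xs, edge_indices, batch_ids = [], [], []
    offset = 0
    for i, g in enumerate(graphs):
        n = g["x"].shape[0]
        xs.append(g["x"])
        # Shift node ids so they point into the concatenated node array.
        edge_indices.append(g["edge_index"] + offset)
        batch_ids.append(np.full(n, i, dtype=np.int64))
        offset += n
    return {
        "x": np.concatenate(xs, axis=0),
        "edge_index": np.concatenate(edge_indices, axis=1),
        "batch": np.concatenate(batch_ids),
    }


# Two toy graphs with 2 and 3 nodes, 3 features per node.
g1 = {
    "x": np.ones((2, 3), dtype=np.float32),
    "edge_index": np.array([[0, 1], [1, 0]]),
}
g2 = {
    "x": np.ones((3, 3), dtype=np.float32),
    "edge_index": np.array([[0, 1, 2], [1, 2, 0]]),
}

merged = collate_graphs([g1, g2])
print(merged["x"].shape)
print(merged["edge_index"])
print(merged["batch"])
```

The merged "x" is a regular 2D array whose first dimension is the total number of nodes in the batch, so no padding is needed, and the "batch" vector is what lets per-graph pooling recover the original grouping.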
So, is there any plan to enable more efficient and flexible dynamic spaces? I.e. pretty much the same functionality that Repeated has, but without the overhead of zero-padding.