PPO multi GPU optimizer

Hey everyone,

I have a question/issue regarding PPO, more precisely about the multi-GPU optimizer it uses.
The problem I'm having is that I have a rather large observation and a large model (graph convolutional NNs; the graphs can differ in size, so I also have an overhead from zero padding), and my GPU memory fills up quite rapidly since all the data is pinned to GPU memory (as intended by the "multi GPU optimizer"). My question is: is there an easy way to change this behavior (i.e. stream the data from RAM to GPU memory), and if not, should there be? As far as I could find in the documentation, there is something like that for A2C:

microbatch_size – A2C supports microbatching, in which we accumulate gradients over batches of this size until the train batch size is reached. This allows training with batch sizes much larger than can fit in GPU memory. To enable, set this to a value less than the train batch size.
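
For reference, this is roughly what I have in mind. A minimal sketch with a toy env and made-up batch sizes; the exact import path and config keys may differ depending on the Ray version:

```python
import ray
from ray.rllib.agents.a2c import A2CTrainer  # import path may differ across Ray versions

ray.init()

config = {
    "env": "CartPole-v1",     # stand-in for my custom graph env
    "framework": "torch",
    "num_gpus": 1,
    # Size of the full train batch that one update is computed from ...
    "train_batch_size": 4000,
    # ... but gradients are accumulated over micro-batches of this size,
    # so only 500 samples need to sit in GPU memory at any one time.
    "microbatch_size": 500,
}

trainer = A2CTrainer(config=config)
for _ in range(10):
    result = trainer.train()
    print(result["episode_reward_mean"])
```

Something equivalent for PPO (accumulating gradients over micro-batches instead of pinning the whole train batch to the GPU) is what I'm looking for.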

Hi @Kunzro, where is the zero padding coming from? Can you give a bit more context?

Hey @kourosh

Thanks for your reply.

My (custom) environment performs actions on a graph whose number of nodes can and does vary every iteration. The observation consists of some features for every node, so it has the shape (#nodes, #features per node). My current solution is to declare the observation space as e.g. (2 x #initial nodes, #features per node), zero-pad the observation to that shape, and hope that the graph never reaches a state with more than 2x the number of initial nodes. The NN model then handles the observation by removing the zero-padded part.
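
Roughly, what I do looks like this (a simplified sketch with placeholder sizes, not my actual environment or model):

```python
import numpy as np
import gym
import torch

NUM_INITIAL_NODES = 32   # placeholder, my real graphs are larger
N_FEATURES = 8           # placeholder for the per-node feature count
MAX_NODES = 2 * NUM_INITIAL_NODES  # padded upper bound, hoping the graph never grows past it

# Fixed-size observation space; rows beyond the current node count are zero padded.
observation_space = gym.spaces.Box(
    low=-np.inf, high=np.inf, shape=(MAX_NODES, N_FEATURES), dtype=np.float32
)

def pad_observation(node_features: np.ndarray) -> np.ndarray:
    """Zero-pad an (n_nodes, N_FEATURES) array up to (MAX_NODES, N_FEATURES)."""
    padded = np.zeros((MAX_NODES, N_FEATURES), dtype=np.float32)
    padded[: node_features.shape[0]] = node_features
    return padded

def real_node_mask(obs: torch.Tensor) -> torch.Tensor:
    """Inside the model: mark which rows are real nodes rather than padding.

    obs has shape (batch, MAX_NODES, N_FEATURES); an all-zero row is padding.
    The GCN layers then only operate on the masked-in nodes.
    """
    return obs.abs().sum(dim=-1) > 0  # boolean mask of shape (batch, MAX_NODES)
```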