Hey everyone,
I have a question/issue regarding PPO, more precisely about the multi-GPU optimizer it uses.
The problem I’m having is that, since I have rather large observations and a large model (graph convolutional NNs, where the graphs can differ in size, so I also have overhead from zero-padding), my GPU memory fills up quite rapidly because all of the data is pinned to GPU memory (as intended by the “multi GPU optimizer”). My question is: is there an easy way to change this behavior (i.e. stream the data from RAM to GPU memory), and if not, maybe there should be? As far as I can tell from the documentation, there is something like that for A2C:
microbatch_size – A2C supports microbatching, in which we accumulate gradients over batches of this size until the train batch size is reached. This allows training with batch sizes much larger than can fit in GPU memory. To enable, set this to a value less than the train batch size.
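For reference, here is a minimal sketch of how I understand that A2C option would be used (the train_batch_size and microbatch_size keys come from the quoted docs; the environment name and the concrete values are just placeholders):

```python
import ray
from ray import tune

ray.init()

# Sketch only: "train_batch_size" and "microbatch_size" are the A2C config
# keys described in the docs quote above; env and values are placeholders.
config = {
    "env": "CartPole-v0",
    "num_gpus": 1,
    "train_batch_size": 4000,
    # Accumulate gradients over micro-batches of 500 samples until the full
    # train batch of 4000 is reached, so only 500 samples need to sit on the
    # GPU at any one time.
    "microbatch_size": 500,
}

tune.run("A2C", config=config, stop={"training_iteration": 1})
```

Something equivalent to this for PPO’s multi-GPU optimizer is basically what I’m looking for.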