[RLlib] Ray RLlib config parameters for PPO

Hey @Xim_Lee ,
Check out this documentation page, where we explain these config keys in more detail:
https://docs.ray.io/en/master/rllib-sample-collection.html

On the PPO-specific keys:
sgd_minibatch_size: PPO takes a train batch (of size train_batch_size) and splits it into minibatches of sgd_minibatch_size samples each. E.g. if train_batch_size=1000 and sgd_minibatch_size=100, then we create 10 minibatches (“sub-sampling” pieces) out of the train batch.
num_sgd_iter: The number of passes (epochs) made over these minibatches. In each pass, every minibatch is fed once to the NN for an SGD update. So in the above example, with num_sgd_iter=30, we do 30 x 10 = 300 updates altogether on one single train batch.
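
To make the arithmetic concrete, here is a minimal sketch of the loop structure described above. It is not RLlib's actual implementation: the function name count_minibatch_updates is hypothetical, and the per-pass shuffling and the dropping of a leftover partial minibatch are assumptions made for illustration.

```python
import random


def count_minibatch_updates(train_batch_size, sgd_minibatch_size, num_sgd_iter, seed=0):
    """Count SGD updates done on one train batch (illustrative only)."""
    rng = random.Random(seed)
    indices = list(range(train_batch_size))
    num_updates = 0
    for _ in range(num_sgd_iter):  # one pass (epoch) over the train batch
        rng.shuffle(indices)  # assumption: sample order is reshuffled each pass
        # Step through the batch in full minibatches; a leftover partial
        # minibatch is dropped here (assumption).
        last_start = train_batch_size - sgd_minibatch_size
        for start in range(0, last_start + 1, sgd_minibatch_size):
            minibatch = indices[start:start + sgd_minibatch_size]
            # A real trainer would compute the PPO loss on `minibatch`
            # and apply one gradient step here.
            num_updates += 1
    return num_updates


# The three keys discussed in this post, with the values from the example.
config = {
    "train_batch_size": 1000,
    "sgd_minibatch_size": 100,
    "num_sgd_iter": 30,
}

print(count_minibatch_updates(**config))  # 10 minibatches x 30 passes = 300
```

Raising num_sgd_iter reuses the same sampled data for more updates, while changing sgd_minibatch_size changes how many updates each pass contains.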
