[RLlib] Ray RLlib config parameters for PPO

@sven1977 Again, thanks for your explanations!

Does this mean that a smaller last minibatch, if one exists, is ignored and not used?
If so, it would always be advisable to set train_batch_size to a multiple of sgd_minibatch_size.
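
To make the question concrete, here is a small sketch of the arithmetic I have in mind. It is plain Python, not RLlib code: the helper `split_into_minibatches` is a hypothetical name of my own, and the sizes are only example values. It just counts how many full minibatches fit into a train batch and how many samples would be left over in a partial one.

```python
def split_into_minibatches(train_batch_size, sgd_minibatch_size):
    """Return (number of full minibatches, leftover samples in a partial one).

    Hypothetical helper for illustration only; it does not reflect how RLlib
    actually builds its minibatches.
    """
    full, remainder = divmod(train_batch_size, sgd_minibatch_size)
    return full, remainder


# Example values only: 4000 is not a multiple of 128, 4096 is.
print(split_into_minibatches(4000, 128))  # (31, 32)
print(split_into_minibatches(4096, 128))  # (32, 0)
```

With 4000 and 128 there would be 32 leftover samples per SGD pass, and my question is whether those are skipped. With 4096 and 128 nothing is left over, so the question would not arise.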