RLlib Parallelization Guide

Dear community,
thanks to a lot of work on my custom Gym environment, I can now run many more iterations with bigger training batch sizes and SGD mini-batch sizes than before. Still, training is quite time-consuming. Is there a parallelization guide available for RLlib (old API stack)? It could cover topics like how many rollout workers make sense in relation to the iteration count or batch size, or anything along those lines.