Our team has been using RLlib since this summer. We had hoped that using Ray as a backend would speed up our training, but compared to one-off GitHub repos from other researchers, an iteration of our training takes roughly 10x longer to finish on RLlib. We did our due diligence to make sure our timesteps_per_iteration, batch_size, num_envs, etc. are as "equivalent" as possible…
While troubleshooting, we realized that most of our workers (each using the default 1 CPU to collect samples) spend most of their time in set_weights(). This seemed a bit odd to me, since as far as I understand that code just fetches the weight dictionaries via ray.get() and updates the local models, which shouldn't take this long.
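For context, my mental model of that sync step is roughly the pattern below. This is a simplified sketch of driver-to-worker weight broadcasting in Ray, not RLlib's actual implementation; the class and variable names are made up for illustration:

```python
import numpy as np
import ray

ray.init()

# Hypothetical stand-in for a rollout worker (not an actual RLlib class).
@ray.remote(num_cpus=1)
class RolloutWorkerStub:
    def __init__(self):
        self.weights = None

    def set_weights(self, weights):
        # When the driver passes an ObjectRef as the argument, Ray fetches
        # and deserializes the value before this method runs, so the cost
        # of moving a large weight dict shows up around this call.
        self.weights = weights
        return True

# Driver side: put the weight dict (here four 2048x2048 float32 layers)
# into the object store once, then broadcast the same reference to all
# workers instead of serializing the dict once per worker.
weights = {f"fc_{i}": np.random.randn(2048, 2048).astype(np.float32)
           for i in range(4)}
weights_ref = ray.put(weights)

workers = [RolloutWorkerStub.remote() for _ in range(4)]
ray.get([w.set_weights.remote(weights_ref) for w in workers])
```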
Our setup: num_workers: 60 and num_gpus: 1 on a 64-core machine with a single GPU, and num_envs_per_worker: 4
(anything higher seems to seriously risk crashing the machine). Our Q and policy models are fully connected, `[2048, 2048, 2048, 2048]`. Training intensity was set to 1000. I wish I could provide a reproducible example, but we're currently using a custom algorithm (inheriting from Ray's SAC) and a custom environment (based on OpenAI Gym), so I was hoping to get some guidance based on this description. A rough sketch of the relevant config is below for reference. Thank you!
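To make the setup concrete, here is roughly the relevant slice of our config. This is a sketch using RLlib's SAC config keys as I understand them, layered on top of our custom SAC subclass; anything not listed is left at its default:

```python
config = {
    # Resources: 60 sampling workers (1 CPU each) + 1 GPU for the learner.
    "num_workers": 60,
    "num_gpus": 1,
    "num_envs_per_worker": 4,

    # Q and policy networks: four fully connected layers of 2048 units each.
    "Q_model": {"fcnet_hiddens": [2048, 2048, 2048, 2048]},
    "policy_model": {"fcnet_hiddens": [2048, 2048, 2048, 2048]},

    # Ratio of training steps to env steps sampled.
    "training_intensity": 1000,
}
```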