Rollout workers spend too much time on set_weights()

Our team has been using RLlib since this summer. We hoped that using Ray as a backend would speed up our training, but compared to one-off GitHub repos written by other researchers, RLlib takes roughly 10x longer to finish an iteration. We did our due diligence to make sure our `timesteps_per_iteration`, `batch_size`, `num_envs`, etc. are as "equivalent" as possible…

While troubleshooting, we realized that our workers (each using the default 1 CPU to collect samples) are spending most of their time in set_weights().
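For context, the pattern we mean is roughly the one below. This is a simplified Ray-only sketch, not RLlib's actual code; the worker count and layer sizes are just illustrative:

```python
import numpy as np
import ray

ray.init()

@ray.remote(num_cpus=1)
class Worker:
    """Stand-in for a rollout worker that holds a copy of the policy weights."""

    def __init__(self):
        self.weights = None

    def set_weights(self, weights):
        # Receive the broadcast weight dict and install it locally.
        self.weights = weights

    def sample(self):
        # ... would collect a rollout using self.weights ...
        return 0

# Illustrative weights, roughly the size of one [2048, 2048, 2048, 2048] FC net.
weights = {
    f"fc_{i}/kernel": np.random.randn(2048, 2048).astype(np.float32) for i in range(4)
}

workers = [Worker.remote() for _ in range(4)]  # the real setup uses 60 workers

# Put the weights into the object store once and broadcast the reference;
# on a single machine the workers read the arrays from shared memory.
weights_ref = ray.put(weights)
ray.get([w.set_weights.remote(weights_ref) for w in workers])
```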

This seemed a bit odd to me: as far as I understand, that code just fetches weight dictionaries via ray.get() and assigns them, which shouldn't take this long. Our setup is `num_workers: 60` and `num_gpus: 1` on a 64-core machine with a single GPU, with `num_envs_per_worker: 4` (anything higher seems to really risk crashing the machine). Our Q and policy models are fully connected, `[2048, 2048, 2048, 2048]`, and `training_intensity` is set to 1000. I wish I could provide a reproducible example, but we're currently using a custom algorithm (inheriting from Ray's SAC) and a custom environment (based on OpenAI Gym), so I'm hoping to get guidance based on this description. Thank you!
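For reference, a rough approximation of the relevant parts of our config (RLlib 1.x "agents"-style API; the env name, framework choice, and `timesteps_per_iteration` value here are placeholders, not our real ones):

```python
from ray.rllib.agents.sac import SACTrainer  # RLlib 1.x "agents" API

config = {
    "env": "Pendulum-v1",              # stand-in for our custom Gym env
    "framework": "torch",              # assumption; framework not stated above
    "num_workers": 60,
    "num_gpus": 1,
    "num_envs_per_worker": 4,
    "Q_model": {"fcnet_hiddens": [2048, 2048, 2048, 2048]},
    "policy_model": {"fcnet_hiddens": [2048, 2048, 2048, 2048]},
    "training_intensity": 1000,
    "timesteps_per_iteration": 1000,   # placeholder; real value omitted here
}

trainer = SACTrainer(config=config)
result = trainer.train()
```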

That is, by our standards, not the smallest model. But since you are doing this on a single machine, there is no serialization/deserialization overhead when calling ray.get() here - the weights are already in shared memory. So even though the model is a little on the larger side, the time to set weights should be fine. Can you open RLlib's logs in TensorBoard (e.g. `tensorboard --logdir ~/ray_results`), look at the synch_weights_time_ms graph, and compare it to the other timer graphs? Is it below 10 ms, or is it clearly the dominant, time-consuming operation?
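If you want to spot-check this without TensorBoard, the same timers are also reported in the result dict returned by each train() call. A minimal sketch, assuming a trainer object like the one in the config sketch above (the exact timer key names vary between RLlib versions):

```python
# The timers TensorBoard plots are also in the result dict from train().
# Key names (e.g. "synch_weights_time_ms", "sample_time_ms", "learn_time_ms")
# can differ between RLlib versions.
result = trainer.train()
for name, value_ms in sorted(result.get("timers", {}).items()):
    print(f"{name}: {value_ms:.1f} ms")
```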