Is there a way to turn off dummy batch initialization on worker nodes?

In algorithms like IMPALA, we don’t need to initialize the loss function on the workers.

When using GPUs, this wastes GPU memory for no reason: the gradients allocated during initialization are never removed, so the gradients for a batch of 200 samples (the size of the dummy batch) are left sitting on the GPU.

Is there a way to turn off the dummy batch initialization on workers so this doesn't happen?
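For reference, the kind of manual cleanup I have in mind as a workaround would look something like this. This is a rough sketch: it assumes a torch policy whose module is reachable via `policy.model` and that the leftover buffers are ordinary parameter gradients, so treat the names as assumptions.

```python
import torch


def free_dummy_batch_grads(worker):
    """Drop gradient buffers left over from the dummy-batch initialization."""
    for policy in worker.policy_map.values():
        # Assumes a torch policy; policy.model is the torch.nn.Module.
        for param in policy.model.parameters():
            param.grad = None  # release the gradient tensor
    # Return the now-unused cached blocks to the GPU allocator.
    torch.cuda.empty_cache()


# Hypothetical usage after building the trainer: run the cleanup on every
# rollout worker (the learner re-allocates its gradients on the first update
# anyway, so clearing them there is harmless).
trainer.workers.foreach_worker(free_dummy_batch_grads)
```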

Hi @Bam4d,

I saw your GitHub issue that elaborates on this question.

I have not seen a way to turn it off, but with respect to the loss question specifically, I have two thoughts:

  1. I may be missing some key detail, but I am surprised that the remote workers compute the loss at all during initialization. Even for most of the synchronous algorithms, the remote workers do not compute the loss or update weights during training. MAML comes to mind as one algorithm that would need the loss initialization bits on both the local and remote workers.

  2. Speaking of MAML, I was looking at that policy recently and saw that it has logic that selects a different loss class depending on whether worker_index == 0. I wonder if you could do something similar for IMPALA, where you give the remote workers (worker_index > 0) a no-op loss function; see the sketch after this list.
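For what it's worth, here is roughly what I mean, assuming the torch version of IMPALA and that its stock loss is importable as `build_vtrace_loss` (I am writing this from memory, so treat the exact module path and names as assumptions):

```python
import torch
from ray.rllib.agents.impala.vtrace_torch_policy import (
    VTraceTorchPolicy,
    build_vtrace_loss,
)


def loss_fn(policy, model, dist_class, train_batch):
    # Remote rollout workers (worker_index > 0) never apply gradients in
    # IMPALA, so give them a trivial constant loss instead of building the
    # full V-trace graph (and its gradient buffers) on their GPU.
    if policy.config["worker_index"] > 0:
        return torch.zeros(1, requires_grad=True, device=policy.device)
    # The local (learner) worker keeps the real IMPALA loss.
    return build_vtrace_loss(policy, model, dist_class, train_batch)


# Hypothetical usage: swap the loss into the stock policy and pass the
# resulting class to the trainer.
MyVTracePolicy = VTraceTorchPolicy.with_updates(loss_fn=loss_fn)
```

You would probably also need to adjust (or no-op) the stats function for the remote workers, since the stock one reads values that the real loss attaches to the policy object.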

Following up on 1., I guess part of the issue is that the details of how the gradients are computed and applied are hidden away in the execution plan. The policy does not have access to that information, so it cannot know which workers are and are not involved in the train step.
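To illustrate what I mean, a stripped-down execution plan in the style of the simpler algorithms (not IMPALA's actual plan, which runs a separate learner thread) looks roughly like this; the split between rollout and training lives entirely on this side and is invisible to the policies:

```python
from ray.rllib.execution.rollout_ops import ParallelRollouts, ConcatBatches
from ray.rllib.execution.train_ops import TrainOneStep
from ray.rllib.execution.metric_ops import StandardMetricsReporting


def execution_plan(workers, config):
    # Remote workers (worker_index > 0) only contribute rollouts here ...
    rollouts = ParallelRollouts(workers, mode="bulk_sync")
    train_op = rollouts.combine(
        ConcatBatches(min_batch_size=config["train_batch_size"])
    ).for_each(
        # ... while gradients are computed and applied only on the local
        # worker, inside TrainOneStep. The policy objects never see this
        # split, so at construction time they cannot tell whether they
        # will ever be trained.
        TrainOneStep(workers)
    )
    return StandardMetricsReporting(train_op, workers, config)
```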