In the IMPALA trainer, when the learner queue is full (I am using an expensive model, so the GPU training step is slow), the algorithm seems to move training to the CPU, making training unbearably slow. The desired behavior would be for the CPU workers to go idle while the GPU catches up. The effect can be postponed by setting `learner_queue_size` to a higher value, but that only delays the problem: the queue eventually fills up and training becomes incredibly slow. Is there a config variable I am missing that prevents this behavior?
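For reference, the relevant part of the setup looks roughly like the sketch below (dict-style RLlib config; the values shown are illustrative, not my exact settings — `learner_queue_size` and `learner_queue_timeout` are the IMPALA keys in question):

```python
# Illustrative IMPALA config fragment. Values are examples only;
# "learner_queue_size" is the queue whose filling up triggers the
# slowdown described above.
impala_config = {
    "num_gpus": 1,               # single learner GPU
    "num_workers": 8,            # CPU rollout workers feeding the learner queue
    "learner_queue_size": 64,    # raised well above the default; only postpones the issue
    "learner_queue_timeout": 300,
}

# This dict would then be passed to the trainer, e.g.:
# trainer = ImpalaTrainer(config=impala_config, env="MyEnv-v0")
```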
EDIT: Despite training being extremely slow (the forward pass is roughly 100x slower), all tensors still seem to be on the GPU, which makes the issue even more perplexing.