It is not surprising that at a certain point there is not enough work left to split (and managing multiple workers has a fixed cost). So multiplying the number of CPUs by 13 does not divide the running time by 13. Still, I'm quite curious to understand how parallelization works in a multi-CPU environment.
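The intuition above is exactly Amdahl's law: any serial fraction of the work caps the speedup, no matter how many CPUs you add. A quick sketch (the 90% parallel fraction is an illustrative assumption, not a measured value):

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's law: the serial fraction caps the achievable speedup."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

# Even if 90% of the work parallelizes perfectly, 13 CPUs give ~5.9x, not 13x.
print(round(amdahl_speedup(0.9, 13), 2))  # → 5.91
```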
The settings @Abderraim is citing look like:

```python
# Parallel Training CPU
```
Thus, since we specify the number of CPUs available in the TensorFlow session, I understand that the parallelization happens at the TensorFlow level. Can anyone confirm that?
But what I do not understand is how SGD is performed in this case. The minibatches seem to be managed entirely by RLlib:
```python
def do_minibatch_sgd(samples, policies, local_worker, num_sgd_iter,
                     sgd_minibatch_size, standardize_fields):
    """Execute minibatch SGD."""
    for policy_id in policies.keys():
        batch = samples.policy_batches[policy_id]
        for i in range(num_sgd_iter):
            iter_extra_fetches = defaultdict(list)
            for minibatch in minibatches(batch, sgd_minibatch_size):
                batch_fetches = (local_worker.learn_on_batch(
                    MultiAgentBatch({policy_id: minibatch},
                                    minibatch.count)))[policy_id]
                ...
```
So I understand that whatever parallelization is being done, it happens at the minibatch level. Is that right?
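To make that loop structure explicit, here is a toy mock of it in plain Python (the `minibatches` helper below is a simplified stand-in for RLlib's, with no shuffling and a plain list instead of a `SampleBatch`):

```python
def minibatches(batch, sgd_minibatch_size):
    """Yield contiguous minibatches of the train batch (simplified sketch)."""
    for start in range(0, len(batch), sgd_minibatch_size):
        yield batch[start:start + sgd_minibatch_size]

# A toy "train batch" of 10 samples, split into minibatches of size 4.
batch = list(range(10))
num_sgd_iter = 2  # each SGD iteration revisits the whole train batch

updates = 0
for _ in range(num_sgd_iter):
    for mb in minibatches(batch, 4):
        updates += 1  # here RLlib would call local_worker.learn_on_batch(mb)

print(updates)  # → 6  (2 iterations x ceil(10 / 4) = 3 minibatches)
```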
Lastly, as Abderrahim asks, it puzzles me how the forward/backward passes are actually performed. I don't see a TensorFlow distribution strategy declared anywhere. Can we control that from RLlib?