Distributed training in PyTorch and init_process_group

Hey @vblagoje, the reason for this behavior is that under the hood, each iteration waits for all processes to report metrics via tune.report before training continues, even though ultimately only the metrics from the rank 0 worker are propagated up. You can think of this as a way to keep all processes synchronized.

Would you be able to invoke tune.report on all workers?
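As a rough illustration (not your exact code), here's a minimal sketch of a per-worker training function that calls tune.report on every rank. It assumes the older keyword-style `tune.report(**metrics)` API and that whatever launches the workers (e.g. a distributed trainable wrapper) has already set the MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE environment variables that `init_process_group` with the default env:// method needs. The names `train_fn`, `config["lr"]`, and `config["epochs"]` are just placeholders:

```python
import torch
import torch.distributed as dist
from ray import tune


def train_fn(config):
    # Every worker joins the same process group. Assumes MASTER_ADDR,
    # MASTER_PORT, RANK, and WORLD_SIZE are already set in the environment.
    dist.init_process_group(backend="gloo")

    model = torch.nn.Linear(10, 1)
    ddp_model = torch.nn.parallel.DistributedDataParallel(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=config["lr"])

    for epoch in range(config["epochs"]):
        # Stand-in for one epoch over this worker's shard of the data.
        inputs = torch.randn(32, 10)
        targets = torch.randn(32, 1)
        loss = torch.nn.functional.mse_loss(ddp_model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # The important part: *every* rank calls tune.report each iteration,
        # not just rank 0. Tune blocks until all workers have reported and
        # then propagates only rank 0's metrics.
        tune.report(loss=loss.item(), epoch=epoch)
```

With that in place, each iteration unblocks as soon as all ranks have reported, so no worker is left waiting on the others.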