TorchTrainer: Collective operation timeout: WorkNCCL

@kai @xwjiang2010, could you please suggest a workaround for this issue?