Supplying a lower timeout_s to TorchConfig helps, but I'd still expect Ray to raise the error immediately.
from ray.train import ScalingConfig
from ray.train.torch import TorchConfig, TorchTrainer

trainer = TorchTrainer(
    train_fn,
    scaling_config=ScalingConfig(num_workers=6, use_gpu=use_gpu),
    torch_config=TorchConfig(timeout_s=10),
)