Distributed training with and without nn.DataParallel, and how Ray Train differs

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.

2. Environment:

  • Ray version: 2.47.1
  • Python version: 3.12.8
  • OS: Arch
  • Cloud/Infrastructure: Local

In this tutorial, the model is wrapped with:

net = nn.DataParallel(net)

but in this other one it isn't used. Does that mean the first one will use every GPU in the cluster, while the second one will only use a single GPU (on every node, or overall)?
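For context, this is how I'm reading the first tutorial's setup (a minimal sketch on my side; the nn.Linear model below is just a placeholder, not the tutorial's actual network):

```python
import torch
import torch.nn as nn

# Placeholder model, not the actual network from the tutorial.
net = nn.Linear(128, 10)

if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across all GPUs visible
    # to this single process.
    net = nn.DataParallel(net)

net = net.to("cuda" if torch.cuda.is_available() else "cpu")
```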

And how does that differ from Ray Train?
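For reference, this is roughly the Ray Train pattern I'm comparing against (a minimal sketch based on the TorchTrainer examples; train_func, the toy nn.Linear model, and num_workers=2 are just placeholders):

```python
import torch.nn as nn
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model


def train_func():
    # Placeholder training loop; the real model and data would come
    # from the tutorial.
    model = nn.Linear(128, 10)
    # prepare_model moves the model to the worker's device and wraps
    # it in DistributedDataParallel.
    model = prepare_model(model)
    # ... forward / backward / optimizer steps would go here ...


# Each worker runs train_func in its own process and can be placed on
# any node in the cluster; with use_gpu=True each worker gets one GPU
# by default.
trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
)
trainer.fit()
```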