I saw a tutorial showing that FSDP can be implemented with Ray Train and PyTorch Lightning. But is it possible without PyTorch Lightning? I want to implement FSDP with plain PyTorch.
Yes. You can either call ray.train.torch.prepare_model(model, parallel_strategy="fsdp") or wrap your model in FullyShardedDataParallel directly.