Hi, I am using PyTorch Lightning (1.5.10), Ray Lightning (0.2), and Ray Tune (1.10.0) to distribute training and tuning. From the documentation it is unclear to me how to do resource allocation with RayPlugin.
Here (ray_lightning/ray_ddp.py at 65f497a3c8bedb2f24bf04a5dbf0ea62b5bcb4d6 · ray-project/ray_lightning · GitHub) the docstring says to specify GPUs in the PyTorch Lightning Trainer with a value > 0,
but here (ray_lightning/ray_ddp.py at 65f497a3c8bedb2f24bf04a5dbf0ea62b5bcb4d6 · ray-project/ray_lightning · GitHub) the example says NOT to specify resources in the Trainer.
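To make the ambiguity concrete, here is a minimal sketch of the two readings as I understand them (the Trainer arguments are placeholders, not my actual code), using RayPlugin from ray_lightning 0.2:

```python
import pytorch_lightning as pl
from ray_lightning import RayPlugin

plugin = RayPlugin(num_workers=8, num_cpus_per_worker=1, use_gpu=True)

# Reading 1 (docstring): also set gpus > 0 on the Trainer.
trainer_a = pl.Trainer(max_epochs=1, gpus=1, plugins=[plugin])

# Reading 2 (example): leave gpus/num_processes unset and let the
# plugin handle resource allocation.
trainer_b = pl.Trainer(max_epochs=1, plugins=[plugin])
```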
Which one is right? I am able to run tuning experiments in parallel, but I am unable to distribute training. I have 5 worker nodes with 8 GPUs each.
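For context, here is a simplified, self-contained sketch of roughly how I am launching the Tune run. The toy model, dataloaders, and search space are placeholders rather than my real code, and the get_tune_resources / TuneReportCallback usage follows my reading of the ray_lightning 0.2 docs:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from ray import tune
from ray_lightning import RayPlugin
from ray_lightning.tune import TuneReportCallback, get_tune_resources

NUM_WORKERS = 8  # one trial should span all 8 GPUs of a single node


class ToyModel(pl.LightningModule):
    # Placeholder LightningModule standing in for my actual model.
    def __init__(self, lr):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.layer(x), y)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", F.mse_loss(self.layer(x), y))

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr)


def make_loader():
    # Placeholder random data instead of my real datasets.
    data = TensorDataset(torch.randn(256, 32), torch.randn(256, 1))
    return DataLoader(data, batch_size=32)


def train_fn(config):
    model = ToyModel(lr=config["lr"])
    trainer = pl.Trainer(
        max_epochs=2,
        # Following the example, no gpus/num_processes set on the Trainer here.
        plugins=[RayPlugin(num_workers=NUM_WORKERS, use_gpu=True)],
        callbacks=[TuneReportCallback({"loss": "val_loss"}, on="validation_end")],
    )
    trainer.fit(model, make_loader(), make_loader())


analysis = tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1)},  # placeholder search space
    num_samples=5,  # ideally one trial per node
    resources_per_trial=get_tune_resources(num_workers=NUM_WORKERS, use_gpu=True),
)
```

With a setup like this, the trials themselves run in parallel, but the training inside each trial does not appear to get distributed across the node's 8 GPUs.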
How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.