I have the following systems that I wish to run distributed GPU training on:
- 32 core CPU + 2070S + 2070S
- 16 core CPU + 2070S + 1080
- 16 core CPU + 1080 + 1080
What are the downsides of using all 3 systems to perform distributed training of a single model? Node #2 mixes two different GPU models, while nodes #1 and #3 each have matched GPUs but of different models from each other.
When tuning hyperparameters (with Ray Tune), is it better to train one model across all 3 systems, or to train 3 models in parallel with 3 different sets of hyperparameters, where each system trains using its own set? A rough sketch of the second option is below.
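For concreteness, this is roughly what I mean by the second option: a minimal sketch assuming a Ray cluster is already running across the 3 machines (`ray start --head` on one node, `ray start --address=...` on the others). `train_fn`, the learning rates, and the CPU count are placeholders, not my real setup:

```python
import ray
from ray import tune

def train_fn(config):
    # placeholder training loop; report a metric so Tune can compare trials
    loss = config["lr"] * 1.0  # replace with an actual training step
    tune.report(loss=loss)

ray.init(address="auto")  # connect to the existing 3-node cluster

analysis = tune.run(
    train_fn,
    # 3 hyperparameter sets -> 3 concurrent trials
    config={"lr": tune.grid_search([1e-2, 1e-3, 1e-4])},
    # requesting 2 GPUs per trial should pin each trial to a whole node,
    # since every node has exactly 2 GPUs
    resources_per_trial={"cpu": 8, "gpu": 2},
)
print(analysis.get_best_config(metric="loss", mode="min"))
```

My understanding is that `resources_per_trial={"gpu": 2}` would make Tune schedule one trial per node here, so each hyperparameter set gets its own machine and no gradient synchronization crosses the network.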