Thank you for developing Ray!
I have a question about Ray Train. Is there anything that would prevent the following scenarios from working?
- Client-based distributed training across a K8s cluster with heterogeneous GPUs: some workers run GTX 1080 Tis and others run RTX Ampere GPUs (A4000/A5000). If this is not supported, is there a way to limit an interactive client session (for code development) to a single GPU compute capability (e.g. Ampere)?
- Batch job submission across the same cluster of heterogeneous GPUs.
I thought I would ask first before spending time trying to make this work, so I would appreciate any pointers from anyone.
Thank you
How severe does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.