I’ve read through the RaySGD code a bit regarding placement groups, but didn’t find an answer.
Let’s say I have a cluster with two machines, A and B, and I want to start 10 training processes on each machine. The training relies on an in-memory data service, so I need to initialize the training processes on A and B with different parameters (nothing complicated, just a rank as an int).
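For concreteness, the parameter assignment I have in mind is just this (a plain-Python sketch with a hypothetical helper, not a Ray API):

```python
def global_rank(machine_index: int, local_index: int,
                procs_per_machine: int = 10) -> int:
    """Derive a unique int rank for each training process.

    machine_index: 0 for machine A, 1 for machine B (hypothetical numbering).
    local_index: 0..9, the process's index on its own machine.
    """
    return machine_index * procs_per_machine + local_index

# Machine A's processes get ranks 0-9, machine B's get ranks 10-19.
ranks_a = [global_rank(0, i) for i in range(10)]
ranks_b = [global_rank(1, i) for i in range(10)]
```

The question is how to get each RaySGD training process to receive its own rank like this at initialization time.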
Is this already supported in RaySGD? I’ve read this PR: [tune/placement group] dist. training placement group support by oliverhu · Pull Request #11934 · ray-project/ray · GitHub, and also looked at how TorchTrainer and TrainingOperator are implemented, but I can’t tell for sure.
Thanks a lot!