Placement group support in RaySGD?

I’ve read through the RaySGD code a bit with regard to placement groups, but didn’t find an answer.

Let’s say I have a cluster with two machines, A and B, and I want to start 10 training processes on each machine. The training relies on an in-memory data service, so we want to initialize the training processes on A and B with different parameters (something very simple, just a rank as an int).
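
To make this concrete, here is roughly what I could do today with raw Ray placement groups (no RaySGD). The CPU-only bundles, the STRICT_PACK strategy, and the `TrainingWorker` actor are illustrative assumptions, not my actual setup:

```python
# Rough sketch of the desired layout with raw Ray placement groups (no RaySGD).
# Bundle shapes, STRICT_PACK, and TrainingWorker are illustrative assumptions.
import ray
from ray.util.placement_group import placement_group

ray.init(address="auto")  # connect to the existing two-node cluster

# One bundle per training process. STRICT_PACK co-locates all bundles of a
# group on a single node, so with two 10-CPU nodes the two groups end up on
# different machines (A and B).
pg_a = placement_group([{"CPU": 1}] * 10, strategy="STRICT_PACK")
pg_b = placement_group([{"CPU": 1}] * 10, strategy="STRICT_PACK")
ray.get([pg_a.ready(), pg_b.ready()])

@ray.remote(num_cpus=1)
class TrainingWorker:
    def __init__(self, rank: int):
        # `rank` parameterizes the connection to the in-memory data service.
        self.rank = rank

workers = []
rank = 0
for pg in (pg_a, pg_b):
    for bundle_index in range(10):
        workers.append(
            TrainingWorker.options(
                placement_group=pg,
                placement_group_bundle_index=bundle_index,
            ).remote(rank=rank)
        )
        rank += 1
```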

Is this already supported in RaySGD? I’ve read this PR: [tune/placement group] dist. training placement group support by oliverhu · Pull Request #11934 · ray-project/ray · GitHub, as well as how TorchTrainer and TrainingOperator are implemented, but I can’t tell for sure.

Thanks a lot!

cc’ing @rliaw, who reviewed the linked PR. Any thoughts on this?

Hmm, this is mainly supported for RaySGD + Ray Tune. However, the standalone TorchTrainer doesn’t have support for placement groups yet.
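
For reference, the Tune path looks roughly like the sketch below; `MyTrainingOperator` is a stand-in for your TrainingOperator subclass, and the kwargs mirror the TorchTrainer constructor:

```python
# Sketch of the RaySGD + Ray Tune path, where Tune handles worker placement.
# `MyTrainingOperator` is a stand-in for a user-defined TrainingOperator.
from ray import tune
from ray.util.sgd.torch import TorchTrainer

TorchTrainable = TorchTrainer.as_trainable(
    training_operator_cls=MyTrainingOperator,
    num_workers=10,
    use_gpu=False,
)
analysis = tune.run(
    TorchTrainable,
    config={"lr": tune.grid_search([1e-3, 1e-4])},
)
```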

This should go onto our backlog; let me know if you would like us to prioritize it (or would be open to contributing this)! If interested, happy to guide you through the implementation.

Sure thing. I’d be happy to contribute if it’s something my bandwidth can handle. Let me first get a rough idea of how much effort it would take.

As a first step, I’m thinking we’d create something like this:

  1. Define the number of workers for each placement group.
  2. RaySGD takes this mapping of placement group → num_workers and allocates workers accordingly (rough sketch below).
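
In code, something like the following purely hypothetical sketch; the `worker_placement` kwarg does not exist in RaySGD today, and `MyTrainingOperator` is again a stand-in:

```python
# Purely hypothetical interface for the proposal above; `worker_placement`
# is not an existing RaySGD argument.
from ray.util.placement_group import placement_group
from ray.util.sgd.torch import TorchTrainer

pg_a = placement_group([{"CPU": 1}] * 10, strategy="STRICT_PACK")
pg_b = placement_group([{"CPU": 1}] * 10, strategy="STRICT_PACK")

trainer = TorchTrainer(
    training_operator_cls=MyTrainingOperator,  # stand-in operator class
    num_workers=20,
    # Proposed mapping: placement group -> number of workers to start in it.
    worker_placement={pg_a: 10, pg_b: 10},
)
```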

Does this align with what you were already thinking?

I’m not familiar with how RaySGD works under the hood yet, so I’ll read into it a bit more. Any code pointers (e.g., similar functionality that already exists in other modules, or tests that would be affected) or docs would be much appreciated.

Actually, I think it makes sense for a Ray-side maintainer to implement this. We’ll cc you on the review!

Could you post a GitHub issue so we can keep track of it?

Done, I’ve created the issue: [raysgd] Placement group support in RaySGD · Issue #16682 · ray-project/ray · GitHub

Thanks, Richard!
