1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
- Ray version: 2.4
- Python version: 3.10
- OS: Ubuntu
- Cloud/Infrastructure:
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected:
I am planning to use a PyTorch device mesh (DeviceMesh) with Ray Train, because FSDP2 requires a device mesh to customize its sharding strategy. However, Ray Train calls dist.init_process_group() by default (ref).
Is there a workaround that lets me build a device mesh in this setup?