Hello everyone,
I’m new to Ray and I’m exploring how to effectively manage GPU resources based on their interconnect topology (e.g., NVLink vs. PCIe). I have a machine with GPUs connected via different interconnects, and I’d like to schedule tasks on specific sets of GPUs to optimize performance.
My goal is to define custom resources that map to specific GPU IDs and then use those resources in my tasks. I’m wondering if this is the recommended approach and how to best combine it with placement groups.
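To make the idea concrete, here is a rough sketch of what I have in mind. The group names (`nvlink_group_0`, `nvlink_group_1`), the GPU IDs, and both helper functions are hypothetical and only illustrate the mapping; the real layout would come from something like `nvidia-smi topo -m`. As far as I understand, Ray custom resources are purely logical counters, so I am assuming I would still have to restrict the visible devices myself inside the task. The Ray calls are left as comments because I have not confirmed that this is the right way to use them.

```python
import json

# Hypothetical topology: which GPU IDs share an NVLink island on this machine.
# These values are made up for illustration.
NVLINK_GROUPS = {
    "nvlink_group_0": [0, 1, 2, 3],
    "nvlink_group_1": [4, 5, 6, 7],
}


def build_custom_resources(groups):
    """Map each interconnect group to a logical resource count (one unit per GPU)."""
    return {name: len(gpu_ids) for name, gpu_ids in groups.items()}


def visible_devices(groups, name):
    """Build a CUDA_VISIBLE_DEVICES value restricted to one group's GPUs."""
    return ",".join(str(gpu_id) for gpu_id in groups[name])


resources = build_custom_resources(NVLINK_GROUPS)
print(json.dumps(resources))
print(visible_devices(NVLINK_GROUPS, "nvlink_group_1"))

# Intended use with Ray (not executed here, and possibly not the recommended way):
#   ray.init(resources=resources)
#
#   @ray.remote(resources={"nvlink_group_0": 2})
#   def train(): ...
#
#   # Reserve two units from the same group with a placement group:
#   pg = ray.util.placement_group([{"nvlink_group_0": 1}] * 2, strategy="STRICT_PACK")
```

What I am unsure about is whether requesting `nvlink_group_0` this way can be tied to the actual GPU IDs, or whether the custom resource only limits how many such tasks run at once.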
Any guidance or pointers on how to achieve topology-aware scheduling in Ray would be greatly appreciated. Thank you in advance for your help!