Optimizing GPU Scheduling Based on Interconnect Topology

I_wish_it_was_midnig · January 13, 2025, 1:50am

Hello everyone,

I’m new to Ray and I’m exploring how to effectively manage GPU resources based on their interconnect topology (e.g., NVLink vs. PCIe). I have a machine with GPUs connected via different interconnects, and I’d like to schedule tasks on specific sets of GPUs to optimize performance.

My goal is to define custom resources that map to specific GPU IDs and then use those resources in my tasks. I’m wondering if this is the recommended approach and how to best combine it with placement groups.
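
To make the idea concrete, here is a minimal sketch of the custom-resource approach, not a recommended or official pattern. The group names (`nvlink_group_0`, `nvlink_group_1`), the GPU-to-group mapping, and the helper functions are all hypothetical; a real mapping would come from inspecting the machine, for example with `nvidia-smi topo -m`. Ray treats custom resources as logical counters only, so this sketch assumes the task restricts itself to the intended GPUs by setting `CUDA_VISIBLE_DEVICES`.

```python
import os

# Hypothetical topology: GPU IDs grouped by which ones share an NVLink connection.
NVLINK_GROUPS = {"nvlink_group_0": [0, 1], "nvlink_group_1": [2, 3]}


def node_resources(groups):
    """Build the custom-resource dict for the node: one unit per interconnect group."""
    return {name: 1 for name in groups}


def visible_devices(groups, name):
    """Return the CUDA_VISIBLE_DEVICES string for the GPUs in one group."""
    return ",".join(str(gpu_id) for gpu_id in groups[name])


def run_on_group(group_name):
    """Start a local Ray instance and run one task pinned to the given group."""
    import ray  # imported here so the helpers above can be used without Ray

    # Advertise one unit of each custom resource on this node.
    ray.init(resources=node_resources(NVLINK_GROUPS))

    @ray.remote(resources={group_name: 1})
    def task():
        # Ray only counts the custom resource; the task selects the GPUs itself.
        os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices(NVLINK_GROUPS, group_name)
        return os.environ["CUDA_VISIBLE_DEVICES"]

    try:
        return ray.get(task.remote())
    finally:
        ray.shutdown()
```

Calling `run_on_group("nvlink_group_0")` on a machine with Ray installed would run the task with only that group's GPUs visible. Because each group has a single unit, at most one such task holds a group at a time. The same resource name could presumably also appear in a placement group bundle, which is the part I am least sure about.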

Any guidance or pointers on how to achieve topology-aware scheduling in Ray would be greatly appreciated. Thank you in advance for your help!

jjyao · January 13, 2025, 5:10pm

Do you mean GPUs on the same node have different interconnects, or GPUs on different nodes have different interconnects?
