Mitigating simultaneous GPU voltage spikes during initialization

were · April 5, 2025, 3:11pm

1. Severity of the issue: (select one)
Low: Annoying but doesn’t hinder my work.
2. Environment:

3. Desired vs. Actual Behavior and Reason:

Desired: Modifiable short delay between starting each GPU
Actual: All GPUs start jobs/load models simultaneously
Reason: NVIDIA GPUs have transient power spikes that drastically exceed their power limits but last only on the order of milliseconds. When these spikes occur concurrently across several GPUs, they can trip PSUs that are otherwise capable of powering the GPUs under regular operation and of handling each GPU's transients individually. I have worked around this by limiting the clock frequency on the GPUs to keep transients below the breaker tripping threshold. How difficult would it be to add an option for a small, adjustable delay? If anyone can point me to the correct location in the code, I can try to implement it.
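
To make the desired behavior concrete, here is a minimal driver-side sketch of a staggered start. It is not an existing Ray option. The names `staggered_start`, `start_fn`, and `delay_s` are hypothetical, and `start_fn` stands in for whatever blocking call initializes one GPU (for example, creating a worker and waiting for its model to finish loading).

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def staggered_start(
    gpu_indices: Iterable[int],
    delay_s: float,
    start_fn: Callable[[int], T],
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Start work on each GPU in turn, pausing delay_s seconds between starts.

    start_fn is a hypothetical, user-supplied callable that kicks off
    initialization on one GPU. The sleep parameter is injectable so the
    pacing can be tested without actually waiting.
    """
    if delay_s < 0:
        raise ValueError("delay_s must be non-negative")

    results: List[T] = []
    for position, gpu in enumerate(gpu_indices):
        # No pause before the first GPU; every later start waits delay_s so
        # that the initialization transients do not line up in time.
        if position > 0:
            sleep(delay_s)
        results.append(start_fn(gpu))
    return results
```

The delay only separates the spikes if `start_fn` actually triggers the load on the GPU before it returns. If it merely schedules work that begins later, the starts could still coincide.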
