1. Severity of the issue:
Low: Annoying but doesn’t hinder my work.
2. Environment:
- Ray version: Any
- Python version: Any
- OS: Linux
- Cloud/Infrastructure: 4x RTX 3090 Ti, 1x 1600 W PSU, 48-core Threadripper
- Other libs/tools (if relevant): PyTorch, vLLM
3. Desired vs. Actual Behavior and Reason:
- Desired: A short, configurable delay between starting work on each GPU
- Actual: All GPUs start jobs/load models simultaneously
- Reason: NVIDIA GPUs have transient power spikes that drastically exceed their power limits but last only on the order of milliseconds. When these spikes coincide across several GPUs, they can trip a PSU that is otherwise capable of powering the GPUs under normal operation and of handling each GPU's transients individually. I have worked around this by limiting the GPUs' clock frequencies to keep the transients below the breaker's tripping threshold. How difficult would it be to add an option for a small, adjustable delay? If anyone can point me to the right place in the code, I can try to implement it.
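A minimal sketch of the requested behavior, written as plain Python independent of Ray: launch one worker per GPU and sleep for a configurable delay between launches, so that start-up power transients are less likely to coincide. `stagger_start`, `start_fn`, `start_delay_s` and `load_model` are hypothetical names for illustration only, not existing Ray options or APIs.

```python
import threading
import time
from typing import Callable, List, Sequence


def stagger_start(
    gpu_ids: Sequence[int],
    start_fn: Callable[[int], None],
    start_delay_s: float = 0.5,
) -> List[threading.Thread]:
    """Hypothetical helper (not a Ray API): run start_fn(gpu_id) on its own
    thread for each GPU, sleeping start_delay_s between consecutive launches
    so the GPUs do not all begin drawing power at the same instant."""
    if start_delay_s < 0:
        raise ValueError("start_delay_s must be non-negative")
    threads: List[threading.Thread] = []
    for i, gpu_id in enumerate(gpu_ids):
        if i > 0:
            # Stagger: wait before launching every GPU after the first.
            time.sleep(start_delay_s)
        t = threading.Thread(target=start_fn, args=(gpu_id,), name=f"gpu-{gpu_id}")
        t.start()
        threads.append(t)
    return threads


if __name__ == "__main__":
    def load_model(gpu_id: int) -> None:
        # Hypothetical placeholder for real work, e.g. loading a model onto this GPU.
        print(f"GPU {gpu_id} starting at {time.monotonic():.3f}")

    workers = stagger_start([0, 1, 2, 3], load_model, start_delay_s=0.25)
    for w in workers:
        w.join()
```

Only the launches are staggered; once started, the workers run concurrently. A similar user-side workaround might be possible in Ray by sleeping on the driver between successive actor creations, but whether that actually separates the power spikes depends on when each actor's constructor begins heavy GPU work.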