Single GPU multiprocessing

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I was trying parallel training on a single GPU in PyTorch using Ray Core, but I ran into this CUDA memory error. Is it possible to train multiple models in parallel on a single GPU?

CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Hi @MakGulati,
Today Ray doesn't provide any memory isolation on a GPU if you schedule multiple tasks/actors on the same GPU. I wonder if you can use an application- or library-specific isolation mechanism, such as pytorch/memory.py at e44b2b72bd4ccecf9c2f6c18d09c11eff446b5a3 · pytorch/pytorch · GitHub?