RLIMIT problem when running GPU code

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

When I run GPU code with Ray and increase the number of GPUs per worker node, I keep running into this error:

(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4194304 current, 4194304 max
(raylet) OpenBLAS blas_thread_init: pthread_create failed for thread 60 of 64: Resource temporarily unavailable
(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4194304 current, 4194304 max
(raylet) OpenBLAS blas_thread_init: pthread_create failed for thread 61 of 64: Resource temporarily unavailable
(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4194304 current, 4194304 max
(raylet) OpenBLAS blas_thread_init: pthread_create failed for thread 62 of 64: Resource temporarily unavailable
(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4194304 current, 4194304 max

@raghukiran it looks like OpenBLAS is creating too many threads. One thing I'd try is setting the
OMP_NUM_THREADS environment variable to a smaller number before starting Ray.
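
For illustration, here is a minimal sketch of one way to do that from Python: copy the current environment, set OMP_NUM_THREADS in the copy, and launch a command with it so that the command and anything it spawns inherit the cap. The helper names (env_with_thread_cap, run_with_thread_cap) and the default of 4 threads are made up for this example, not part of Ray or OpenBLAS.

```python
import os
import subprocess
import sys


def env_with_thread_cap(num_threads: int = 4) -> dict:
    """Return a copy of the current environment with OMP_NUM_THREADS set.

    Hypothetical helper for this example; the default of 4 is arbitrary.
    """
    env = dict(os.environ)  # copy, so the parent process is left untouched
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def run_with_thread_cap(command, num_threads: int = 4):
    """Run a command so that it, and any child it spawns, inherits the cap."""
    return subprocess.run(command, env=env_with_thread_cap(num_threads), check=True)


if __name__ == "__main__":
    # Harmless demonstration: the child process just reports the value it sees.
    run_with_thread_cap(
        [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    )
```

To apply this to Ray, the command would be whatever you normally use to start it (for example ["ray", "start", "--head"]), assuming the Ray processes inherit the environment of the process that launches them.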

Had a similar problem. For me, running the script directly was fine, but when running on Slurm I got the OpenBLAS error.

Adding os.environ["OMP_NUM_THREADS"] = "1" at the beginning of my script solved it for me. Thanks @Chen_Shen :+1:
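
A rough sketch of what "at the beginning of my script" could look like, under the assumption that the variable has to be set before NumPy (or any other library linked against OpenBLAS) is imported, since the log above shows OpenBLAS starting its threads during blas_thread_init. The cap_omp_threads helper and the numpy check are illustrative only and not from the original script.

```python
import os
import sys


def cap_omp_threads(num_threads: int = 1) -> bool:
    """Set OMP_NUM_THREADS for this process and its future children.

    Returns False if numpy is already imported, in which case the setting
    is probably too late for the copy of OpenBLAS loaded in this process.
    Hypothetical helper, written for this example.
    """
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    return "numpy" not in sys.modules


# Call this first, before any heavy imports.
in_time = cap_omp_threads(1)
if not in_time:
    print("warning: numpy was imported before OMP_NUM_THREADS was set",
          file=sys.stderr)

# Heavy imports and Ray start-up would go here, after the variable is set:
# import numpy as np
# import ray
# ray.init()
```

With the value "1", each process that reads the variable should start a single OpenBLAS thread instead of the 64 seen in the log; a larger value may be a better trade-off if your tasks rely on multithreaded BLAS.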