RLIMIT problem when running GPU code

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

When I run GPU code with Ray and increase the number of GPUs per worker node, I keep running into this error:

(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4194304 current, 4194304 max
(raylet) OpenBLAS blas_thread_init: pthread_create failed for thread 60 of 64: Resource temporarily unavailable
(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4194304 current, 4194304 max
(raylet) OpenBLAS blas_thread_init: pthread_create failed for thread 61 of 64: Resource temporarily unavailable
(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4194304 current, 4194304 max
(raylet) OpenBLAS blas_thread_init: pthread_create failed for thread 62 of 64: Resource temporarily unavailable
(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4194304 current, 4194304 max

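@raghukiran It looks like OpenBLAS is creating too many threads: each process tries to start 64 of them, and pthread_create fails once the system cannot create any more. One thing I'd try is setting the OMP_NUM_THREADS environment variable to a smaller number before starting Ray.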
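
A minimal sketch of that suggestion for a Python driver script. The helper name limit_blas_threads and the value 4 are only illustrative, and setting OPENBLAS_NUM_THREADS alongside OMP_NUM_THREADS is an addition on my part (OpenBLAS reads both). The variables have to be set before NumPy or any other OpenBLAS-backed library is imported, because the thread pool is created at load time.

```python
import os


def limit_blas_threads(num_threads: int = 1) -> dict:
    """Cap the OpenBLAS/OpenMP thread count for this process.

    Hypothetical helper: sets the environment variables in the current
    process and returns them so they can also be handed to workers.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env_vars = {
        "OMP_NUM_THREADS": str(num_threads),
        "OPENBLAS_NUM_THREADS": str(num_threads),
    }
    # Must happen before importing numpy (or anything that loads OpenBLAS).
    os.environ.update(env_vars)
    return env_vars


# 4 is an arbitrary example value; tune it to cores available per worker.
env_vars = limit_blas_threads(4)

# Assumption: when the driver starts Ray itself, the same variables can be
# forwarded to worker processes through the runtime environment, e.g.
#   import ray
#   ray.init(runtime_env={"env_vars": env_vars})
```

If the cluster is started from the command line instead, exporting the same variables in the shell on each node before starting Ray should have the equivalent effect, since worker processes inherit the environment of the process that launched them.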