I am getting the following error. Has anyone experienced this? The scheduler is Slurm. For more context, I ran the same script on another cluster with the same scheduler and it worked.
(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1541207 max
(raylet) OpenBLAS blas_thread_init: pthread_create failed for thread 55 of 64: Resource temporarily unavailable
(raylet) OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1541207 max
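Alternatively, as a hack, you could set OPENBLAS_NUM_THREADS to a reasonable number per process, so that each process stops trying to start one OpenBLAS thread per core (64 in your log) and you stay under the RLIMIT_NPROC soft limit of 4096. For example, if you know you have 4 tasks and 60 cores, you could set it to 60/4 = 15.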
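A minimal sketch of that arithmetic in Python, in case it helps. The helper name `threads_per_task` and the 60-core / 4-task numbers are only illustrative (taken from the example above), not anything from your setup. The variable has to be set before NumPy or anything else that loads OpenBLAS is imported in that process.

```python
import os


def threads_per_task(total_cores: int, num_tasks: int) -> int:
    """Split cores evenly across tasks, never going below one thread.

    Hypothetical helper for illustration only.
    """
    return max(1, total_cores // num_tasks)


# Assumed numbers from the example above: 60 cores shared by 4 tasks.
# Set this before importing NumPy (or any other OpenBLAS user),
# because OpenBLAS reads it when the library is first loaded.
os.environ["OPENBLAS_NUM_THREADS"] = str(threads_per_task(60, 4))
```

Setting it in Python only affects that process and the children it starts afterwards. Since the messages come from the raylet, exporting OPENBLAS_NUM_THREADS in the Slurm job script before Ray is started is probably the more reliable place to put it.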
Regarding that error, I believe it's a known issue with grpc: https://github.com/grpc/grpc/issues/31885