Issue: RuntimeError: can't start new thread

How severe does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

I often hit RuntimeError: can't start new thread in my local development environment. It usually happens when too many tasks/actors are called at the same time. From the logs, it seems that the gRPC channels have exceeded some limit. Here is a typical traceback:

Traceback (most recent call last):
  File "/home2/hanwen.qiu/miniconda3/envs/ray_server/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home2/hanwen.qiu/miniconda3/envs/ray_server/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home2/hanwen.qiu/miniconda3/envs/ray_server/lib/python3.8/site-packages/ray/_private/worker.py", line 868, in print_logs
    data = subscriber.poll()
  File "/home2/hanwen.qiu/miniconda3/envs/ray_server/lib/python3.8/site-packages/ray/_private/gcs_pubsub.py", line 362, in poll
    self._poll_locked(timeout=timeout)
  File "/home2/hanwen.qiu/miniconda3/envs/ray_server/lib/python3.8/site-packages/ray/_private/gcs_pubsub.py", line 249, in _poll_locked
    fut = self._stub.GcsSubscriberPoll.future(
  File "/home2/hanwen.qiu/miniconda3/envs/ray_server/lib/python3.8/site-packages/grpc/_channel.py", line 972, in future
    call = self._managed_call(
  File "/home2/hanwen.qiu/miniconda3/envs/ray_server/lib/python3.8/site-packages/grpc/_channel.py", line 1306, in create
    _run_channel_spin_thread(state)
  File "/home2/hanwen.qiu/miniconda3/envs/ray_server/lib/python3.8/site-packages/grpc/_channel.py", line 1270, in _run_channel_spin_thread
    channel_spin_thread.start()
  File "src/python/grpcio/grpc/_cython/_cygrpc/fork_posix.pyx.pxi", line 117, in grpc._cython.cygrpc.ForkManagedThread.start
  File "/home2/hanwen.qiu/miniconda3/envs/ray_server/lib/python3.8/threading.py", line 852, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

I have tried some environment variables, such as OMP_NUM_THREADS and OPENBLAS_NUM_THREADS, but the results were not satisfactory. Sometimes it also causes the head node to crash.

Here are my questions:

  1. Why does this issue occur?
  2. How can I avoid it? Are there any related configs in the documentation that I have missed?

Any ideas? Thanks to the community. I am wondering if I can open a PR against the documentation to address the confusion that new users may run into with this kind of issue.

The version of Ray I am using is 2.4.0.

I found a similar question in the discussions: RuntimeError: can't start new thread.

I can provide a reproducible script if needed.

I think this issue happens when you reach the system’s per-process thread limit.

I think this can probably help: python - error: can't start new thread - Stack Overflow

(specifically I believe ulimit -u can modify the limit)
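
For anyone who wants to check that limit from Python rather than the shell, here is a minimal sketch, assuming Linux or macOS. It reads the same value that ulimit -u reports through the standard resource module. The helper name describe_thread_limits is made up for illustration, and the thread count it shows only covers Python-level threads, so native threads started by gRPC or Ray internals are not included.

```python
import resource
import threading


def describe_thread_limits():
    """Return the RLIMIT_NPROC soft/hard values and the Python thread count.

    RLIMIT_NPROC is the limit that `ulimit -u` shows. On Linux, threads
    count against it as well as processes.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def fmt(value):
        # RLIM_INFINITY means no limit is enforced at this level.
        return "unlimited" if value == resource.RLIM_INFINITY else value

    return {
        "soft_limit": fmt(soft),
        "hard_limit": fmt(hard),
        # Only threads created through the `threading` module are counted.
        "python_threads_in_this_process": threading.active_count(),
    }


if __name__ == "__main__":
    for name, value in describe_thread_limits().items():
        print(f"{name}: {value}")
```

If the soft limit is much lower than the hard limit, raising it with ulimit -u in the shell that launches Ray is probably the first thing to try.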

Thanks. I think it is caused by the hardware environment. It works well when I switch to a low-load machine. Changing the thread limit with ulimit -u did not help much on the high-load machine. Anyway, it looks like it is related to the machine, not Ray.
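
In case it helps someone comparing a loaded machine with an idle one: a rough sketch, assuming Linux, that prints the kernel-wide limits that can also stop new threads from starting even after ulimit -u is raised. The helper names read_int and system_thread_limits are hypothetical. Whether one of these limits was actually the cause here is only a guess.

```python
from pathlib import Path

# Kernel-wide settings on Linux. Thread IDs share the PID number space,
# so pid_max caps the total number of threads as well as processes.
LIMIT_FILES = {
    "threads-max": Path("/proc/sys/kernel/threads-max"),
    "pid_max": Path("/proc/sys/kernel/pid_max"),
}


def read_int(path):
    """Read an integer from a file, or return None if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def system_thread_limits():
    """Return each kernel-wide limit, or None where it cannot be read."""
    return {name: read_int(path) for name, path in LIMIT_FILES.items()}


if __name__ == "__main__":
    for name, value in system_thread_limits().items():
        print(f"{name}: {value if value is not None else 'not available'}")
```

Comparing these numbers with the total thread count on the busy machine (for example from ps -eLf | wc -l) should show how close it is to the ceiling.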