How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
I am getting a lot of the following error messages:
worker_pool.cc:544: Some workers of the worker process(141872) have not registered within the timeout. The process is still alive, probably it's hanging during start.
When I look at the worker process IDs, they are all spill or restore workers whose logs look like the following:
[2023-08-13 08:30:10,296 I 139895 139895] core_worker_process.cc:107: Constructing CoreWorkerProcess. pid: 139895
[2023-08-13 08:30:37,591 I 139895 139895] io_service_pool.cc:35: IOServicePool is running with 1 io_service.
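To put a number on how slow the startup is, the time between those two log lines can be computed directly. This is only a rough sketch: it assumes the timestamp format shown in the log above, and the helper names (`parse_timestamp`, `startup_gap_seconds`) are made up for illustration and are not part of Ray.

```python
import re
from datetime import datetime

# The two worker log lines quoted above, copied verbatim.
LOG_LINES = [
    "[2023-08-13 08:30:10,296 I 139895 139895] core_worker_process.cc:107: "
    "Constructing CoreWorkerProcess. pid: 139895",
    "[2023-08-13 08:30:37,591 I 139895 139895] io_service_pool.cc:35: "
    "IOServicePool is running with 1 io_service.",
]

# Assumption: every line starts with "[YYYY-MM-DD HH:MM:SS,mmm ".
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def parse_timestamp(line: str) -> datetime:
    """Extract the leading timestamp from a single log line."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp found in line: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def startup_gap_seconds(first: str, second: str) -> float:
    """Seconds elapsed between two log lines (hypothetical helper)."""
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()


gap = startup_gap_seconds(LOG_LINES[0], LOG_LINES[1])
print(f"gap between the two log lines: {gap:.3f} s")
```

For the lines above this comes out to roughly 27 seconds between constructing the CoreWorkerProcess and the IOServicePool starting, so the workers appear to be starting slowly rather than failing outright.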
My program is still running, so this isn’t a blocker, but I’m curious whether there’s any way to resolve it in case it becomes one later in the job. I am running on a SLURM cluster with Ray 2.6.2. Thanks in advance!