RuntimeError: Unable to meet other processes at the rendezvous store. If you are using P2P communication, please check if tensors are put in the correct GPU

2025-02-21 02:52:05,069 ERROR worker.py:400 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::MeshHostWorker.do_allreduce() (pid=180023, ip=172.20.20.220, repr=<geesibling.adapters.jax.pipeline.devicecontext.MeshHostWorker object at 0x7fb14c7bfee0>)
  File"/root/jinsc/geesibling_PPDP/python/geesibling/adapters/jax/pipeline/devicecontext.py", line 517, in do_allreduce
    col.allreduce_multigpu([concatenated_allreduce_buffer], group_name=group_name)
  File "/root/miniconda3/envs/framework-jinsc/lib/python3.9/site-packages/ray/util/collective/collective.py", line 295, in allreduce_multigpu
    g.allreduce(tensor_list, opts)
  File "/root/miniconda3/envs/framework-jinsc/lib/python3.9/site-packages/ray/util/collective/collective_group/nccl_collective_group.py", line 197, in allreduce
    self._collective(tensors, tensors, collective_fn)
  File "/root/miniconda3/envs/framework-jinsc/lib/python3.9/site-packages/ray/util/collective/collective_group/nccl_collective_group.py", line 604, in _collective
    comms = self._get_nccl_collective_communicator(key, devices)
  File "/root/miniconda3/envs/framework-jinsc/lib/python3.9/site-packages/ray/util/collective/collective_group/nccl_collective_group.py", line 431, in _get_nccl_collective_communicator
    rendezvous.meet()
  File "/root/miniconda3/envs/framework-jinsc/lib/python3.9/site-packages/ray/util/collective/collective_group/nccl_collective_group.py", line 89, in meet
    raise RuntimeError(
RuntimeError: Unable to meet other processes at the rendezvous store. If you are using P2P communication, please check if tensors are put in the correct GPU.

When I call col.allreduce_multigpu, I get the error above; the same thing happens with col.recv_multigpu.
Several of my communication operations are issued from functions like this one, for example:

import cupy
import ray.util.collective as col

def do_allreduce(self, var1):
    ...
    with cupy.cuda.Device(0):
        # Copy the input onto GPU 0; allreduce_multigpu reduces in place.
        var2 = cupy.array(var1)
        col.allreduce_multigpu([var2], group_name=group_name)
        cupy.cuda.Device(0).synchronize()
        # Copy the reduced result back to the host.
        var3 = var2.get()
    ...
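
For context, each worker joins the collective group before any communication is issued, roughly like this (a simplified sketch rather than my exact code; the setup_group method name, world size, and ranks are placeholders):

import ray
import ray.util.collective as col

@ray.remote(num_gpus=1)
class MeshHostWorker:
    def setup_group(self, world_size, rank, group_name):
        # Illustrative only: every worker that takes part in the collective
        # calls this with the same world_size and group_name and a unique
        # rank. The NCCL rendezvous from the traceback happens lazily on
        # the first collective call against this group.
        col.init_collective_group(world_size, rank,
                                  backend="nccl",
                                  group_name=group_name)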

Why does this error occur? My Ray version is 2.1.0 and my NCCL version is 2.16.2.