NCCL errors when using Ray

I have four A6000 GPUs and am trying to use Ray to parallelize the process, but I get the error below. I am using cupy-cuda117 with CUDA 11.7.

```
  File "profile_communication.py", line 199, in profile
    self.profile_allreduce(1 << i, cp.float32, [list(range(self.world_size))])
  File "profile_communication.py", line 80, in profile_allreduce
    comm = self.init_communicator(groups)
  File "profile_communication.py", line 73, in init_communicator
    comm = cp.cuda.nccl.NcclCommunicator(
  File "cupy_backends/cuda/libs/nccl.pyx", line 283, in cupy_backends.cuda.libs.nccl.NcclCommunicator.__init__
  File "cupy_backends/cuda/libs/nccl.pyx", line 129, in cupy_backends.cuda.libs.nccl.check_status
cupy_backends.cuda.libs.nccl.NcclError: NCCL_ERROR_INVALID_USAGE: invalid usage
```
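For context, CuPy's `NcclCommunicator(ndev, commId, rank)` is a collective constructor: every rank must call it with the same `ndev` (the world size) and the same `commId` (one `nccl.get_unique_id()` result shared with all ranks), each from its own process and on its own GPU, and the call blocks until all ranks have joined. One possible cause of `NCCL_ERROR_INVALID_USAGE` here (not confirmed by the traceback alone) is two ranks landing on the same physical GPU, for example if the actors do not each reserve a GPU (`num_gpus=1`) and all use their default device 0; NCCL rejects that layout. The minimal sketch below checks a planned layout before calling NCCL. `find_layout_problems` and `init_communicator` are hypothetical names, and the physical GPU identifier is assumed to be something like the PCI bus ID (e.g. `cupy.cuda.Device().pci_bus_id`) rather than the process-local device index.

```python
from collections import Counter


def find_layout_problems(world_size, placements):
    """Check a planned NCCL communicator layout before calling NCCL.

    Hypothetical helper. `placements` maps each rank to a
    (hostname, physical_gpu_id) pair; physical_gpu_id is assumed to identify
    the physical device (e.g. its PCI bus ID), not the local index that
    CUDA_VISIBLE_DEVICES remaps to 0 inside each process.
    Returns a list of problem descriptions (empty if the layout looks valid).
    """
    problems = []

    # Every rank in [0, world_size) must appear exactly once.
    expected = set(range(world_size))
    actual = set(placements)
    missing = sorted(expected - actual)
    extra = sorted(actual - expected)
    if missing:
        problems.append(f"missing ranks: {missing}")
    if extra:
        problems.append(f"ranks outside [0, {world_size}): {extra}")

    # NCCL refuses two ranks on the same physical GPU ("Duplicate GPU
    # detected"), which is reported as NCCL_ERROR_INVALID_USAGE.
    counts = Counter(placements.values())
    for (host, gpu), n in sorted(counts.items()):
        if n > 1:
            ranks = sorted(r for r, p in placements.items() if p == (host, gpu))
            problems.append(f"ranks {ranks} share GPU {gpu} on {host}")
    return problems


def init_communicator(world_size, rank, unique_id, device_id=0):
    """Create this rank's communicator; call once per rank, concurrently.

    Hypothetical wrapper. `unique_id` must come from a single
    nccl.get_unique_id() call whose result is passed to every rank.
    device_id=0 assumes each process sees only its own GPU (as with a Ray
    actor that reserved num_gpus=1).
    """
    import cupy  # imported lazily so the layout check runs without CuPy
    from cupy.cuda import nccl

    # NCCL binds the communicator to the *current* device, so select it first.
    with cupy.cuda.Device(device_id):
        return nccl.NcclCommunicator(world_size, unique_id, rank)
```

Running `find_layout_problems` on the `(hostname, bus_id)` pairs reported by each actor would turn a shared GPU into a readable message such as `ranks [0, 1] share GPU ... on ...`, pointing at the device assignment rather than at NCCL itself.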

Sorry, we’re going to need more information here. Can you provide the code that you’re trying to run?

@SusuXu, try running it with `export NCCL_DEBUG=INFO` and/or share the source code so we can help.
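Since Ray runs each actor in a separate worker process, an `export` in the driver's shell may not reach workers on a cluster that was already started (e.g. with `ray start`). One way to make sure every worker sees the variable is Ray's `runtime_env` `env_vars` option. A minimal sketch under that assumption; `NCCL_DEBUG_ENV`, `start_ray_with_nccl_debug` and `nccl_warnings` are hypothetical names, and the assumption that the relevant message sits on a line containing `NCCL WARN` is based on NCCL's usual log format.

```python
# Hypothetical helpers; only ray.init(runtime_env=...) is Ray's own API.
NCCL_DEBUG_ENV = {"NCCL_DEBUG": "INFO"}


def start_ray_with_nccl_debug():
    """Start Ray so that the worker processes it launches see NCCL_DEBUG=INFO."""
    import ray  # imported lazily so the log filter below runs without Ray

    # runtime_env["env_vars"] is applied to this job's worker processes,
    # unlike a shell `export` that already-running workers never inherit.
    ray.init(runtime_env={"env_vars": NCCL_DEBUG_ENV})


def nccl_warnings(log_text):
    """Return the NCCL warning lines from captured worker output.

    With NCCL_DEBUG=INFO, the reason for a failed init is usually printed
    on a line containing "NCCL WARN" shortly before the error is raised.
    """
    return [line for line in log_text.splitlines() if "NCCL WARN" in line]
```

Filtering the captured output this way keeps the `NCCL WARN` lines, which are usually the ones worth posting back in this thread, and drops the much longer `NCCL INFO` noise.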