Failed to initialize Rabit when running XGBoost on Ray

I keep getting the following error when running XGBoost on Ray. There is no additional error message. What might be the cause?

/build/build/ext/public/python/xgboost/1/7/3/build/python3.9/src/collective/rabit_communicator.h:47: Failed to initialize Rabit

Rabit seems to be a collective communication library that XGBoost uses.

You probably need to install it as a dependency.

Rabit is installed as part of XGBoost, so it does not need to be installed separately. A failure to initialize it usually points to a networking issue.
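One quick way to check for a networking problem is to test whether a worker node can open a TCP connection to the node it needs to reach. The following is a minimal, stdlib-only sketch. `can_reach` is a hypothetical helper, and the host and port you pass it are placeholders: which address and port the Rabit tracker actually listens on depends on your XGBoost version and cluster setup.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, running `can_reach("10.0.0.1", 12345)` on a worker (with your own head-node IP and port substituted for these made-up values) returning `False` suggests a firewall, security-group, or routing problem between nodes.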

Thanks. After this error, the whole training run failed. Is there a way to increase the number of retries so that training can tolerate this error?

I am not sure, but you should be able to use the built-in xgboost-ray fault tolerance to automatically restart training if a failure is encountered.
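In xgboost-ray, fault tolerance is configured through `RayParams`, e.g. `RayParams(num_actors=4, max_actor_restarts=2)` allows failed training actors to be restarted; check the xgboost-ray documentation for the options your installed version supports. If the whole training call still fails (for example, because Rabit cannot initialize at all), one fallback is to retry the entire run yourself. The sketch below is a generic, library-agnostic retry wrapper, not part of XGBoost or xgboost-ray; `train_with_retries` and its parameters are hypothetical names.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def train_with_retries(
    train_fn: Callable[[], T],
    max_attempts: int = 3,
    backoff_seconds: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call train_fn, retrying the whole run up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return train_fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            logger.warning("Attempt %d/%d failed: %s", attempt, max_attempts, exc)
            # Linear backoff gives transient network problems time to clear.
            time.sleep(backoff_seconds * attempt)
    raise AssertionError("unreachable")
```

Usage, assuming you already have `params`, `dtrain`, and `ray_params` set up for xgboost-ray's `train`: `bst = train_with_retries(lambda: train(params, dtrain, ray_params=ray_params))`. Narrowing `retry_on` to the exception type you actually see avoids retrying on genuine bugs such as bad parameters.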