I kept getting the following error when running XGBoost on Ray. There is no additional error message. What might be the cause?
/build/build/ext/public/python/xgboost/1/7/3/build/python3.9/src/collective/rabit_communicator.h:47: Failed to initialize Rabit
Rabit seems to be a collective library that XGBoost uses:
You probably need to install it as a dependency.
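Rabit is installed as part of xgboost, so it does not need to be installed separately. A failure to initialize it usually indicates a networking issue, such as the workers being unable to connect to each other or to the Rabit tracker.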
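Because networking is the usual suspect, one quick check is whether each node can open a TCP connection to the others. The sketch below uses only Python's standard `socket` module. Which host and port to test (for example the Ray head node, or whatever address your cluster's workers must reach) is an assumption you need to fill in for your own setup.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False
```

For example, run `can_reach("head-node.example", 6379)` from each worker node, where the host name is a placeholder and 6379 is Ray's default head node port. A `False` result on some nodes points to a firewall, routing, or DNS problem between them.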
Thanks. After this error the whole training run failed. Is there a way to increase the number of retries so that training can tolerate this error?
I am not sure, but you should be able to use the built-in xgboost-ray fault tolerance to automatically restart the training if a failure is encountered.
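If the built-in fault tolerance is not enough, a blunter alternative is to retry the whole training call at the application level. The sketch below is a generic retry loop with exponential backoff; it is not part of xgboost-ray, and `train_with_retries` and its parameters are hypothetical names. It catches any exception, since the exact exception type that surfaces from a failed xgboost-ray run is not shown in this thread.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def train_with_retries(
    train_fn: Callable[[], T],
    max_retries: int = 3,
    initial_delay: float = 5.0,
    backoff: float = 2.0,
) -> T:
    """Call train_fn, retrying up to max_retries extra times if it raises.

    Sleeps initial_delay seconds before the first retry and multiplies the
    delay by `backoff` after each further failure. Re-raises the last
    exception once all attempts are used up.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    delay = initial_delay
    attempt = 0
    while True:
        try:
            return train_fn()
        except Exception as exc:  # hypothetical: whatever the training call raises
            if attempt >= max_retries:
                raise
            attempt += 1
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s")
            time.sleep(delay)
            delay *= backoff
```

Usage would look like `train_with_retries(lambda: train(params, dtrain, ray_params=ray_params))`, where `train`, `params`, `dtrain` and `ray_params` are your existing xgboost-ray objects. Each retry restarts training from scratch, so it is worth trying xgboost-ray's own options first (for example `max_actor_restarts` in `RayParams`; check the xgboost-ray documentation for the options available in your version).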