I am using Ray Data with Ray Train. With a single worker everything works fine, but with more than one worker the job hangs indefinitely, as if it were dead.
The last messages in the output look like this:
(SplitCoordinator pid=64863) StreamSplitDataIterator(epoch=1, split=3) blocked waiting on other clients for more than 30s. All clients must read from the DataIterator splits at the same time. This warning will not be printed again for this epoch.
How can I debug this?
Ray version: 2.34
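Since the warning says all clients must read from their splits at the same time, one thing I could try is to find out which worker is not reading from its shard. Below is a minimal sketch using only Python's standard `faulthandler` module, not a Ray API. The helper names `enable_hang_dump` and `disable_hang_dump` are hypothetical, and I am assuming they would be called at the start and end of the per-worker training function so that each worker process periodically prints where it is stuck.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s=60.0, file=None):
    """Dump the stack of every thread in this process every `timeout_s` seconds.

    Hypothetical helper (not part of Ray). `file` must be backed by a real
    file descriptor and stay open until the dumps are cancelled.
    """
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)


def disable_hang_dump():
    """Cancel the periodic dumps, e.g. once training finishes normally."""
    faulthandler.cancel_dump_traceback_later()
```

If one worker's dump shows it stuck outside the data iteration loop (or already exited) while the others are waiting for batches, that worker is probably the one blocking the other splits.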