Training loop stuck at StreamSplitDataIterator

I found the problem, and it was in my own code.

My code originally looked like this:

...
if ray.train.get_context().get_world_rank() == 0:
    ...
    ray.train.report(metrics, checkpoint=checkpoint)
...

The documentation for ray.train.report clearly states that it must be called from all workers.

In my code, ray.train.report was only called on rank 0, so the workers with a non-zero rank never called it.
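To illustrate why a missing call can stall training, here is a rough analogy in plain Python. It is not Ray code. It assumes ray.train.report behaves like a synchronization point that every worker has to reach, which is how I read the all-workers requirement in the docs. threading.Barrier stands in for the report call, and WORLD_SIZE, run and the timeout are made up for the sketch.

```python
import threading

WORLD_SIZE = 2  # hypothetical number of training workers


def run(ranks_that_report, timeout=0.5):
    """Return the ranks whose simulated report call completed."""
    barrier = threading.Barrier(WORLD_SIZE)
    completed = set()
    lock = threading.Lock()

    def worker(rank):
        if rank not in ranks_that_report:
            return  # this rank skips the report call entirely
        try:
            # Stand-in for ray.train.report (assumption: it synchronizes workers).
            barrier.wait(timeout=timeout)
        except threading.BrokenBarrierError:
            return  # the other rank never arrived, so the wait gave up
        with lock:
            completed.add(rank)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed


only_rank_zero = run({0})   # rank 0 waits alone and never gets through
all_ranks = run({0, 1})     # both ranks arrive, so both get through
```

The timeout is only there so the sketch finishes. Without one, the lone waiter would block indefinitely, which would look like a stuck training loop.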

The following change fixed my problem:

...
if ray.train.get_context().get_world_rank() == 0:
    ...
    ray.train.report(metrics, checkpoint=checkpoint)
    ...
else:
    ray.train.report(metrics, checkpoint=None)
...
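A variation on the same fix, in case it helps someone: decide the checkpoint first and then make a single report call that every rank reaches, so the call cannot be skipped by accident. This is only a sketch. report_from_every_rank, make_checkpoint and fake_report are hypothetical names, and the report function is passed in as a parameter so the snippet runs without a cluster.

```python
def report_from_every_rank(rank, metrics, make_checkpoint, report):
    """Call report on every rank; only rank 0 attaches a checkpoint."""
    # Only rank 0 builds a checkpoint; all other ranks pass None.
    checkpoint = make_checkpoint() if rank == 0 else None
    # Single call site, reached by every rank.
    report(metrics, checkpoint=checkpoint)


# Small demo with a fake report function that records its arguments.
calls = []


def fake_report(metrics, checkpoint=None):
    calls.append((metrics, checkpoint))


for rank in range(2):
    report_from_every_rank(
        rank,
        {"loss": 0.5},                       # placeholder metrics
        make_checkpoint=lambda: "ckpt-obj",  # placeholder for a real checkpoint
        report=fake_report,
    )
```

In a real training function, rank would come from ray.train.get_context().get_world_rank() and report would be ray.train.report.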