I ran the script train_fashion_mnist_example.py with num_workers=1 and got 44% accuracy.
However, when I set num_workers=4, the accuracy dropped drastically to less than 10%.
It seems that each process's model trains on only 1/4 of the data and that their weights are never gathered.
Has anyone run into this issue? Thank you.
Hey @tangcc1127, thanks for pointing this out. It seems the primary bug was in the accuracy calculation, accuracy = correct / size: size was the global dataset size rather than the number of samples each worker actually evaluated.
I made a fix for this here. Let me know if this looks right to you!
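To illustrate why dividing by the global size makes accuracy look roughly num_workers times too low, here is a minimal sketch with hypothetical numbers (not taken from a real run). It assumes a 10,000-sample validation set (the size of the Fashion-MNIST test split) split evenly across 4 workers; `shard_accuracy` is an illustrative helper, not part of the example script.

```python
def shard_accuracy(correct: int, shard_size: int, global_size: int) -> tuple[float, float]:
    """Return (buggy, fixed) accuracy for one worker's validation shard."""
    buggy = correct / global_size  # divides by the whole dataset: the bug
    fixed = correct / shard_size   # divides by what this worker actually evaluated
    return buggy, fixed


global_size = 10_000
num_workers = 4
shard_size = global_size // num_workers  # 2,500 samples per worker
correct = 1_100  # hypothetical: this worker gets 44% of its shard right

buggy, fixed = shard_accuracy(correct, shard_size, global_size)
print(f"buggy: {buggy:.2%}, fixed: {fixed:.2%}")  # buggy: 11.00%, fixed: 44.00%
```

With 4 workers, a model that is really ~44% accurate is reported at ~11%, which is consistent with the drop described in the original report.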
It does work! Thank you. Since len(dataset) may not be divisible by num_workers, I count the total number of validation samples on each worker myself.
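One way to handle a dataset size that is not divisible by num_workers is to have each worker report both its correct count and its sample count, then sum each across workers before dividing. The sketch below simulates this in a single process under stated assumptions: a plain strided split with no padding, and hypothetical helpers `shard_indices`, `local_counts`, and `global_accuracy`. In a real multi-process run the two totals would be summed with a collective such as `torch.distributed.all_reduce`. Note that torch's `DistributedSampler` pads with repeated samples by default (`drop_last=False`), so per-worker counts there can add up to slightly more than `len(dataset)`.

```python
from typing import List, Sequence, Tuple


def shard_indices(dataset_len: int, rank: int, world_size: int) -> List[int]:
    # Strided split without padding: worker `rank` gets every world_size-th index.
    return list(range(rank, dataset_len, world_size))


def local_counts(labels: Sequence[int], preds: Sequence[int],
                 indices: Sequence[int]) -> Tuple[int, int]:
    # Count correct predictions and the samples this worker actually evaluated.
    correct = sum(1 for i in indices if preds[i] == labels[i])
    return correct, len(indices)


def global_accuracy(per_worker: Sequence[Tuple[int, int]]) -> float:
    # Sum numerators and denominators across workers, then divide once.
    total_correct = sum(c for c, _ in per_worker)
    total_seen = sum(n for _, n in per_worker)
    return total_correct / total_seen


# Toy data: 10 samples over 4 workers gives uneven shards of 3, 3, 2, 2.
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
preds = [0, 1, 2, 0, 4, 0, 6, 0, 8, 0]
world_size = 4

counts = [local_counts(labels, preds, shard_indices(len(labels), r, world_size))
          for r in range(world_size)]
print(counts)                   # [(3, 3), (1, 3), (2, 2), (0, 2)]
print(global_accuracy(counts))  # 0.6
```

Summing counts before dividing matters when shards are uneven: simply averaging the four per-worker accuracies here would give 7/12 ≈ 0.583 instead of the true 0.6.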