I ran the script with num_workers=1 and got 44% accuracy. However, when I set num_workers=4, the accuracy dropped drastically, to less than 10%.
It seems that each worker's model was trained on only 1/4 of the data and that the weights were not gathered.
Has anyone else run into this issue? Thank you.
Hey @tangcc1127, thanks for pointing this out. It seems like the primary bug in the code was that the total size used to calculate accuracy = correct / size was the global dataset size rather than the size of the worker's own share of the data.
I made a fix for this here. Let me know if this looks right to you!
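To make the arithmetic concrete, here is a minimal sketch of how dividing by the global size deflates the reported accuracy. All numbers and names below (global_size, correct_on_worker, and so on) are hypothetical and chosen only for illustration; they are not taken from the actual script or from the fix.

```python
def accuracy(correct: int, size: int) -> float:
    """Fraction of correctly classified samples out of `size`."""
    return correct / size


# Hypothetical numbers, for illustration only.
num_workers = 4
global_size = 10_000                       # samples in the whole validation set
worker_size = global_size // num_workers   # samples this worker actually sees
correct_on_worker = 1_100                  # correct predictions on this worker

# Buggy: the worker's correct count is divided by the global size.
buggy = accuracy(correct_on_worker, global_size)

# Fixed: the worker's correct count is divided by the worker's own size.
fixed = accuracy(correct_on_worker, worker_size)

print(f"buggy={buggy:.2f} fixed={fixed:.2f}")
```

With an even split, the buggy value is smaller than the fixed one by a factor of num_workers, which is consistent with the kind of drop reported above.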
It does work! Thank you.
len(dataset) may not be divisible by num_workers, so I count the number of validation samples on each worker myself.
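Roughly, the counting approach could look like the sketch below. It is plain Python with lists standing in for model predictions and labels, and validate_on_worker is a hypothetical helper name, not the function from the actual script. The point is only that the denominator is accumulated from the batches the worker really sees, instead of being derived from len(dataset) / num_workers.

```python
from typing import Iterable, List, Tuple

Batch = Tuple[List[int], List[int]]  # (predicted labels, true labels)


def validate_on_worker(batches: Iterable[Batch]) -> Tuple[int, int, float]:
    """Return (correct, seen, accuracy) for the batches one worker sees.

    `seen` is counted while iterating, so it stays right even when the
    dataset does not split evenly across workers or the last batch is short.
    """
    correct = 0
    seen = 0
    for predictions, labels in batches:
        correct += sum(int(p == y) for p, y in zip(predictions, labels))
        seen += len(labels)
    accuracy = correct / seen if seen else 0.0
    return correct, seen, accuracy


# Hypothetical worker shard: two batches of unequal length (3 and 2 samples).
worker_batches = [
    ([1, 0, 1], [1, 1, 1]),  # 2 of 3 correct
    ([0, 0], [0, 1]),        # 1 of 2 correct
]
correct, seen, acc = validate_on_worker(worker_batches)
print(correct, seen, acc)
```

If a single global number is needed, summing correct and seen across workers before dividing avoids the bias that averaging per-worker accuracies would introduce when the shards have different sizes.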