Severity of the issue:
Medium: Significantly affects my productivity but can find a workaround.
I store my wandb runs offline and want to upload them after tuning has finished.
Very roughly:
from pathlib import Path
results = tuner.fit()  # Tuner configured with a Wandb callback
for result in results:
    wandb_dir = Path(result.path) / "wandb"
    # upload wandb_dir
However, this fails often but not always (in roughly 2 out of 3 runs). In those cases the directory where the wandb data should be (result.path + "/wandb") does not exist.
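To make the failure easier to quantify, the check could be written as a small helper. This is only a sketch: `missing_wandb_dirs` is a hypothetical name, not part of Ray or wandb, and it assumes each `result.path` is a local filesystem path rather than a remote URI.

```python
from pathlib import Path
from typing import Iterable, List


def missing_wandb_dirs(trial_paths: Iterable[str]) -> List[Path]:
    """Return trial directories that have no non-empty "wandb" subdirectory."""
    missing = []
    for trial_path in trial_paths:
        wandb_dir = Path(trial_path) / "wandb"
        # Treat an absent or an empty directory as "not synced".
        if not wandb_dir.is_dir() or not any(wandb_dir.iterdir()):
            missing.append(Path(trial_path))
    return missing
```

It would be called as `missing_wandb_dirs(result.path for result in results)` right after `tuner.fit()` returns, so that the affected trials can be logged by path.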
The files in /tmp/.../wandb are there, as reported by the WandbLogger output.
I thought the sync was not finished yet and added some time.sleep calls, which did not help. Even when I manually check the directory after some time, it is still empty.
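A fixed sleep could also be replaced by polling with a timeout, which at least makes it explicit how long the directory was waited for. The following is a minimal sketch; `wait_for_dir` and its default values are assumptions made for illustration, not an existing Ray or wandb API.

```python
import time
from pathlib import Path


def wait_for_dir(path: str, timeout_s: float = 30.0, poll_s: float = 0.5) -> bool:
    """Poll until `path` exists and is non-empty, or until the timeout expires."""
    target = Path(path)
    deadline = time.monotonic() + timeout_s
    while True:
        if target.is_dir() and any(target.iterdir()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A `False` return after a generous timeout would support the suspicion that the sync never happens, rather than that it is merely slow.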
My assumption is that the syncing process dies without raising any visible errors.
Do you have any ideas what could cause the issue, and how I could debug it?
Side note: I see this issue in one of my tests, which runs for just two iterations. Running the test for longer appears to sync the files more reliably, so I think there might be a bug involving short tune runs. Other non-wandb files seem to be synced correctly.