As the title suggests, an epoch takes roughly 3 hours when using Ray Tune, whereas it takes about an hour without it. Here's the flame graph from a 20-second run, with ~6000 records collected.
Thanks for the quick reply, @rliaw. I've edited the post to include the initialization of the trainer and the Ray Tune part. I know a complete, verifiable example would be easier to work with, but the model is a bit complex. Do you think we can conclude anything from the above code, along with the flame graph and the following dump?
Also, although I am decorating the trainable with @wandb_mixin, I am using WandbLogger from PyTorch Lightning. As far as I can tell, Ray doesn't expose its own WandbLogger implementation that I could pass to the Trainer, so I am only using wandb_mixin to skip the wandb initialization lines.
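To make the setup concrete, here is a rough sketch of the pattern I mean, not my actual code. It assumes Ray 1.x (where `wandb_mixin` lives in `ray.tune.integration.wandb`) and a function trainable. `build_config`, `make_trainable`, `model_factory`, the `lr` value, and the project and key-file names are all placeholders I made up for illustration.

```python
def build_config(project: str, api_key_file: str) -> dict:
    """Build a Tune config; the "wandb" sub-dict is what wandb_mixin reads."""
    return {
        "lr": 1e-3,  # placeholder hyperparameter, not from my real search space
        "wandb": {"project": project, "api_key_file": api_key_file},
    }


def make_trainable(model_factory):
    """Return a function trainable decorated with wandb_mixin.

    model_factory is a hypothetical callable that takes the config and
    returns a LightningModule defining its own dataloaders.
    Third-party imports are deferred so this file can be imported
    without Ray or PyTorch Lightning installed.
    """
    from ray.tune.integration.wandb import wandb_mixin  # assumes Ray 1.x
    import pytorch_lightning as pl
    from pytorch_lightning.loggers import WandbLogger

    @wandb_mixin
    def trainable(config):
        # As I understand it, wandb_mixin sets up the wandb run from
        # config["wandb"], so there are no explicit init lines here.
        logger = WandbLogger(project=config["wandb"]["project"])
        trainer = pl.Trainer(logger=logger, max_epochs=1)
        trainer.fit(model_factory(config))

    return trainable
```

The trainable would then be passed to `tune.run` together with the config, e.g. `tune.run(make_trainable(my_factory), config=build_config("my-project", "~/.wandb_key"))`, where `my_factory` and both string arguments are again placeholders.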