Hmm, I tried running the following simple example:
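from ray.train import Trainer
from ray import train, tune


def train_func():
    # Runs on every worker; saves a checkpoint holding a single key.
    train.save_checkpoint(metric=123)


# Wrap the training function as a Tune trainable and run it.
trainer = Trainer(backend="torch", num_workers=2)
trainable = trainer.to_tune_trainable(train_func)
tune.run(trainable)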
And the checkpoint was written to:
~/tune_function_2022-03-21_22-02-05/tune_function_37f72_00000_0_2022-03-21_22-02-05/checkpoint_000000/checkpoint
The file isn’t expected to be named train_stuff here.
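To double-check what actually ends up on disk, something like the sketch below could be used. It is only an illustration: `list_checkpoint_files` is a hypothetical helper (not part of Ray), and it assumes the layout seen in the path above, where each checkpoint lives in a directory whose name starts with `checkpoint_`.

```python
from pathlib import Path


def list_checkpoint_files(results_dir):
    """Return files that sit directly inside checkpoint_* directories.

    Hypothetical helper, not a Ray API. Assumes checkpoint directories
    are named like checkpoint_000000, as in the path shown above.
    """
    root = Path(results_dir).expanduser()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.parent.name.startswith("checkpoint_")
    )


# Example usage (the directory is the experiment folder from the run above):
# for path in list_checkpoint_files("~/tune_function_2022-03-21_22-02-05"):
#     print(path)
```

Printing the returned paths shows the file name used for each checkpoint, so you can compare it against the name you expected.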