Everything works fine now. Once the model has finished training, I need to load the trained model and serialize it with our own code (since we have a model registry).
However, I couldn't find out how to do this anywhere, not even in Ray's official documentation. The examples there usually end after result = trainer.fit().
If I call
result.checkpoint.load_model(), it fails, saying I need to supply a model argument to load_model.
How are you creating your checkpoint? In general the recommended way to do this would be something like this:
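import tempfile

from ray.air import Checkpoint, session  # assumes the Ray 2.x AIR API used in this snippet
from ray.train.tensorflow import TensorflowTrainer

def train_func():
    with tempfile.TemporaryDirectory() as tmpdir:
        # todo: save your checkpoint to tmpdir in whatever format you want
        # example: model.save(tmpdir + "/test_checkpoint")
        checkpoint = Checkpoint.from_directory(tmpdir)
        # report while tmpdir still exists so its contents can be persisted
        session.report({...}, checkpoint=checkpoint)

trainer = TensorflowTrainer(train_func, ...)
result = trainer.fit()
# result.checkpoint is the checkpoint reported from train_func
result.checkpoint.to_directory(some_local_path)
# todo: load your checkpoint
# example: keras.models.load_model(some_local_path + "/test_checkpoint")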
The way to think about it is this:
Within the training function, you create a directory in which you can store whatever checkpoint data you want.
When you fetch the checkpoint at the end, you can “recreate” that directory locally and load the model from it with your own code.
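To make that mental model concrete, here is a minimal sketch of the directory round trip using only the Python standard library. It does not call Ray at all: shutil.copytree stands in for what Checkpoint.from_directory plus session.report and later to_directory do for you, and the function names, the model.pkl file name, and the dictionary "model" are all hypothetical placeholders for illustration.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def train_and_checkpoint(storage_dir: Path) -> Path:
    """Stand-in for the training function: save a model into a temporary
    directory, then persist that directory before it is deleted."""
    model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}  # placeholder "model"
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(Path(tmpdir) / "model.pkl", "wb") as f:
            pickle.dump(model, f)
        # Assumption: this copy mimics Ray persisting the reported checkpoint.
        persisted = storage_dir / "checkpoint_000000"
        shutil.copytree(tmpdir, persisted)
    return persisted


def restore_checkpoint(persisted: Path, local_path: Path) -> dict:
    """Stand-in for to_directory(local_path) followed by your own loader."""
    shutil.copytree(persisted, local_path)  # "recreate" the directory locally
    with open(local_path / "model.pkl", "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as root:
    root = Path(root)
    persisted = train_and_checkpoint(root / "storage")
    restored = restore_checkpoint(persisted, root / "local_copy")
    print(restored)
```

Once the directory exists locally, whatever you wrote in the training function is just files on disk, so you can load it with the matching loader and hand the result to your own model registry code.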