How to save models after Trainer.fit()?

Hi!

I have been able to train a model using

trainer = TensorflowTrainer(...)

and call

result = trainer.fit()

Everything works fine. Now that the model has finished training, I need to load the trained model and serialize it with our own code (since we have a model registry).

However, I couldn't find how to do this anywhere, including in Ray's official documentation. The examples usually end after result = trainer.fit().

If I use

result.checkpoint.load_model(), it fails, saying I need to supply a model argument to load_model.

If I use

checkpoint = result.checkpoint
checkpoint.to_directory("/tmp/test_checkpoint")
self.model = keras.models.load_model("/tmp/test_checkpoint")

it fails as well (saying there is no SavedModel in the folder).
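One way to see why keras.models.load_model complains is to check where, if anywhere, a SavedModel actually landed inside the restored checkpoint directory. The sketch below assumes the model was saved in TensorFlow's SavedModel format, whose marker file is saved_model.pb. The helper name find_saved_models is hypothetical, not part of Ray or Keras.

```python
import os
from typing import List


def find_saved_models(root: str) -> List[str]:
    """Return paths (relative to root) of every directory under root that
    contains a TensorFlow SavedModel marker file (saved_model.pb)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "saved_model.pb" in filenames:
            # "." means root itself is a SavedModel directory
            found.append(os.path.relpath(dirpath, root))
    return sorted(found)


# Hypothetical usage after checkpoint.to_directory("/tmp/test_checkpoint"):
#   print(find_saved_models("/tmp/test_checkpoint"))
# An empty list means no SavedModel was written into the checkpoint at all.
# ["model"] would mean loading "/tmp/test_checkpoint/model" instead.
```

If the list is empty, the problem is on the saving side (inside the training function), not the loading side.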

We are using Ray 2.6.1. What is the documented way to get the trained model after calling trainer.fit()?

Thanks

Hi @cartml,

How are you creating your checkpoint? In general, the recommended way to do this is something like the following:

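import os
import tempfile

from ray.air import session
from ray.air.checkpoint import Checkpoint
from ray.train.tensorflow import TensorflowTrainer

def train_func():
    # ... build and fit your Keras model here ...
    with tempfile.TemporaryDirectory() as tmpdir:
        # todo: save your checkpoint to tmpdir in whatever format you want
        # example: model.save(os.path.join(tmpdir, "test_checkpoint"))
        checkpoint = Checkpoint.from_directory(tmpdir)
        # report while tmpdir still exists, so Ray can persist its contents
        session.report({...}, checkpoint=checkpoint)

trainer = TensorflowTrainer(train_func, ...)
result = trainer.fit()

# Recreate the reported checkpoint directory locally
result.checkpoint.to_directory(some_local_path)
# todo: load your checkpoint
# example: keras.models.load_model(os.path.join(some_local_path, "test_checkpoint"))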

The way to think about it is:

  1. Within the training function, you create a directory in which you can store whatever checkpoint data you want.
  2. When you fetch the checkpoint at the end, you can “recreate” that directory locally and load your files from the same relative paths you used when saving.
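The directory round trip described in the two steps above can be illustrated without Ray at all. In this sketch, plain shutil.copytree stands in for what Checkpoint.from_directory plus session.report (persisting) and result.checkpoint.to_directory (recreating) do; it is a model of the idea, not Ray's implementation. The function names, the checkpoint_000000 folder name, and the weights.txt file are all hypothetical.

```python
import os
import shutil
import tempfile


def train_and_checkpoint(storage_dir: str) -> str:
    """Stand-in for train_func + session.report: write arbitrary files into a
    temporary directory, then persist that directory into storage_dir."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Any format you like; here one file inside a "test_checkpoint" folder.
        os.makedirs(os.path.join(tmpdir, "test_checkpoint"))
        with open(os.path.join(tmpdir, "test_checkpoint", "weights.txt"), "w") as f:
            f.write("w=0.42")
        # Persist before the with-block deletes tmpdir (like reporting inside it).
        persisted = os.path.join(storage_dir, "checkpoint_000000")
        shutil.copytree(tmpdir, persisted)
    return persisted


def recreate_locally(persisted: str, local_path: str) -> str:
    """Stand-in for result.checkpoint.to_directory(local_path)."""
    shutil.copytree(persisted, local_path)
    return local_path
```

After recreate_locally, the file is found at local_path/test_checkpoint/weights.txt, the same relative path used when saving. That is why the loading code has to append the subfolder name rather than pointing load_model at the checkpoint root.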